Evaluating the model on our test dataset gave us a measure of its performance, but how do we use the model in the real world to caption completely new photos? This is where we need some knowledge of building an end-to-end system, which takes any image as input and gives us a free-text natural-language caption as output.
Here are the major components and functions for our automated caption generator:
- Caption model and metadata initializer
- Image feature extraction model initializer
- Transfer learning-based feature extractor
- Caption generator
To make this generic, we built a class that makes use of several utility functions we mentioned in the previous sections:
```python
import numpy as np

from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input as preprocess_vgg16_input
from keras.applications import vgg16
from keras.models import Model


class CaptionGenerator:

    def __init__(self, image_locations=[], word_to_index_map=None,
                 index_to_word_map=None, max_caption_size=None,
                 caption_model=None, beam_size=1):
        self.image_locs = image_locations
        self.captions = []
        self.image_feats = []
        self.word2index = word_to_index_map
        self.index2word = index_to_word_map
        self.max_caption_size = max_caption_size
        self.vision_model = None
        self.caption_model = caption_model
        self.beam_size = beam_size

    def process_image2arr(self, path, img_dims=(224, 224)):
        # load an image from disk and preprocess it for VGG16
        img = image.load_img(path, target_size=img_dims)
        img_arr = image.img_to_array(img)
        img_arr = np.expand_dims(img_arr, axis=0)
        img_arr = preprocess_vgg16_input(img_arr)
        return img_arr

    def initialize_model(self):
        # build the feature extractor: VGG16 with its final
        # prediction layer removed, frozen for transfer learning
        vgg_model = vgg16.VGG16(include_top=True, weights='imagenet',
                                input_shape=(224, 224, 3))
        vgg_model.layers.pop()
        output = vgg_model.layers[-1].output
        vgg_model = Model(vgg_model.input, output)
        vgg_model.trainable = False
        self.vision_model = vgg_model

    def process_images(self):
        # extract and flatten a feature vector per image
        if self.image_locs:
            image_feats = [self.vision_model.predict(
                               self.process_image2arr(path=img_path))
                           for img_path in self.image_locs]
            image_feats = [np.reshape(img_feat, img_feat.shape[1])
                           for img_feat in image_feats]
            self.image_feats = image_feats
        else:
            print('No images specified')

    def generate_captions(self):
        # decode a caption for each extracted feature vector
        captions = [generate_image_caption(
                        model=self.caption_model,
                        word_to_index_map=self.word2index,
                        index_to_word_map=self.index2word,
                        image_features=img_feat,
                        max_caption_size=self.max_caption_size,
                        beam_size=self.beam_size)[0]
                    for img_feat in self.image_feats]
        self.captions = captions
```
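To make the tensor shapes in `process_images` concrete, the short sketch below reproduces its reshaping step with a random NumPy array standing in for real VGG16 output. With the final prediction layer popped off, the truncated model's last fully connected layer emits a 4,096-dimensional vector per image, so `predict()` on a single preprocessed image returns a `(1, 4096)` batch:

```python
import numpy as np

# Stand-in for the truncated VGG16 model's output on one image:
# a batch containing a single 4,096-dim feature vector.
img_feat = np.random.rand(1, 4096)

# process_images() flattens the (1, 4096) batch into a plain
# (4096,) vector before handing it to the caption model.
flat_feat = np.reshape(img_feat, img_feat.shape[1])

print(img_feat.shape)   # (1, 4096)
print(flat_feat.shape)  # (4096,)
```

The caption model expects one flat feature vector per image, which is why the batch dimension is dropped here rather than carried through.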
Now that our caption generator has been implemented, it is time to put it into action! To test it, we downloaded several images that are completely new and not present in the Flickr8K dataset. We specifically chose images from Flickr whose licenses permit commercial use, so that we can reproduce them in this book. We'll show some demonstrations in the next section.