Text caption generator – sequence-based language model with LSTM

A traditional sequence-based language model predicts the next probable word given the words that have already occurred in the sequence. For our image captioning problem, as discussed in the previous section, an LSTM model should be able to predict the next probable word in our caption at every time step, based on the features from the DCNN model and the words already generated in the caption sequence.
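The following is a minimal sketch of this step-by-step prediction at inference time. It assumes (these names are not from the text) a trained captioning model `model` that maps a batched image feature vector and a padded word-index sequence to a probability distribution over the vocabulary, a fitted Keras `tokenizer`, and `startseq`/`endseq` markers delimiting each training caption; a sketch of such a model follows in the next section.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, image_features, max_len):
    # image_features: DCNN feature vector with a batch dimension, e.g. shape (1, 2048)
    caption = 'startseq'
    for _ in range(max_len):
        # Encode the words generated so far as a padded index sequence
        seq = tokenizer.texts_to_sequences([caption])[0]
        seq = pad_sequences([seq], maxlen=max_len)
        # Predict the most probable next word at this time step (greedy search)
        probs = model.predict([image_features, seq], verbose=0)
        word = tokenizer.index_word.get(int(np.argmax(probs)))
        if word is None or word == 'endseq':
            break
        caption += ' ' + word
    return caption.replace('startseq ', '')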

An embedding layer is used to generate a word embedding for every unique word in our caption vocabulary. These embeddings are fed as input to the LSTM model (part of our decoder), which generates the next probable word in the caption based on the image features and the previous word sequence. The idea is to ultimately generate a sequence of words that, taken together, best describes the input image.
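A minimal Keras sketch of such a decoder is shown below. The specific values are assumptions for illustration, not from the text: a 2048-dimensional DCNN feature vector, a hypothetical vocabulary of 7,000 words, captions padded to 40 tokens, and a merge-style architecture where the projected image features are added to the LSTM's output before the softmax over the vocabulary.

from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding,
                                     LSTM, add)
from tensorflow.keras.models import Model

vocab_size = 7000   # hypothetical vocabulary size
max_len = 40        # hypothetical maximum caption length

# Image feature branch: DCNN features projected into the decoder's space
image_input = Input(shape=(2048,))
image_dense = Dense(256, activation='relu')(Dropout(0.5)(image_input))

# Caption branch: word indices -> embedding layer -> LSTM
caption_input = Input(shape=(max_len,))
caption_embed = Embedding(vocab_size, 256, mask_zero=True)(caption_input)
caption_lstm = LSTM(256)(Dropout(0.5)(caption_embed))

# Merge both branches and predict the next word over the vocabulary
merged = add([image_dense, caption_lstm])
hidden = Dense(256, activation='relu')(merged)
next_word = Dense(vocab_size, activation='softmax')(hidden)

model = Model(inputs=[image_input, caption_input], outputs=next_word)
model.compile(loss='categorical_crossentropy', optimizer='adam')

At training time, this model would be fed (image features, partial caption) pairs with the following word as the target; at inference time, it is called repeatedly in a loop like the one sketched earlier.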
