Understanding greedy and beam search

Generating predictions from our deep learning-based neural image captioning model is not as straightforward as it is for a basic classification or categorization model. At each time-step, we need to generate a word conditioned on the input image features and the words generated so far, building up a sequence. There are multiple ways of generating these sequences of words for the captions.

One approach is known as sampling, or greedy search: we start with the <START> token and the input image features, and generate the first word based on p1, the probability distribution from the LSTM output at the first time step. We then feed the predicted word's embedding back in as input and generate the next word based on p2 from the next LSTM cell (in the unrolled form we talked about previously). This step continues until we generate the <END> token, which signifies the end of the caption, or we reach a predefined maximum caption length.
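The greedy decoding loop above can be sketched as follows. This is a minimal, self-contained illustration: `predict_next` is a hypothetical stand-in for the trained LSTM decoder's softmax output (the toy vocabulary and transitions are invented purely so the example runs), not the actual model from this chapter.

```python
import numpy as np

# Toy vocabulary standing in for the real caption vocabulary.
VOCAB = ["<START>", "a", "dog", "on", "grass", "<END>"]
WORD2ID = {w: i for i, w in enumerate(VOCAB)}

def predict_next(image_features, partial_caption):
    """Hypothetical stand-in for the LSTM's softmax output p_t.

    The real model would condition on the image features and the
    embeddings of the words generated so far; here we hard-code
    deterministic toy transitions so the sketch is runnable.
    """
    transitions = {
        "<START>": "a", "a": "dog", "dog": "on",
        "on": "grass", "grass": "<END>",
    }
    probs = np.full(len(VOCAB), 0.01)
    probs[WORD2ID[transitions[partial_caption[-1]]]] = 1.0
    return probs / probs.sum()

def greedy_caption(image_features, max_len=20):
    """Greedy search: at each step, take the single most likely word."""
    caption = ["<START>"]
    for _ in range(max_len):
        probs = predict_next(image_features, caption)
        word = VOCAB[int(np.argmax(probs))]  # argmax = greedy choice
        caption.append(word)
        if word == "<END>":  # end-of-caption token reached
            break
    return caption

print(greedy_caption(image_features=None))
# -> ['<START>', 'a', 'dog', 'on', 'grass', '<END>']
```

Note that the loop terminates either on the <END> token or on the `max_len` threshold, mirroring the two stopping conditions described above.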

The second approach is known as beam search, which is usually more effective than greedy search. Greedy search commits to the single most likely word at each step given the previously generated words, so an early mistake can never be undone. Beam search expands upon the greedy search technique and returns a list of the most likely output sequences of terms. As each sequence is constructed, instead of greedily generating only the most probable next term at time step t + 1, beam search considers all possible next terms for each of the k best sequences so far and keeps only the k highest-scoring expansions. The value of k, the beam width, is usually a user-specified parameter that controls the total number of parallel searches, or beams, conducted to generate the caption sequences. Thus, in beam search we start with the k most likely words as the first time-step outputs in the caption sequence and keep generating subsequent terms until the sequences reach the end state. A detailed treatment of beam search is beyond the current scope; if you are interested, we recommend you check out any standard literature on beam search in the context of AI.
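The procedure above can be sketched as a small beam search over a toy next-word distribution. As before, `predict_next` is a hypothetical stand-in for the decoder's softmax (the vocabulary and probabilities are invented for the example); sequences are scored by their summed log-probability, a common choice, and the k best partial captions are kept at each step.

```python
import math

import numpy as np

# Toy vocabulary standing in for the real caption vocabulary.
VOCAB = ["<START>", "a", "dog", "runs", "sits", "<END>"]
WORD2ID = {w: i for i, w in enumerate(VOCAB)}

def predict_next(image_features, partial_caption):
    """Hypothetical stand-in for the decoder's softmax output p_{t+1}."""
    probs = np.full(len(VOCAB), 0.02)
    last = partial_caption[-1]
    if last == "<START>":
        probs[WORD2ID["a"]] = 0.9
    elif last == "a":
        probs[WORD2ID["dog"]] = 0.6
        probs[WORD2ID["runs"]] = 0.3
    elif last in ("dog", "runs", "sits"):
        probs[WORD2ID["<END>"]] = 0.8
    return probs / probs.sum()

def beam_search(image_features, k=3, max_len=20):
    """Keep the k best partial captions, scored by summed log-probability."""
    beams = [(0.0, ["<START>"])]  # (score, caption-so-far)
    completed = []
    for _ in range(max_len):
        candidates = []
        for score, caption in beams:
            probs = predict_next(image_features, caption)
            # Expand each beam with every possible next word...
            for i, p in enumerate(probs):
                candidates.append((score + math.log(p), caption + [VOCAB[i]]))
        # ...then keep only the k highest-scoring sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, caption in candidates[:k]:
            if caption[-1] == "<END>":  # this beam reached the end state
                completed.append((score, caption))
            else:
                beams.append((score, caption))
        if not beams:
            break
    completed.extend(beams)  # include any unfinished beams as fallbacks
    return max(completed, key=lambda c: c[0])[1]

print(beam_search(image_features=None, k=3))
# -> ['<START>', 'a', 'dog', '<END>']
```

With k = 1 this reduces exactly to greedy search; larger k explores more candidate captions at the cost of proportionally more decoder evaluations per step.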
