Now that we know the essential concepts and theory behind a successful image caption generator, let's look at the major building blocks we will need to solve this problem hands-on. Based on the major operations involved in image captioning, our model requires the following components:
- Image feature extractor—DCNN model with transfer learning
- Text caption generator—sequence-based language model with LSTM
- Encoder-decoder model
Let's briefly talk about these three components before implementing them for our caption generation system.
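As a rough orientation before the details, here is a minimal sketch of how the three pieces could fit together in Keras. It is only an illustration under stated assumptions: the choice of VGG16 and its `fc2` layer, the constants `VOCAB_SIZE`, `MAX_LEN`, and `EMBED_DIM`, the function names, and the simple "merge" style of combining image and text vectors are all picked for this example and are not necessarily what our final implementation will use.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Illustrative values only (assumptions, not taken from the text).
VOCAB_SIZE = 1000   # number of distinct caption tokens
MAX_LEN = 20        # maximum caption length in tokens
EMBED_DIM = 128     # size of the shared image/text representation


def build_feature_extractor(weights="imagenet"):
    """Image feature extractor: a pretrained DCNN with its classifier removed.

    With weights="imagenet" this downloads pretrained weights (transfer
    learning); pass weights=None for a randomly initialized network.
    """
    base = VGG16(weights=weights, include_top=True)
    # Assumption: use the 4096-d output of the second fully connected layer.
    return Model(inputs=base.input, outputs=base.get_layer("fc2").output)


def build_caption_model(feature_dim=4096):
    """Encoder-decoder that predicts the next caption word.

    Inputs: a precomputed image feature vector and the caption so far
    (as padded integer token ids). Output: a softmax over the vocabulary.
    """
    # Encoder side: compress the image features.
    image_input = layers.Input(shape=(feature_dim,))
    image_vec = layers.Dense(EMBED_DIM, activation="relu")(image_input)

    # Language model side: embed the partial caption and run it through an LSTM.
    text_input = layers.Input(shape=(MAX_LEN,))
    embedded = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(text_input)
    text_vec = layers.LSTM(EMBED_DIM)(embedded)

    # Decoder: merge both vectors and predict the next token.
    merged = layers.add([image_vec, text_vec])
    hidden = layers.Dense(EMBED_DIM, activation="relu")(merged)
    next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)
    return Model(inputs=[image_input, text_input], outputs=next_word)
```

In this sketch, calling `build_feature_extractor()` once gives a model for turning images into fixed-length vectors, and `build_caption_model()` gives a model that can be compiled with a categorical cross-entropy loss and trained on (image features, partial caption) pairs to predict the next word.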