Summary

This was one of the toughest real-world problems we tackled in the entire book. It combined transfer learning and generative deep learning, applied to image and text data spanning the domains of computer vision and NLP. We covered the essential concepts behind image captioning and the major components needed to build a caption generator, and then built our own model from scratch. We made effective use of transfer learning principles by leveraging pretrained computer vision models to extract the right features from the images to be captioned, and coupled them with sequential models, such as LSTMs, to generate captions. Evaluating sequential models efficiently and effectively is difficult, so we leveraged the industry-standard BLEU score metric, implementing a scoring function from scratch and evaluating our models on the test dataset.
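As a point of reference, BLEU-style caption evaluation of the kind described above can also be reproduced with NLTK's built-in implementation. The snippet below is a minimal sketch, not the scoring function built in the chapter, and the reference and candidate captions shown are made-up examples:

```python
# Minimal sketch of BLEU evaluation for generated captions using NLTK.
# This illustrates the metric; it is not the book's own scoring function.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical data: each image has a list of tokenized reference captions
# and one tokenized caption generated by the model.
references = [
    [['a', 'dog', 'runs', 'on', 'the', 'beach'],
     ['a', 'dog', 'running', 'along', 'the', 'shore']]
]
candidates = [['a', 'dog', 'runs', 'along', 'the', 'beach']]

# Smoothing avoids zero scores when higher-order n-grams have no matches.
smoothie = SmoothingFunction().method1

# BLEU-1 through BLEU-4 are the scores typically reported for captioning.
for n in range(1, 5):
    weights = tuple([1.0 / n] * n)
    score = corpus_bleu(references, candidates,
                        weights=weights,
                        smoothing_function=smoothie)
    print(f'BLEU-{n}: {score:.4f}')
```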

Finally, we built a generic automated image-captioning system from scratch, using all our previously built assets and components, and tested it on a wide variety of images from diverse domains. We hope this gives you a good introduction to the world of image captioning, which is a beautiful combination of computer vision and NLP, and we definitely encourage you to build your own image-captioning system!
