Summary

This chapter started with how word embeddings encode semantics for individual tokens more effectively than the bag-of-words model that we used in Chapter 13, Working with Text Data. We also saw how to evaluate embeddings by validating whether semantic relationships among words are properly represented using linear vector arithmetic.
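As a quick illustration, the following minimal sketch runs the classic analogy test using gensim's pretrained vectors; the glove-wiki-gigaword-50 model name is one entry from gensim's downloader catalog, chosen here for its small size rather than being the chapter's exact setup:

import gensim.downloader as api

# Download and load a small set of pretrained vectors (a KeyedVectors object);
# any pretrained embedding from the downloader catalog works the same way.
vectors = api.load('glove-wiki-gigaword-50')

# Linear vector arithmetic: king - man + woman should land near "queen".
print(vectors.most_similar(positive=['king', 'woman'],
                           negative=['man'], topn=3))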

To learn word embeddings, we use shallow neural networks that used to be slow to train at the scale of web data containing billions of tokens. The word2vec model combines several algorithmic innovations to dramatically speed up training and has established a new standard for text feature generation. We saw how to work with pretrained word vectors in spaCy and gensim, and learned to train our own word vector embeddings. We then applied a word2vec model to SEC filings. Finally, we covered the doc2vec extension, which learns vector representations for documents in a fashion similar to word vectors, and applied it to Yelp business reviews.
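The sketch below shows how training both models looks with gensim; the parameter names follow the gensim 4.x API, and the toy corpus and hyperparameter values are illustrative stand-ins for the SEC filings and Yelp reviews used in the chapter:

from gensim.models import Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus of pre-tokenized sentences standing in for filings/reviews.
sentences = [['revenue', 'increased', 'this', 'quarter'],
             ['net', 'income', 'declined', 'sharply'],
             ['the', 'restaurant', 'was', 'excellent']]

# word2vec with the skip-gram architecture (sg=1).
w2v = Word2Vec(sentences, vector_size=50, window=5,
               min_count=1, sg=1, epochs=20, seed=1)
print(w2v.wv.most_similar('revenue', topn=2))

# doc2vec: each document gets a tag whose vector is learned jointly
# with the word vectors.
docs = [TaggedDocument(words=s, tags=[i]) for i, s in enumerate(sentences)]
d2v = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40, seed=1)
print(d2v.dv[0])  # the learned vector for the first document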

Now, we will begin Part 4 on deep learning (available online as mentioned in the Preface), starting with an introduction to feed-forward networks, popular deep learning frameworks, and techniques for efficient training at scale.
