So, last time, when we created our word2vec model, we saved it to a binary file. Now it's time to use that model as part of our CNN model. We do this by initializing the weights W of the embedding layer to the pretrained vectors.
Since our previous word2vec model was trained on a very small corpus, let's instead use word vectors pre-trained on a huge corpus. A good option is FastText, which provides embeddings trained on publicly available documents for 294 languages (https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md).
- Download the English embeddings (https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.zip).
- Extract the vocabulary and the embedding vectors into separate files.
- Load them in train.py.
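The extraction step above can be sketched as follows. This is a minimal, hypothetical helper (not part of the official FastText tooling) that parses the `.vec` text format, whose first line holds the vocabulary size and dimensionality, followed by one word and its vector per line:

```python
import numpy as np

def load_fasttext_vec(lines):
    """Parse FastText's .vec text format into a vocab list and a
    (vocab_size, dim) matrix of float32 vectors."""
    header = lines[0].split()
    vocab_size, dim = int(header[0]), int(header[1])
    vocab, vectors = [], []
    for line in lines[1:]:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:  # skip malformed rows
            continue
        vocab.append(word)
        vectors.append(np.asarray(values, dtype=np.float32))
    return vocab, np.stack(vectors)

# Hypothetical usage on the extracted wiki.en.vec file:
# with open("wiki.en.vec", encoding="utf-8") as f:
#     vocab, vectors = load_fasttext_vec(f.readlines())
# The vocab and the vectors can then be written to separate files,
# e.g. with np.save for the matrix and a plain text file for the words.
```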
That's it. By introducing this step, we can now feed the embedding layer with pretrained word vectors. This extra information gives the CNN a much richer starting representation than random initialization, which improves its learning.
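To make the initialization concrete, here is one common way to build the weight matrix W for the embedding layer. The helper name, the uniform fallback range for out-of-vocabulary words, and the Keras snippet in the trailing comment are illustrative assumptions, not taken from the original code:

```python
import numpy as np

def build_embedding_matrix(model_vocab, pretrained, dim, seed=0):
    """Build the embedding weight matrix W: rows for words found in
    the pretrained vectors are copied over; unseen words fall back
    to a small random vector (a common heuristic)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.25, 0.25, size=(len(model_vocab), dim)).astype(np.float32)
    for i, word in enumerate(model_vocab):
        if word in pretrained:
            W[i] = pretrained[word]
    return W

# W can then initialize the embedding layer in train.py,
# e.g. (hypothetical Keras sketch):
# embedding = tf.keras.layers.Embedding(
#     input_dim=W.shape[0], output_dim=W.shape[1],
#     embeddings_initializer=tf.keras.initializers.Constant(W))
```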