Chapter 7 - Learning Text Representations

  1. In the continuous bag-of-words (CBOW) model, we try to predict the target word given the context words, whereas in the skip-gram model, we try to predict the context words given the target word (a short gensim sketch after answer 4 illustrates both models).
  2. The loss function of the CBOW model is given as follows:
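
  Writing it in the standard softmax cross-entropy form (with $h$ denoting the hidden-layer vector obtained by averaging the context-word vectors, $u_j$ the output-layer score for the $j$-th vocabulary word, $j^{*}$ the index of the correct target word, and $V$ the vocabulary size):

  $$L = -\log p(w_{j^{*}} \mid \text{context}) = -u_{j^{*}} + \log \sum_{j'=1}^{V} \exp(u_{j'})$$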

  3. When the vocabulary contains millions of words, we have to perform a huge number of weight updates before we predict the correct target word, which is slow and inefficient. Instead, we mark the correct target word as the positive class and sample a few words from the vocabulary as the negative class; this technique is called negative sampling.
  4. PV-DM is similar to the continuous bag-of-words model, where we try to predict the target word given the context words. In PV-DM, along with the word vectors, we introduce one more vector, called the paragraph vector. As the name suggests, the paragraph vector learns a vector representation of the whole paragraph and captures its subject.
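
As a concrete illustration of answers 1, 3, and 4, here is a minimal sketch using gensim (assuming gensim 4.x; the tiny corpus and parameter values are only illustrative): the sg parameter switches between CBOW and skip-gram, negative sets the number of negative samples, and dm=1 selects PV-DM in Doc2Vec.

```python
from gensim.models import Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy tokenized corpus; in practice this would be a large text collection.
sentences = [
    ["cbow", "predicts", "the", "target", "word", "from", "its", "context"],
    ["skip", "gram", "predicts", "the", "context", "from", "the", "target", "word"],
    ["the", "paragraph", "vector", "captures", "the", "subject", "of", "the", "paragraph"],
]

# sg=0 trains CBOW and sg=1 trains skip-gram (answer 1);
# negative=5 enables negative sampling with 5 noise words per update (answer 3).
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, negative=5)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, negative=5)

# dm=1 trains PV-DM, which learns a paragraph vector alongside the word vectors (answer 4).
docs = [TaggedDocument(words=s, tags=[i]) for i, s in enumerate(sentences)]
pvdm = Doc2Vec(docs, vector_size=50, window=2, min_count=1, dm=1, negative=5)

print(cbow.wv["target"].shape)  # word vector learned by CBOW
print(pvdm.dv[0].shape)         # paragraph vector of the first document
```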

  5. The role of the encoder is to map a sentence to a vector, and the role of the decoder is to generate the surrounding sentences, that is, the previous and following sentences.
  6. Quick-thoughts is an interesting algorithm for learning sentence embeddings. Instead of using a decoder, quick-thoughts uses a classifier to learn whether a given input sentence is related to a candidate sentence, as shown in the sketch below.
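
In contrast to the encoder-decoder setup of answer 5, the following minimal PyTorch sketch illustrates the quick-thoughts idea (the encoder sizes, toy batch, and class names here are illustrative assumptions, not the exact architecture from the paper): two encoders produce sentence vectors, and a softmax over dot-product scores acts as the classifier that picks which candidate is the true context sentence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    """Maps a batch of token-id sequences to fixed-size sentence vectors."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        _, h = self.gru(self.embed(token_ids))  # h: (1, batch, hidden_dim)
        return h.squeeze(0)                     # (batch, hidden_dim)

input_encoder = SentenceEncoder()
candidate_encoder = SentenceEncoder()

# Toy batch: 4 input sentences and 4 candidate sentences of length 6;
# label i says candidate i is the true context sentence of input i.
inputs = torch.randint(0, 1000, (4, 6))
candidates = torch.randint(0, 1000, (4, 6))
labels = torch.arange(4)

u = input_encoder(inputs)               # input sentence vectors
v = candidate_encoder(candidates)       # candidate sentence vectors
scores = u @ v.t()                      # relatedness scores, shape (4, 4)
loss = F.cross_entropy(scores, labels)  # classifier loss instead of a decoder
loss.backward()
print(loss.item())
```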