Paragraph Vector – Distributed Memory model

PV-DM is similar to the CBOW model, where we try to predict the target word given the context words. In PV-DM, along with the word vectors, we introduce one more vector, called the paragraph vector. As the name suggests, the paragraph vector learns a vector representation of the whole paragraph, and it captures the subject of the paragraph.

As shown in the following figure, each paragraph is mapped to a unique vector, and each word is also mapped to a unique vector. So, in order to predict the target word, we combine the word vectors and the paragraph vector, either by concatenating or by averaging them.
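To make this combination step concrete, here is a minimal NumPy sketch of the PV-DM prediction step. All the names, dimensions, and values in it are illustrative toy choices, not from the original paper: the paragraph vector is averaged with the context word vectors, and the result is pushed through a softmax over the vocabulary to score candidate target words.

import numpy as np

# Toy dimensions: a vocabulary of 10 words, 3 paragraphs, embeddings of size 8
vocab_size, num_paragraphs, embed_dim = 10, 3, 8

rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, embed_dim))      # word vectors (shared)
D = rng.normal(size=(num_paragraphs, embed_dim))  # one vector per paragraph
U = rng.normal(size=(embed_dim, vocab_size))      # softmax output weights

def predict_target(paragraph_id, context_word_ids):
    # Average the paragraph vector with the context word vectors
    # (the model can concatenate them instead of averaging)
    h = np.vstack([D[paragraph_id], W[context_word_ids]]).mean(axis=0)
    scores = h @ U
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()  # probability of each word being the target

# Predict the target for a context of three words sampled from paragraph 0
print(predict_target(paragraph_id=0, context_word_ids=[2, 5, 7]).argmax())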

But having said all that, how is the paragraph vector useful in predicting the target word? What is really the use of having the paragraph vector? We know that we try to predict the target word based on the context words. The context is of a fixed length, and it is sampled within a sliding window over the paragraph.

Along with the context words, we also make use of the paragraph vector for predicting the target word. Since the paragraph vector contains information about the subject of the paragraph, it carries meaning that the context words do not hold. That is, the context words contain information about those particular words alone, while the paragraph vector contains information about the whole paragraph. So, we can think of the paragraph vector as a new word that is used along with the context words for predicting the target word.

The paragraph vector is the same for all the contexts sampled from the same paragraph, but it is not shared across paragraphs. Let's say that we have three paragraphs, p1, p2, and p3. If a context is sampled from paragraph p1, then the p1 vector is used to predict the target word; if a context is sampled from paragraph p2, then the p2 vector is used. Thus, paragraph vectors are not shared across paragraphs. However, word vectors are shared across all paragraphs; that is, the vector for the word sun is the same across all the paragraphs. We call this model the distributed memory model of paragraph vectors, as the paragraph vector acts as a memory that holds information missing from the current context words.
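As a quick illustration of this sharing scheme, here is a minimal sketch using the gensim library, whose Doc2Vec class implements PV-DM when dm=1. The three example paragraphs and the tags p1, p2, and p3 are made up for this sketch:

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Three toy paragraphs; the tags p1, p2, p3 identify each paragraph's vector
paragraphs = [
    "the sun rises in the east".split(),
    "the sun sets in the west".split(),
    "stars shine at night".split(),
]
documents = [TaggedDocument(words, tags=["p" + str(i + 1)])
             for i, words in enumerate(paragraphs)]

# dm=1 selects the distributed memory model (PV-DM)
model = Doc2Vec(documents, dm=1, vector_size=50, window=2,
                min_count=1, epochs=40)

print(model.dv["p1"])   # each paragraph gets its own vector...
print(model.wv["sun"])  # ...while word vectors are shared across paragraphs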

Both the paragraph vectors and the word vectors are learned using stochastic gradient descent. On each iteration, we sample context words from a random paragraph, try to predict the target word, calculate the error, and update the parameters. After training, the paragraph vectors capture the embeddings of the paragraphs (documents).
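Continuing the gensim sketch above, the trained paragraph vectors can then be used directly as document embeddings, for example to find similar paragraphs. gensim can also infer a vector for an unseen paragraph by holding the trained word vectors fixed and running gradient descent on a fresh paragraph vector alone:

# Compare the learned paragraph embeddings (continues the sketch above)
print(model.dv.most_similar("p1"))

# Infer an embedding for a new, unseen paragraph: the word vectors stay
# fixed and only the new paragraph vector is trained
print(model.infer_vector("the sun shines bright".split()))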
