Negative sampling

Let's say we are building a CBOW model and we have the sentence Birds are flying in the sky. Let the context words be birds, are, in, and the, and let the target word be flying.
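
To make this concrete, the following sketch builds that (context, target) pair from the sentence using a window of size 2 on each side of the target; the tokenization and the window size here are assumptions made only for illustration:

```python
# Minimal sketch: building a CBOW (context, target) pair with a window of size 2
sentence = "birds are flying in the sky".split()
window = 2
target_index = sentence.index("flying")

# Words within `window` positions of the target, excluding the target itself
context = [
    sentence[i]
    for i in range(max(0, target_index - window),
                   min(len(sentence), target_index + window + 1))
    if i != target_index
]

print(context)                 # ['birds', 'are', 'in', 'the']
print(sentence[target_index])  # 'flying'
```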

We need to update the weights of the network every time it predicts an incorrect target word. That is, whenever the network predicts any word other than flying as the target word, we update its weights.

But this is just a small vocabulary. Consider the case where we have millions of words in the vocabulary. Then we would need to perform an enormous number of weight updates until the network predicts the correct target word, which is time-consuming and inefficient. So, instead of doing this, we mark the correct target word as the positive class, sample a few words from the vocabulary, and mark them as the negative class.

What we are essentially doing here is converting our multiclass classification problem into a binary classification problem (that is, instead of trying to predict the target word out of the whole vocabulary, the model classifies whether a given word is the target word or not).
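
The sketch below illustrates this binary formulation: the score of the true target word is pushed toward 1, while the scores of a few randomly sampled words are pushed toward 0, so only a handful of output vectors are touched rather than the entire vocabulary. The embedding matrices, word indices, and number of negative samples are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_negatives = 10_000, 100, 5

# Toy input/output embedding matrices (in practice these are learned)
in_embed = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
out_embed = rng.normal(scale=0.1, size=(vocab_size, embed_dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# CBOW hidden vector: average of the context word embeddings
context_ids = np.array([11, 42, 7, 99])   # e.g. birds, are, in, the
target_id = 123                           # e.g. flying
hidden = in_embed[context_ids].mean(axis=0)

# One positive example (the true target) and a few sampled negatives
negative_ids = rng.integers(0, vocab_size, size=num_negatives)

pos_score = sigmoid(out_embed[target_id] @ hidden)       # pushed toward 1
neg_scores = sigmoid(out_embed[negative_ids] @ hidden)   # pushed toward 0

# Negative-sampling loss: only 1 + num_negatives output rows are involved,
# instead of a softmax over all vocab_size words
loss = -np.log(pos_score) - np.log(1.0 - neg_scores).sum()
print(loss)
```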

The probability that a word is chosen as a negative sample is given as:
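
$$P(w_i) = \frac{f(w_i)^{3/4}}{\sum_{j=0}^{n} f(w_j)^{3/4}}$$

Here, f(w_i) is the frequency of the word w_i in the corpus. Raising the frequencies to the power of 3/4 makes very frequent words a little less dominant and rare words a little more likely to be drawn than under the raw unigram distribution.

The short sketch below computes this distribution and draws negative samples from it; the toy counts and the number of samples drawn are illustrative assumptions:

```python
import numpy as np

# Toy corpus frequencies f(w_i) for a five-word vocabulary (illustrative values)
word_counts = np.array([900, 120, 40, 15, 5], dtype=np.float64)

# P(w_i) = f(w_i)^(3/4) / sum_j f(w_j)^(3/4)
probs = word_counts ** 0.75
probs /= probs.sum()

# Draw a handful of negative samples according to this distribution
rng = np.random.default_rng(0)
negative_ids = rng.choice(len(word_counts), size=5, p=probs)

print(probs)         # sampling probabilities, less skewed than the raw counts
print(negative_ids)  # indices of the sampled negative words
```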
