Building a skip-gram model

For our first step, we will import the Python modules necessary for our example:

import os

import numpy as np
import tensorflow as tf
# tensorflow.contrib is available only in TensorFlow 1.x
from tensorflow.contrib.tensorboard.plugins import projector

The projector module from TensorFlow provides the methods we need to add our word vectors to TensorBoard for visualization. Next, we will create a dictionary with all of the model parameters that we will use to train our Word2vec model:

# Parameters related to training the model

model_params = { 
    "vocab_size": 50000,   # Maximum number of words 
    "batch_size": 64,      # Batch size for every training step
    "embedding_size": 200, # Dimensions of the word embedding vectors
    "num_negatives": 64,   # Number of negative words to be sampled
    "learning_rate": 1.0,  # Learning rate for the training
    "num_train_steps": 500000, # Number of steps to train the model
}

We will define a Word2vecModel class, which we will use for our model definition, training, and visualization routines. The class, with its __init__ method, looks as follows:

class Word2vecModel:
    """
    Initialize variables for Word2vec model
    """

    def __init__(self, data_set, vocab_size,
                 embed_size, batch_size, num_sampled, learning_rate):
        self.vocab_size = vocab_size
        self.embed_size = embed_size
        self.batch_size = batch_size
        self.num_sampled = num_sampled
        self.lr = learning_rate
        # Counts the optimization steps; excluded from training
        self.global_step = tf.get_variable('global_step',
                                           initializer=tf.constant(0),
                                           trainable=False)
        # NOTE: model_params as defined above has no "skip_step" key;
        # it must be added before this class is instantiated
        self.skip_step = model_params["skip_step"]
        self.data_set = data_set

The __init__ method initializes the Word2vec model parameters. As shown above, it stores the model's learning rate, batch size, vocabulary size, embedding vector size, and number of negative samples. We will then import the data using a generator, which we wrap with TensorFlow's Dataset API, as follows:

data_set = tf.data.Dataset.from_generator(
    generator,
    (tf.int32, tf.int32),
    (tf.TensorShape([model_params["batch_size"]]),
     tf.TensorShape([model_params["batch_size"], 1])))

We use the Dataset API's from_generator method to build a dataset whose elements are produced by generator. The generator argument must be a callable that returns an object supporting the iter() protocol, such as a generator function. The elements it yields must be compatible with the given output_types argument and, optionally, with the output_shapes argument. Here, each element is a pair of int32 tensors: one of shape [batch_size] and one of shape [batch_size, 1]. We can write the generator function as follows:

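def generator():
    # batch_generator is a helper that is not shown in this section.
    # NOTE: "skip_window" is missing from model_params as defined above,
    # and file_params is not defined in this section; both must exist
    # before this function is called.
    yield from batch_generator(model_params["vocab_size"],
                               model_params["batch_size"],
                               model_params["skip_window"],
                               file_params["visualization_folder"])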
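The batch_generator helper is not shown in this section. As a rough illustration of what such a generator might yield, the following pure-Python sketch builds skip-gram (center, target) pairs from a list of word indices and groups them into batches whose shapes match the output_shapes above. The names skipgram_pairs and toy_batch_generator, the fixed window size, and the toy corpus are assumptions made for this example; they are not the book's implementation.

```python
def skipgram_pairs(token_ids, skip_window):
    """Yield (center, target) pairs for every word within skip_window
    positions of each center word."""
    for i, center in enumerate(token_ids):
        lo = max(0, i - skip_window)
        hi = min(len(token_ids), i + skip_window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, token_ids[j]


def toy_batch_generator(token_ids, batch_size, skip_window):
    """Yield (centers, targets) batches indefinitely.

    centers is a list of length batch_size; targets is a list of
    batch_size single-element lists, i.e. shape [batch_size, 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a pair")
    pairs = skipgram_pairs(token_ids, skip_window)
    while True:
        centers, targets = [], []
        while len(centers) < batch_size:
            try:
                center, target = next(pairs)
            except StopIteration:
                # Corpus exhausted: start another pass over it.
                pairs = skipgram_pairs(token_ids, skip_window)
                continue
            centers.append(center)
            targets.append([target])
        yield centers, targets


# Hypothetical toy corpus of four word indices.
gen = toy_batch_generator([0, 1, 2, 3], batch_size=4, skip_window=1)
centers, targets = next(gen)
print(centers)   # [0, 1, 1, 2]
print(targets)   # [[1], [0], [2], [1]]
```

A real generator would also read and index the corpus first, and would typically return NumPy arrays rather than lists.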

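Inside the class, we will define a method that imports the data from the dataset we built on top of generator. We will group related operations with TensorFlow's name_scope, which places them under a common name prefix so that the graph is well organized when it is displayed. The scope that defines the loss looks as follows: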

with tf.name_scope('nce_loss'):
    # construct variables for NCE loss
    nce_weight = tf.get_variable(
        'nce_weight',
        shape=[self.vocab_size, self.embed_size],
        initializer=tf.truncated_normal_initializer(
            stddev=1.0 / (self.embed_size ** 0.5)))
    nce_bias = tf.get_variable('nce_bias',
                               initializer=tf.zeros([self.vocab_size]))

    # define loss function to be NCE loss function
    self.loss = tf.reduce_mean(
        tf.nn.nce_loss(weights=nce_weight,
                       biases=nce_bias,
                       labels=self.target_words,
                       inputs=self.embedding,
                       num_sampled=self.num_sampled,
                       num_classes=self.vocab_size),
        name='loss')

The nce_loss scope relies on self.target_words and self.embedding, which are not defined in the code shown here. They come from two other parts of the class: the method that imports the data, and another name_scope that initializes an embedding matrix and an embedding lookup, which retrieves the embedding for any given word in the dataset. For the loss function, we use noise contrastive estimation (NCE) loss, which converts a multinomial classification problem, such as the one encountered when predicting the next word, into a problem of binary logistic regression.
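To make the embedding lookup and the reduction to binary logistic regression more concrete, here is a minimal pure-Python sketch for a single center word. It is a simplified, negative-sampling-style loss: it omits the correction for the noise distribution that a full NCE loss includes, so it is not a reimplementation of tf.nn.nce_loss. The function names and the toy matrices are assumptions made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def binary_pair_loss(embed_matrix, out_weights, out_biases,
                     center, target, negatives):
    """Logistic loss for one true pair and several noise pairs."""
    # Embedding lookup: the center word's vector is one row of the matrix.
    center_vec = embed_matrix[center]

    # True pair (center, target): the classifier should output 1.
    true_logit = dot(center_vec, out_weights[target]) + out_biases[target]
    loss = -math.log(sigmoid(true_logit))

    # Noise pairs (center, negative): the classifier should output 0.
    # Note that 1 - sigmoid(x) == sigmoid(-x).
    for neg in negatives:
        neg_logit = dot(center_vec, out_weights[neg]) + out_biases[neg]
        loss += -math.log(sigmoid(-neg_logit))
    return loss


# Hypothetical toy setup: 5 words, 3-dimensional embeddings.
embed_matrix = [[0.1 * (i + j) for j in range(3)] for i in range(5)]
out_weights = [[0.0] * 3 for _ in range(5)]
out_biases = [0.0] * 5

# With all-zero output weights every logit is 0, so each of the
# 1 + 3 pairs contributes log(2) to the loss.
loss = binary_pair_loss(embed_matrix, out_weights, out_biases,
                        center=2, target=3, negatives=[0, 1, 4])
print(loss)
```

Training pushes the logit of the true pair up and the logits of the noise pairs down, which is what shapes the word vectors.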

For each training sample in the dataset, the classifier is fed a true pair (a center word and a word that appears in its context) and k randomly chosen negative pairs (each consisting of the center word and a randomly chosen vocabulary word that does not occur in its context). By learning to distinguish the true pairs from the negative ones, the classifier learns the word vectors. In effect, instead of predicting the next word, the optimized classifier predicts whether a pair of words is good or bad. As the next step, we will define the optimizer that will be used for training:

self.optimizer = tf.train.GradientDescentOptimizer(self.lr).minimize(
    self.loss, global_step=self.global_step)

GradientDescentOptimizer implements the gradient descent algorithm. Its constructor takes the learning rate, and passing global_step to minimize makes TensorFlow increment that variable by one after each update, so it records how many optimization steps have been performed.
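The update rule behind this optimizer can be sketched in a few lines of plain Python. The example below minimizes a one-dimensional quadratic rather than the NCE loss, and the function name gradient_descent and the chosen values are assumptions made for illustration only.

```python
def gradient_descent(grad_fn, w, learning_rate, num_steps):
    """Repeatedly step against the gradient, counting the steps."""
    global_step = 0
    for _ in range(num_steps):
        # w <- w - learning_rate * dL/dw
        w = w - learning_rate * grad_fn(w)
        global_step += 1
    return w, global_step


# Minimize L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, steps = gradient_descent(lambda w: 2.0 * (w - 3.0),
                            w=0.0, learning_rate=0.1, num_steps=100)
print(w, steps)
```

Here w approaches the minimum at 3, and the step counter plays the same role as global_step in the TensorFlow code.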

Finally, we will create histogram and scalar summaries by using the loss values, so as to monitor the loss during training. We will merge all of the summaries, so that they can be displayed on TensorBoard:

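with tf.name_scope('summaries'):
    # Record the loss both as a scalar and as a histogram
    tf.summary.scalar('loss', self.loss)
    tf.summary.histogram('histogram_loss', self.loss)
    # Merge all summaries into a single op for TensorBoard
    self.summary_op = tf.summary.merge_all()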

With these pieces in place, we train the neural network for num_train_steps steps and monitor the loss. In general, the objective of training is to reduce this loss value. However, if little training data is available, training for a long time with a high-dimensional word representation can lead to overfitting. Hence, we need to ensure that the model does not overfit the training data and that it generalizes well.
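One simple way to monitor the loss is to report its average over a fixed number of steps. The sketch below assumes that skip_step, which __init__ stores but whose use is not shown in this section, is such a reporting interval; the function name and the loss values are made up for the example.

```python
def report_average_loss(losses, skip_step):
    """Return (step, average loss) for every skip_step-th step."""
    reports = []
    total = 0.0
    for step, loss in enumerate(losses, start=1):
        total += loss
        if step % skip_step == 0:
            reports.append((step, total / skip_step))
            total = 0.0  # start a new averaging window
    return reports


# Hypothetical per-step loss values.
print(report_average_loss([4.0, 2.0, 3.0, 1.0, 5.0], skip_step=2))
# [(2, 3.0), (4, 2.0)]
```

Averaging over a window smooths out the noise of individual batches, which makes the overall trend easier to read.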

In the next section, we will visualize the embeddings by projecting them to a lower dimension, on TensorBoard.
