Building the model

Next up, we are going to use the following structure to build the computational graph:

Figure 15.11: Model architecture

So, as mentioned previously, we are going to use an embedding layer that learns a dense, real-valued representation for these words. Conceptually, each word is fed to the network as a one-hot vector; multiplying that vector by the embedding weight matrix simply selects the row corresponding to that word. The idea is to train this network so that the weight matrix ends up holding useful word representations.

So, let's start off by creating the input to our model:

train_graph = tf.Graph()

# defining the input placeholders of the model
with train_graph.as_default():
    inputs_values = tf.placeholder(tf.int32, [None], name='inputs_values')
    labels_values = tf.placeholder(tf.int32, [None, None], name='labels_values')
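At training time, inputs_values will receive a flat batch of center-word IDs, and labels_values the corresponding context-word IDs; the loss function we use later, tf.nn.sampled_softmax_loss, expects labels of shape [batch_size, 1], which is why labels_values has two dimensions. As a rough sketch of what a feed might look like (the word IDs here are made up purely for illustration):

# hypothetical batch of (center word, context word) pairs, encoded as integer IDs
batch_inputs = [12, 12, 7, 7]            # shape [batch_size]
batch_labels = [[43], [9], [12], [55]]   # shape [batch_size, 1]

feed = {inputs_values: batch_inputs,
        labels_values: batch_labels}
# later, inside a session: sess.run([model_cost, model_optimizer], feed_dict=feed)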

The weight or embedding matrix that we are trying to build will have the following shape:

num_words X num_hidden_neurons

Also, we don't have to implement the lookup function ourselves because it's already available in TensorFlow as tf.nn.embedding_lookup(). It takes the integer encoding of each word and locates the corresponding row in the weight matrix.

The weight matrix will be randomly initialized from a uniform distribution:

num_vocab = len(integer_to_vocab)

num_embedding = 300

with train_graph.as_default():
    embedding_layer = tf.Variable(tf.random_uniform((num_vocab, num_embedding), -1, 1))

    # Next, we are going to use the tf.nn.embedding_lookup function to get the output of the hidden layer
    embed_tensors = tf.nn.embedding_lookup(embedding_layer, inputs_values)
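To see why this lookup is just an efficient version of feeding one-hot vectors, consider the following small NumPy sketch (purely an illustration, not part of the model):

import numpy as np

W = np.random.uniform(-1, 1, (5, 3))  # toy embedding matrix: 5 words, 3 embedding dimensions
word_id = 2

one_hot = np.zeros(5)
one_hot[word_id] = 1.0

# multiplying a one-hot vector by W simply selects row word_id of W,
# which is exactly what the embedding lookup returns without doing any multiplication
print(np.allclose(one_hot @ W, W[word_id]))  # True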

It's very inefficient to update all of the embedding layer's weights at once. Instead, we will use the negative sampling technique, which updates the weights for the correct word and only a small subset of the incorrect (negative) ones.

Again, we don't have to implement this ourselves, as it's already available in TensorFlow as tf.nn.sampled_softmax_loss:

# Number of negative labels to sample
num_sampled = 100

with train_graph.as_default():
    # create softmax weights and biases
    softmax_weights = tf.Variable(tf.truncated_normal((num_vocab, num_embedding)))
    softmax_biases = tf.Variable(tf.zeros(num_vocab), name="softmax_bias")

    # Calculating the model loss using negative sampling
    model_loss = tf.nn.sampled_softmax_loss(
        weights=softmax_weights,
        biases=softmax_biases,
        labels=labels_values,
        inputs=embed_tensors,
        num_sampled=num_sampled,
        num_classes=num_vocab)

    model_cost = tf.reduce_mean(model_loss)
    model_optimizer = tf.train.AdamOptimizer().minimize(model_cost)
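To build some intuition for what this loss is doing, the sketch below shows the core idea in plain NumPy: instead of a softmax over all num_vocab words, only the true word plus num_sampled randomly drawn negative words enter the computation. This is a deliberately simplified, hypothetical helper, not TensorFlow's actual implementation, which additionally corrects for the distribution used to sample the negatives:

import numpy as np

def simplified_sampled_softmax_loss(embed_vec, true_id, negative_ids, weights, biases):
    # restrict the softmax to the true word plus the sampled negative words
    candidate_ids = np.concatenate(([true_id], negative_ids))
    logits = weights[candidate_ids] @ embed_vec + biases[candidate_ids]

    # cross-entropy where the true word is always candidate 0
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[0]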

To validate our trained model, we are going to sample some frequent (common) words and some uncommon words, and print out their closest words based on the representation learned by the skip-gram architecture:

with train_graph.as_default():

    # set of random words for evaluating similarity on
    valid_num_words = 16
    valid_window = 100

    # pick 8 samples from each of the ranges (0, 100) and (1000, 1100); lower IDs imply more frequent words
    valid_samples = np.array(random.sample(range(valid_window), valid_num_words//2))
    valid_samples = np.append(valid_samples,
                              random.sample(range(1000, 1000 + valid_window), valid_num_words//2))

    valid_dataset_samples = tf.constant(valid_samples, dtype=tf.int32)

    # Calculating the cosine similarity
    norm = tf.sqrt(tf.reduce_sum(tf.square(embedding_layer), 1, keep_dims=True))
    normalized_embed = embedding_layer / norm
    valid_embedding = tf.nn.embedding_lookup(normalized_embed, valid_dataset_samples)
    cosine_similarity = tf.matmul(valid_embedding, tf.transpose(normalized_embed))
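Later, once training has run for a while, we can evaluate cosine_similarity inside the session and print the nearest neighbours of each validation word. A sketch of how that might look (it assumes a session sess over train_graph and reuses integer_to_vocab to map IDs back to words):

# inside a tf.Session(graph=train_graph) as sess, after some training steps:
similarity_scores = cosine_similarity.eval(session=sess)
top_k = 8  # number of nearest neighbours to display

for i in range(valid_num_words):
    valid_word = integer_to_vocab[valid_samples[i]]
    # argsort descending and skip position 0, which is the word itself
    nearest_ids = (-similarity_scores[i, :]).argsort()[1:top_k + 1]
    neighbours = ', '.join(integer_to_vocab[idx] for idx in nearest_ids)
    print('Nearest to %s: %s' % (valid_word, neighbours))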

Now, we have all the bits and pieces for our model and we are ready to kick off the training process.
