Visualizing word embeddings in TensorBoard

In the previous section, we learned how to build a word2vec model for generating word embeddings with gensim. Now, we will see how to visualize those embeddings using TensorBoard. Visualizing word embeddings helps us understand the projection space and also makes it easy to validate the embeddings. TensorBoard provides a built-in visualizer called the embedding projector for interactively visualizing and analyzing high-dimensional data, such as our word embeddings. We will learn how to use TensorBoard's projector to visualize the word embeddings step by step.

Import the required libraries. Note that this code uses the TensorFlow 1.x API; the tensorflow.contrib module was removed in TensorFlow 2.x:

import warnings
warnings.filterwarnings(action='ignore')

import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector
tf.logging.set_verbosity(tf.logging.ERROR)

import numpy as np
import gensim
import os

Load the saved model:

file_name = "model/word2vec.model"
model = gensim.models.keyedvectors.KeyedVectors.load(file_name)
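
In case you don't have the saved model from the previous section, the following is a minimal sketch for creating and saving one. The toy corpus and the size=100 dimension are placeholders, not the settings used earlier:

import os
from gensim.models import Word2Vec

# placeholder corpus; substitute the tokenized corpus from the previous section
sentences = [['i', 'am', 'delighted'], ['what', 'a', 'pleasant', 'surprise']]

os.makedirs('model', exist_ok=True)

# train a small word2vec model (gensim 3.x API) and save it to the expected path
model = Word2Vec(sentences, size=100, window=5, min_count=1)
model.save("model/word2vec.model")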

After loading the model, we will save the number of words in our model to the max_size variable:

max_size = len(model.wv.vocab)-1
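
As a quick sanity check, we can print the vocabulary size and a few sample words; the exact words depend on your training corpus:

print(len(model.wv.vocab))        # total number of words in the vocabulary
print(model.wv.index2word[:5])    # first five words, ordered by descending frequency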

We know that the dimension of the word vectors equals the model's first layer size, that is, the number of neurons in the hidden layer. So, we initialize a matrix named w2v with the shape (max_size, model.layer1_size), where max_size is the vocabulary size:

w2v = np.zeros((max_size, model.layer1_size))

Now, we create a new file called metadata.tsv, where we save all the words in our model, and we store the embedding of each word in the w2v matrix:

if not os.path.exists('projections'):
    os.makedirs('projections')

with open("projections/metadata.tsv", 'w+') as file_metadata:
    for i, word in enumerate(model.wv.index2word[:max_size]):

        #store the embedding of the word
        w2v[i] = model.wv[word]

        #write the word to the metadata file, one word per line
        file_metadata.write(word + '\n')
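
Optionally, we can read back the first few lines of metadata.tsv to confirm that one word was written per line:

with open("projections/metadata.tsv") as f:
    # print the first five words from the metadata file
    for line in list(f)[:5]:
        print(line.strip())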

Next, we initialize the TensorFlow session:

sess = tf.InteractiveSession()

Initialize the TensorFlow variable called embedding that holds the word embeddings:

with tf.device("/cpu:0"):
    embedding = tf.Variable(w2v, trainable=False, name='embedding')

Initialize all the variables:

tf.global_variables_initializer().run()

Create an object of the Saver class, which is used for saving and restoring variables to and from our checkpoints:

saver = tf.train.Saver()

Using FileWriter, we can save our summaries and events to our event file:

writer = tf.summary.FileWriter('projections', sess.graph)

Now, we initialize the projector config and add an embedding to it:

config = projector.ProjectorConfig()
embed = config.embeddings.add()

Next, we set tensor_name to the name of our embedding variable and metadata_path to the metadata.tsv file that holds the words. The path is resolved relative to the log directory, so the filename alone is sufficient here:

embed.tensor_name = 'embedding'
embed.metadata_path = 'metadata.tsv'

And, finally, we write the projector config and save the model:

projector.visualize_embeddings(writer, config)

saver.save(sess, 'projections/model.ckpt', global_step=max_size)
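
At this point, the projections directory should contain the event file, metadata.tsv, the projector_config.pbtxt written by visualize_embeddings, and the checkpoint files. As a quick check, we can list them:

# list the files TensorBoard will read from the log directory
print(os.listdir('projections'))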

Now, open the terminal and type the following command to launch TensorBoard:

tensorboard --logdir=projections --port=8000

Once TensorBoard is running, go to http://localhost:8000 in your browser and open the PROJECTOR tab. When we type the word delighted into the search box, we can see all the related words, such as pleasant, surprise, and many more similar words, adjacent to it.
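
We can confirm the same neighborhood programmatically with gensim's most_similar function; note that this only works if delighted actually appears in your training vocabulary:

# query the nearest neighbors of a word directly from the model
print(model.wv.most_similar('delighted', topn=5))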
