Visualizing the Embedding Space - Plotting the Model on TensorBoard

There is little benefit to visualization if you cannot use it to understand how and what the model has learned. To build that intuition, we will use TensorBoard.

TensorBoard is a powerful tool that can be used to build various kinds of plots to monitor your models during the training process, to inspect deep learning architectures, and to explore word embeddings. Let's build a TensorBoard embedding projection and use it to perform various kinds of analysis.
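The embedding projector is only one of TensorBoard's views. As a small taste of the training-monitoring use just mentioned, the following sketch records a scalar summary that TensorBoard plots over time; the decreasing loss values are dummy numbers and ./logs is an arbitrary log directory:

import tensorflow as tf

# Minimal TF 1.x monitoring sketch: log one scalar per step
loss_value = tf.placeholder(tf.float32, name='loss_value')
loss_summary = tf.summary.scalar('loss', loss_value)

writer = tf.summary.FileWriter('./logs')
with tf.Session() as sess:
    for step in range(100):
        # A dummy, decreasing value standing in for a real training loss
        summary = sess.run(loss_summary,
                           feed_dict={loss_value: 1.0 / (step + 1)})
        writer.add_summary(summary, step)
writer.close()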

To build an embedding plot in TensorBoard, we need to perform the following steps:

  1. Collect the words and their respective tensors (the 300-D vectors) that we learned in the previous steps.
  2. Create a variable in the graph that will hold the tensors.
  3. Initialize the projector configuration.
  4. Add an appropriately named embedding to the configuration.
  5. Store all the words in a .tsv-formatted metadata file. TensorBoard loads this file to label the projected points.
  6. Link the .tsv metadata file to the projector object.
  7. Write the summaries and save a checkpoint of the embedding variable.

Here is the code to complete these seven steps:

import os

import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

# Step 1: collect the words and their 300-D vectors from the previous steps
# ('points' and 'all_word_vectors_matrix' were built earlier in the chapter)
vocab_list = points.word.values.tolist()
embeddings = all_word_vectors_matrix

# Step 2: create a variable in the graph that will hold the tensors
embedding_var = tf.Variable(embeddings, dtype='float32', name='embedding')

# Step 3: initialize the projector configuration
projector_config = projector.ProjectorConfig()

# Step 4: add an appropriately named embedding to the configuration
embedding = projector_config.embeddings.add()
embedding.tensor_name = embedding_var.name

# Step 5: store all the words in a .tsv metadata file, one word per line,
# so that TensorBoard can label the projected points
LOG_DIR = './'
metadata_file = 'sample.tsv'

with open(os.path.join(LOG_DIR, metadata_file), 'wt') as metadata:
    metadata.writelines("%s\n" % w for w in vocab_list)

# Step 6: link the .tsv metadata file to the projector object
embedding.metadata_path = os.path.join(os.getcwd(), metadata_file)

# Step 7: write the summaries and a checkpoint of the embedding variable.
# Use the same LOG_DIR where you store your checkpoint.
summary_writer = tf.summary.FileWriter(LOG_DIR)

# The next line writes a projector_config.pbtxt in the LOG_DIR. TensorBoard
# will read this file during startup.
projector.visualize_embeddings(summary_writer, projector_config)

saver = tf.train.Saver([embedding_var])

with tf.Session() as sess:
    # Initialize the variables in the graph
    sess.run(tf.global_variables_initializer())
    saver.save(sess, os.path.join(LOG_DIR, metadata_file + '.ckpt'))

Once the TensorBoard preparation code is executed, the checkpoint binaries, the metadata file, and the projector configuration are stored on disk (as shown in the following figure):

Figure 3.4 Outputs created for TensorBoard
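If you want to confirm that the checkpoint on disk is intact, a quick sanity check is to restore it and compare the variable against the original matrix. This is an optional sketch that reuses saver, embedding_var, LOG_DIR, and metadata_file from the preparation code above:

import numpy as np

with tf.Session() as sess:
    # Restore the checkpoint written by the preparation code
    saver.restore(sess, os.path.join(LOG_DIR, metadata_file + '.ckpt'))
    restored = sess.run(embedding_var)
    # The restored tensor should match the original embedding matrix
    print(np.allclose(restored, all_word_vectors_matrix))  # expect: True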

To launch TensorBoard, execute the following command at the command prompt:

tensorboard --logdir=/path/of/the/checkpoint/

Now open http://localhost:6006/#projector in a browser; this is TensorBoard with all the data points projected into 3D space. You can zoom in and out, search for a specific word, and also re-project the embeddings with t-SNE to visualize how clusters form in the dataset:

Figure 3.5 TensorBoard embedding projection
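TensorBoard runs t-SNE interactively in the browser, but you can approximate the same view offline. The following is a minimal sketch using scikit-learn and matplotlib (both assumed to be installed); all_word_vectors_matrix and vocab_list come from the earlier steps, and the perplexity value is an arbitrary choice:

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Re-project the 300-D vectors into 2-D. t-SNE can be slow on a large
# vocabulary, so consider slicing the matrix to a few thousand rows first.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
points_2d = tsne.fit_transform(all_word_vectors_matrix)

plt.figure(figsize=(10, 10))
plt.scatter(points_2d[:, 0], points_2d[:, 1], s=3)

# Label an evenly spaced sample of words so the plot stays readable
step = max(1, len(vocab_list) // 50)
for i in range(0, len(vocab_list), step):
    plt.annotate(vocab_list[i], (points_2d[i, 0], points_2d[i, 1]), fontsize=8)
plt.show()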
Data visualization helps you tell your story! The stakeholders of your business use case love impressive, dynamic data visualizations, and they also help with your model intuition and with generating new hypotheses to test.