How it works...

In step 2, word vectors from the trained model are saved to your local machine for further processing. 
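A minimal sketch of this save step, assuming the Word2Vec model trained in the earlier steps and a hypothetical output path of words.txt, might look like the following:

    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.word2vec.Word2Vec;

    public class SaveWordVectors {
        // Persist the trained word vectors in plain-text format so that
        // they can be reloaded later for the t-SNE steps.
        public static void save(Word2Vec model) throws Exception {
            // "words.txt" is a placeholder path on the local machine
            WordVectorSerializer.writeWordVectors(model, "words.txt");
        }
    }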

In step 3, we extracted the data for all the unique word vectors using WordVectorSerializer. This loads an in-memory VocabCache from the given input words. It doesn't load the whole vocab/lookup table into memory at once, so it can process large vocabularies served over a network.
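As a sketch, assuming the vectors were saved to the plain-text file words.txt used above (note that the exact package of the Pair class varies between DL4J versions), loading them back could look like this:

    import org.deeplearning4j.models.embeddings.inmemory.InMemoryLookupTable;
    import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
    import org.deeplearning4j.models.word2vec.wordstore.VocabCache;
    import org.nd4j.linalg.primitives.Pair;

    import java.io.File;

    public class LoadWordVectors {
        // Reload the serialized word vectors: the pair holds the lookup
        // table (the weight matrix) and the VocabCache (the vocabulary).
        public static Pair<InMemoryLookupTable, VocabCache> load() throws Exception {
            return WordVectorSerializer.loadTxt(new File("words.txt"));
        }
    }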

VocabCache manages the storage of the information required for the Word2Vec lookup table. We need to pass labels to the t-SNE model; the labels are simply the words represented by the word vectors.

In step 4, we created a list to hold all the unique words, as shown in the sketch below.
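A sketch of this step, assuming the VocabCache from the Pair returned by loadTxt above:

    import org.deeplearning4j.models.word2vec.wordstore.VocabCache;

    import java.util.ArrayList;
    import java.util.List;

    public class CollectLabels {
        // Walk the vocabulary and collect every unique word; these strings
        // become the labels passed to the t-SNE model along with the weights.
        public static List<String> labels(VocabCache cache) {
            List<String> cacheList = new ArrayList<>();
            for (int i = 0; i < cache.numWords(); i++) {
                cacheList.add(cache.wordAtIndex(i));
            }
            return cacheList;
        }
    }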

BarnesHutTsne is the DL4J implementation class for the dual-tree t-SNE model. The Barnes–Hut algorithm uses a dual-tree approximation strategy. Before applying t-SNE, it is recommended that you reduce the dimensionality to around 50 using another method, such as principal component analysis (PCA).
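The recipe itself doesn't perform this pre-reduction, but as one possible approach, ND4J ships a PCA helper; a sketch, assuming a target of 50 dimensions, might look like this:

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dimensionalityreduction.PCA;

    public class PreReduce {
        // Optionally project the high-dimensional word-vector weights
        // down to 50 dimensions before handing them to t-SNE.
        public static INDArray reduceTo50(INDArray weights) {
            return PCA.pca(weights, 50, true); // true = normalize the data first
        }
    }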

In step 5, we used BarnesHutTsne to design the t-SNE model, configured via its builder (see the sketch after this list). The model contains the following components:

  • theta(): This is the Barnes–Hut trade-off parameter.
  • useAdaGrad(): This is the legacy AdaGrad implementation for use in NLP applications.
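A configuration sketch along these lines follows; the specific hyperparameter values are illustrative assumptions, not prescriptions:

    import org.deeplearning4j.plot.BarnesHutTsne;

    public class BuildTsneModel {
        // Configure the dual-tree t-SNE model via its builder.
        public static BarnesHutTsne build() {
            return new BarnesHutTsne.Builder()
                    .setMaxIter(100)    // illustrative iteration count
                    .theta(0.5)         // Barnes-Hut trade-off parameter
                    .normalize(false)
                    .learningRate(500)
                    .useAdaGrad(false)  // legacy AdaGrad updater toggle
                    .build();
        }
    }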

Once the t-SNE model is designed, we can fit it with the weights loaded from the word vectors. We can then save the feature plot coordinates to a file that can be opened in Excel, as demonstrated in step 6.
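Putting the pieces together, fitting and saving might look like this sketch (the output filename is a placeholder; the weight matrix can be obtained from the lookup table, for example via getSyn0() on InMemoryLookupTable):

    import org.deeplearning4j.plot.BarnesHutTsne;
    import org.nd4j.linalg.api.ndarray.INDArray;

    import java.util.List;

    public class FitAndSave {
        // Fit t-SNE on the word-vector weights and write the resulting
        // low-dimensional coordinates, one labeled row per word, to a file.
        public static void run(BarnesHutTsne tsne, INDArray weights,
                               List<String> cacheList) throws Exception {
            tsne.fit(weights);
            tsne.saveAsFile(cacheList, "tsne-coords.csv");
        }
    }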

The saved file holds the feature coordinates, one line per word: the 2D coordinates followed by the word label.

We can plot these coordinates using gnuplot or any other third-party plotting library. DL4J also supports JFrame-based visualizations.
