- Add the following snippet at the beginning of the source code to set the default data type for the current JVM runtime:
Nd4j.setDataType(DataBuffer.Type.DOUBLE);
- Write word vectors into a file:
WordVectorSerializer.writeWordVectors(model.lookupTable(), new File("words.txt"));
- Load the saved vectors back with WordVectorSerializer and separate the weights of the unique words into their own structures:
Pair<InMemoryLookupTable, VocabCache> vectors = WordVectorSerializer.loadTxt(new File("words.txt"));
VocabCache cache = vectors.getSecond();
INDArray weights = vectors.getFirst().getSyn0();
- Create a list holding every unique word from the vocabulary cache:
List<String> cacheList = new ArrayList<>();
for (int i = 0; i < cache.numWords(); i++) {
    cacheList.add(cache.wordAtIndex(i));
}
- Build a dual-tree t-SNE model for dimensionality reduction using BarnesHutTsne:
BarnesHutTsne tsne = new BarnesHutTsne.Builder()
        .setMaxIter(100)
        .theta(0.5)
        .normalize(false)
        .learningRate(500)
        .useAdaGrad(false)
        .build();
- Fit the t-SNE model on the weight matrix and save the resulting 2D coordinates, labeled with their words, to a CSV file:
tsne.fit(weights);
tsne.saveAsFile(cacheList, "tsne-standard-coords.csv");
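Once the coordinates are saved, you may want to read them back for plotting or inspection without pulling in DL4J. The sketch below is a hypothetical helper (the class name `TsneCsvRow` and the assumed row layout of "x,y,label" are not from the recipe; verify the column order in your generated file, as it can differ across DL4J versions):

```java
// Hypothetical helper for reading rows of the CSV written by tsne.saveAsFile(...).
// Assumption: each row is "x,y,label" — two comma-separated coordinates
// followed by the word label. Adjust if your file's column order differs.
public class TsneCsvRow {
    final double x;
    final double y;
    final String label;

    TsneCsvRow(double x, double y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    static TsneCsvRow parse(String line) {
        String[] parts = line.split(",");
        // The last token is the word; the tokens before it are coordinates.
        String label = parts[parts.length - 1].trim();
        double x = Double.parseDouble(parts[0].trim());
        double y = Double.parseDouble(parts[1].trim());
        return new TsneCsvRow(x, y, label);
    }

    public static void main(String[] args) {
        TsneCsvRow row = TsneCsvRow.parse("12.5,-3.75,queen");
        System.out.println(row.label + " -> (" + row.x + ", " + row.y + ")");
    }
}
```

Each parsed row gives a point you can feed straight into any 2D plotting tool to visualize the word embedding clusters.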