- Create a tokenizer factory and set the token preprocessor (CommonPreprocessor lowercases each token and strips digits and punctuation):
TokenizerFactory tokenFactory = new DefaultTokenizerFactory();
tokenFactory.setTokenPreProcessor(new CommonPreprocessor());
- Add the tokenizer factory to the Word2Vec model configuration:
Word2Vec model = new Word2Vec.Builder()
        .minWordFrequency(wordFrequency)
        .layerSize(numFeatures)
        .seed(seed)
        .epochs(numEpochs)
        .windowSize(windowSize)
        .iterate(iterator)
        .tokenizerFactory(tokenFactory)
        .build();
- Train the Word2Vec model:
model.fit();
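
To make the roles of the token preprocessor and `minWordFrequency` concrete, here is a minimal, library-free sketch of what those two settings do conceptually before training starts. It is not the Deeplearning4j implementation: the class `TokenPipelineSketch`, its methods, and the sample corpus are hypothetical names introduced for illustration, the regular expression only approximates the cleaning that `CommonPreprocessor` applies, and treating `minWordFrequency` as an inclusive lower bound is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TokenPipelineSketch {

    // Hypothetical stand-in for a token preprocessor:
    // strips digits and common punctuation, then lowercases.
    static String preprocess(String token) {
        return token.replaceAll("[\\d.:,\"'()\\[\\]|/?!;]+", "").toLowerCase();
    }

    // Splits on whitespace, cleans each token, and drops tokens that end up empty.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String raw : sentence.trim().split("\\s+")) {
            String cleaned = preprocess(raw);
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned);
            }
        }
        return tokens;
    }

    // Keeps only words that occur at least minWordFrequency times in the corpus
    // (assumption: the threshold is inclusive).
    static Set<String> buildVocabulary(List<String> sentences, int minWordFrequency) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String token : tokenize(sentence)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        Set<String> vocabulary = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minWordFrequency) {
                vocabulary.add(entry.getKey());
            }
        }
        return vocabulary;
    }

    public static void main(String[] args) {
        // Made-up corpus, for illustration only.
        List<String> corpus = Arrays.asList(
                "The cat sat.",
                "The dog sat!",
                "A bird flew, 42 times.");

        System.out.println(tokenize(corpus.get(0)));    // prints [the, cat, sat]
        System.out.println(buildVocabulary(corpus, 2)); // prints [sat, the]
    }
}
```

In this sketch, raising the frequency threshold shrinks the vocabulary, which mirrors why a higher `minWordFrequency` tends to drop rare words from the trained model.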