Step 4 - Set the NLP optimizer

To get better results from the LDA model, we need to set the optimizer, which implements the inference algorithm for LDA and performs the actual computation, maintaining the internal data structures (for example, a graph or matrix) and the algorithm's parameters.

Here we use the EMLDAOptimizer. Alternatively, you can use the OnlineLDAOptimizer. The EMLDAOptimizer stores a data + parameter graph, plus algorithm parameters; its underlying implementation uses expectation-maximization (EM).

First, let's instantiate the optimizer based on the chosen algorithm. For the online variant, we add (1.0 / actualCorpusSize) to a very low base mini-batch fraction (that is, 0.05) via setMiniBatchFraction() so that training converges even on a tiny dataset like ours:

val optimizer = params.algorithm.toLowerCase match {
  case "em" =>
    new EMLDAOptimizer
  // Add (1.0 / actualCorpusSize) to MiniBatchFraction to be more robust on tiny datasets.
  case "online" =>
    new OnlineLDAOptimizer().setMiniBatchFraction(0.05 + 1.0 / actualCorpusSize)
  case _ =>
    throw new IllegalArgumentException(
      s"Only em, online are supported but got ${params.algorithm}.")
}

Now, set the optimizer using the setOptimizer() method from the LDA API as follows:

lda.setOptimizer(optimizer)
  .setK(params.k)
  .setMaxIterations(params.maxIterations)
  .setDocConcentration(params.docConcentration)
  .setTopicConcentration(params.topicConcentration)
  .setCheckpointInterval(params.checkpointInterval)
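Once the optimizer and hyperparameters are set, training is done by calling run() on the term-count corpus. The following is a minimal sketch; it assumes that `corpus` (an `RDD[(Long, Vector)]` of document IDs and term-count vectors) and the `vocabArray` lookup were prepared in the earlier steps:

```scala
// A sketch only: `corpus` and `vocabArray` are assumed from the preprocessing steps.
val ldaModel = lda.run(corpus)

// Inspect the top-weighted terms in each discovered topic.
ldaModel.describeTopics(maxTermsPerTopic = 5).zipWithIndex.foreach {
  case ((termIndices, termWeights), topicId) =>
    println(s"Topic $topicId:")
    termIndices.zip(termWeights).foreach { case (termIndex, weight) =>
      println(s"  ${vocabArray(termIndex)}\t$weight")
    }
}
```

Note that run() returns an EMLDAOptimizer-trained DistributedLDAModel or an OnlineLDAOptimizer-trained LocalLDAModel depending on which optimizer was set above.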