Now, to get some more statistics, such as the log likelihood of the training data, we can use the following code:
if (ldaModel.isInstanceOf[DistributedLDAModel]) {
  val distLDAModel = ldaModel.asInstanceOf[DistributedLDAModel]
  val avgLogLikelihood = distLDAModel.logLikelihood / actualCorpusSize.toDouble
  println("The average log likelihood of the training data: " + avgLogLikelihood)
  println()
}
The preceding code casts the LDA model to its distributed version, DistributedLDAModel, and then calculates the average log likelihood of the training data:
The average log likelihood of the training data: -209692.79314860413
For more information on the likelihood measurement, interested readers should refer to https://en.wikipedia.org/wiki/Likelihood_function.
Now imagine that we have computed the preceding metric for documents X and Y. Then we can answer the following question:
- How similar are documents X and Y?
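One way to make this comparison concrete, assuming the model was trained with the EM optimizer (so that it is a DistributedLDAModel), is to compare the per-document topic distributions exposed by topicDistributions. The following sketch uses cosine similarity as an illustrative measure; docIdX and docIdY are hypothetical document IDs, not values from the original example:

```scala
import org.apache.spark.mllib.linalg.Vector

// Hypothetical IDs of documents X and Y in the training corpus.
val docIdX = 0L
val docIdY = 1L

// Collect the per-document topic distributions: Map[docId, topic-weight vector].
// Note: collectAsMap() brings all distributions to the driver, so this sketch
// is only suitable for small corpora.
val topicDist = distLDAModel.topicDistributions.collectAsMap()

// Cosine similarity between two topic-weight vectors.
def cosineSimilarity(a: Vector, b: Vector): Double = {
  val (aArr, bArr) = (a.toArray, b.toArray)
  val dot = aArr.zip(bArr).map { case (x, y) => x * y }.sum
  val normA = math.sqrt(aArr.map(x => x * x).sum)
  val normB = math.sqrt(bArr.map(x => x * x).sum)
  dot / (normA * normB)
}

val similarity = cosineSimilarity(topicDist(docIdX), topicDist(docIdY))
println(s"Topic-distribution similarity between X and Y: $similarity")
```

A value close to 1.0 means the two documents have very similar topic mixtures; a value close to 0.0 means they are about largely different topics.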
The idea is to take the lowest likelihood across all the training documents and use it as a threshold for the preceding comparison. Finally, to answer the third and final question:
- If I am interested in topic Z, which documents should I read first?
A minimal answer: by taking a close look at the per-document topic distributions and the relative term weights, we can decide which documents to read first.
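This can be sketched in code as well. Assuming a DistributedLDAModel as before, the following ranks the training documents by the weight they assign to a topic of interest; topicZ is a hypothetical topic index chosen for illustration:

```scala
// Hypothetical index of topic Z we are interested in.
val topicZ = 0

// Rank documents by the weight of topic Z in their topic distributions
// and take the top 10 as reading suggestions.
val topDocsForTopicZ = distLDAModel.topicDistributions
  .map { case (docId, dist) => (docId, dist(topicZ)) }
  .sortBy(-_._2)
  .take(10)

topDocsForTopicZ.foreach { case (docId, weight) =>
  println(s"Document $docId: weight of topic $topicZ = $weight")
}
```

The documents printed first devote the largest share of their topic mixture to topic Z, so they are natural candidates to read first.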