Step 8 - Measuring the likelihood of two documents

To get some more statistics on the model, such as the maximum likelihood or the log likelihood of the training documents, we can use the following code:

if (ldaModel.isInstanceOf[DistributedLDAModel]) {
  // The distributed model exposes the log likelihood of the training corpus
  val distLDAModel = ldaModel.asInstanceOf[DistributedLDAModel]
  // Divide by the corpus size to get the average per document
  val avgLogLikelihood = distLDAModel.logLikelihood / actualCorpusSize.toDouble
  println("The average log likelihood of the training data: " +
    avgLogLikelihood)
  println()
}

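The preceding code checks whether the LDA model is an instance of the distributed version (DistributedLDAModel) and, if so, computes the average log likelihood per training document by dividing the model's log likelihood by the corpus size: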

The average log likelihood of the training data: -209692.79314860413

For more information on likelihood measurement, interested readers should refer to https://en.wikipedia.org/wiki/Likelihood_function.

Now imagine that we've computed the preceding metric for documents X and Y. Then we can answer the following question:

  • How similar are documents X and Y?

To make this comparison meaningful, we should find the lowest likelihood among all the training documents and use it as a threshold for the comparison. Finally, to answer the third and final question:

  • If I am interested in topic Z, which documents should I read first?

A minimal answer: by taking a close look at the topic distributions and the relative term weights, we can decide which documents we should read first.
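
The threshold idea and the topic-based reading order described above can be sketched in plain Scala. This is only one possible reading of those two ideas, not the book's implementation: the object LikelihoodSketch, its method names, and all the numbers are hypothetical, and ordinary Scala collections stand in for the per-document log likelihoods and topic distributions that would come from a trained LDA model.

```scala
// Illustrative sketch only: plain Scala collections stand in for values that
// would come from a trained LDA model. All names and numbers are hypothetical.
object LikelihoodSketch {

  // Lowest log likelihood among the training documents, used as a cutoff.
  def threshold(trainingLogLikelihoods: Seq[Double]): Double =
    trainingLogLikelihoods.reduce((a, b) => math.min(a, b))

  // Assumption: a document counts as well explained by the model if its log
  // likelihood is at least as high as that of the worst training document.
  def fitsModel(logLikelihood: Double, cutoff: Double): Boolean =
    logLikelihood >= cutoff

  // Rank document IDs by the weight of one topic, highest weight first.
  // topicDistributions maps a document ID to its per-topic weights.
  def readingOrder(topicDistributions: Map[Long, Array[Double]], topic: Int): Seq[Long] =
    topicDistributions.toSeq
      .sortWith { case ((_, a), (_, b)) => a(topic) > b(topic) }
      .map { case (docId, _) => docId }

  def main(args: Array[String]): Unit = {
    // Made-up per-document log likelihoods for the training set.
    val training = Seq(-210.5, -198.2, -305.7, -187.9)
    val cutoff = threshold(training)

    // Made-up log likelihoods for documents X and Y.
    val x = -200.0
    val y = -320.0
    println(s"cutoff = $cutoff, X fits: ${fitsModel(x, cutoff)}, Y fits: ${fitsModel(y, cutoff)}")

    // Made-up topic distributions for three documents over three topics.
    val distributions = Map(
      1L -> Array(0.7, 0.2, 0.1),
      2L -> Array(0.1, 0.1, 0.8),
      3L -> Array(0.3, 0.3, 0.4)
    )
    println("Reading order for topic 2: " + readingOrder(distributions, 2).mkString(", "))
  }
}
```

In this sketch, a document whose log likelihood falls below the cutoff is explained worse by the model than any training document, so a comparison involving it should be treated with caution. The reading order simply puts the documents with the largest weight for the chosen topic first.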
