Summary

In this chapter, we discussed topic modeling. Topic modeling is more flexible than clustering because it allows each document to belong partially to more than one group. To explore these methods, we used a new package, gensim.
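As a reminder of what "partially present in more than one group" means, here is a minimal sketch in plain Python. The topic names and weights are made up for illustration and are not the output of any model in this chapter; `hard_assignment` and `soft_assignment` are hypothetical helper names, not gensim functions.

```python
# Hypothetical topic weights for a single document (made-up numbers,
# chosen only to illustrate the idea; they sum to 1 like a topic mixture).
doc_topics = {"sports": 0.55, "politics": 0.35, "cooking": 0.10}


def hard_assignment(weights):
    """Clustering-style view: the document belongs to exactly one group."""
    return max(weights, key=weights.get)


def soft_assignment(weights, threshold=0.2):
    """Topic-model-style view: keep every topic at or above the threshold."""
    return {topic: w for topic, w in weights.items() if w >= threshold}


cluster = hard_assignment(doc_topics)
topics = soft_assignment(doc_topics)

print(cluster)
print(topics)
```

The clustering-style view discards the fact that the document is also substantially about a second topic, whereas the topic-model-style view keeps both. The threshold of 0.2 is an arbitrary choice for this sketch.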

Topic modeling was first developed for text and is easier to understand in that setting, but in Chapter 12, Computer Vision, we will see how some of these techniques may be applied to images as well. Topic models are very important in modern computer vision research. In fact, unlike the previous chapters, this chapter was very close to the cutting edge of research in machine learning algorithms. The original LDA algorithm was published in a scientific journal in 2003, but the method that gensim uses to be able to handle Wikipedia was only developed in 2010, and the HDP algorithm is from 2011. The research continues, and you can find many variations and models with wonderful names such as the Indian buffet process (not to be confused with the Chinese restaurant process, which is a different model) or Pachinko allocation (Pachinko being a type of Japanese game, a cross between a slot machine and pinball).

We have now gone through some of the major machine learning modes: classification, clustering, and topic modeling.

In Chapter 11, Classification III - Music Genre Classification, we will go back to classification, but this time we will explore more advanced algorithms and approaches.
