Chapter 7. Probabilistic Mixture Models

We have seen an initial example of mixture models, namely the Gaussian mixture model, in which we had a finite number of Gaussians to represent a dataset. In this chapter, we will focus on more advanced examples of mixture models, starting again from the Gaussian mixture model and going up to Latent Dirichlet Allocation. The reason for so many models is that we want to capture aspects of the data that are not easily captured by a mixture of Gaussians.

In many cases, we will use the EM algorithm to estimate the parameters of the model from the data. Moreover, many mixture models have intractable exact solutions and require approximate inference.

The first type of model we will see is a mixture of simple distributions. The simple distribution can be a Gaussian, a Bernoulli, a Poisson, and so on. The principle is always the same, but the applications are different. While Gaussian distributions are well suited to capturing clouds of points, Bernoulli distributions can be efficient for analyzing black-and-white images, for example in handwriting recognition.

We will then relax one assumption of the mixture model and see a second type of model called the mixture of experts, in which the choice of cluster depends on the data point. It can be seen as a first approach to probabilistic decision trees.

Finally, we will see a very powerful model called Latent Dirichlet Allocation (LDA), in which we relax another assumption of mixture models. In a mixture model, a point is assumed to have been generated by a single cluster. In LDA, it can belong to several clusters at the same time. This model has been successfully used in text analysis, among other things. It belongs to the family of mixed membership models.

We will review the following elements in this chapter:

  • Mixture models in general, with examples of several distributions
  • Mixture of experts, in which we assume the cluster assignment depends on the data point
  • LDA, in which we assume a point can belong to several clusters

Mixture models

The mixture model belongs to a larger family of distributions called latent variable models, in which some of the variables are never observed. One reason for introducing latent variables is to simplify the model by grouping the variables into subgroups with different meanings. Another reason is to introduce a hidden process into the model, representing the real data generation mechanism. In other words, we assume that we have a set of models and that something hidden selects one of them and then generates a data point from the selected model.

When the data naturally exhibits clusters, it seems reasonable to say that each cluster is a small model.

The whole problem is then to find to what extent each submodel participates in the data generation process and what the parameters of each submodel are. This is usually solved using the EM algorithm.

There are many ways to combine small models in order to make a bigger or more generic model. The approach generally used in mixture modeling is to give a proportion to each submodel, such that the sum of the proportions is one. In other words, we build an additive model as follows:

p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x)

In this, π_k is the proportion of each submodel, and each submodel is captured by the probability distribution p_k.

Of course, in this form, the π_k sum to 1. The proportions can also be considered as random variables, and the model can be extended in a Bayesian way. This model is called a mixture model, and the probability distribution p_k is called the base distribution.
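As a quick illustration of this additive form, the following sketch (in Python with NumPy and SciPy; the function name and the two-component example are illustrative assumptions, not taken from the text) evaluates p(x) as a weighted sum of base densities:

import numpy as np
from scipy.stats import norm

def mixture_density(x, proportions, base_densities):
    """Evaluate p(x) = sum_k pi_k * p_k(x) for an additive mixture model."""
    proportions = np.asarray(proportions)
    # The proportions must form a valid set of mixture weights
    assert np.isclose(proportions.sum(), 1.0), "proportions must sum to one"
    return sum(pi_k * p_k(x) for pi_k, p_k in zip(proportions, base_densities))

# A two-component example with Gaussian base distributions
p = mixture_density(
    x=1.5,
    proportions=[0.3, 0.7],
    base_densities=[norm(0.0, 1.0).pdf, norm(3.0, 0.5).pdf],
)
print(p)

The same function works unchanged with Bernoulli or Poisson base distributions, since only the list of density (or mass) functions changes.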

There are, theoretically, no constraints on the form of the base distribution and, depending on the chosen distribution, several types of model arise. In Machine Learning: A Probabilistic Perspective, the following taxonomy helps us to understand many popular models:

Name                          Base distribution   Latent var. distribution   Notes
Mixture of Gaussian           Gaussian            Discrete                   A Gaussian is chosen among K
Probabilistic PCA             Gaussian            Gaussian
Probabilistic ICA             Gaussian            Laplace                    Used for sparse coding
Latent Dirichlet Allocation   Discrete            Dirichlet                  Used for text analysis

These are just a few examples to show that many models are possible based on the same principle. However, this does not mean they are all easy to solve, and in many cases, advanced algorithms will be necessary.

For example, the mixture of Gaussians model is defined as follows: we consider that each submodel is a Gaussian distribution (the base distribution) and that the latent variable distribution is discrete. Each Gaussian has its own mean and variance.

Sampling from such a model could give the following data set, for example:

[Figure: a dataset sampled from a mixture of Gaussians, showing several distinct clusters of points]

The base density is:

p_k(x) = \mathcal{N}(x \mid \mu_k, \sigma_k^2)

And the latent variable distribution is a categorical distribution over the K components, with p(z = k) = π_k.

The model is therefore:

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2)

In the case of a multidimensional Gaussian, the variance σ_k² will be replaced by the covariance matrix Σ_k.
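To make the generative process concrete, here is a minimal sampling sketch (in Python with NumPy; the three-component parameters are illustrative assumptions): each point is generated by first drawing its component from the categorical distribution and then drawing the value from the selected Gaussian.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for a one-dimensional, 3-component Gaussian mixture
pi = np.array([0.5, 0.3, 0.2])     # mixing proportions pi_k, summing to 1
mu = np.array([-2.0, 0.0, 3.0])    # component means mu_k
sigma = np.array([0.5, 1.0, 0.8])  # component standard deviations sigma_k

def sample_mixture(n):
    # Draw the latent component z_i ~ Categorical(pi) for each point
    z = rng.choice(len(pi), size=n, p=pi)
    # Draw x_i ~ N(mu_{z_i}, sigma_{z_i}^2) from the selected Gaussian
    x = rng.normal(mu[z], sigma[z])
    return x, z

x, z = sample_mixture(1000)
print(x[:5], z[:5])

Plotting x would produce clusters similar to the dataset shown above; in the multidimensional case, the normal draw would use the mean vector μ_k and the covariance matrix Σ_k instead.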
