Chapter 7. Probabilistic Mixture Models

We have seen an initial example of mixture models, namely the Gaussian mixture model, in which we had a finite number of Gaussians to represent a dataset. In this chapter, we will focus on more advanced examples of mixture models, starting again from the Gaussian mixture model and going up to Latent Dirichlet Allocation. The reason for so many models is that we want to capture aspects of the data that are not easily captured by a mixture of Gaussians.

In many cases, we will use the EM algorithm to estimate the parameters of the model from the data. Moreover, many mixture models have intractable exact solutions and require approximate inference.

The first type of model we will see is a mixture of simple distributions. The simple distribution can be a Gaussian, a Bernoulli, a Poisson, and so on. The principle is always the same, but the applications are different. While Gaussian distributions are well suited to capturing clouds of points, Bernoulli distributions can be efficient for analyzing black-and-white images, for example in handwriting recognition.

We will then relax one assumption of the mixture model and see a second type of model called the mixture of experts, in which the choice of cluster depends on the data point. It can be seen as a first approach to probabilistic decision trees.

Finally, we will see a very powerful model called Latent Dirichlet Allocation (LDA), in which we relax another assumption of mixture models. In a mixture model, a point is assumed to have been generated by a single cluster. In LDA, it can belong to several clusters at the same time. This model has been successfully used in text analysis, among other things. It belongs to the family of mixed membership models.

We will review the following elements in this chapter:

  • Mixture models in general, with examples of several distributions
  • Mixture of experts, in which we assume the cluster assignment depends on the data point
  • LDA, in which we assume a point can belong to several clusters

Mixture models

The mixture model belongs to a larger family of distributions called latent variable models, in which some of the variables are never observed. One reason for introducing latent variables is to simplify the model by grouping the variables into subgroups with different meanings. Another reason is to introduce a hidden process into the model, representing the real data generation mechanism. In other words, we assume that we have a set of models and that something hidden selects one of them and then generates a data point from the selected model.

When the data naturally exhibits clusters, it seems reasonable to say that each cluster is a small model.

The whole problem is then to find to what extent each submodel participates in the data generation process and what the parameters of each submodel are. This is usually solved using the EM algorithm.

There are many ways to combine small models in order to make a bigger or more generic model. The approach generally used in mixture modeling is to give a proportion to each submodel, such that the sum of the proportions is one. In other words, we build an additive model as follows:

p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x)

In this, π_k is the proportion of each submodel, and each submodel is captured by the probability distribution p_k.

Of course, in this form, the π_k sum to 1. The proportions can also be considered as random variables, and the model can be extended in a Bayesian way. This model is called a mixture model, and the probability distribution p_k is called the base distribution.
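As a quick illustration of this additive form, the following sketch (in Python with NumPy and SciPy; the function name and the two-component example are illustrative assumptions, not taken from the text) evaluates p(x) as a weighted sum of base densities:

import numpy as np
from scipy.stats import norm

def mixture_density(x, proportions, base_densities):
    """Evaluate p(x) = sum_k pi_k * p_k(x) for an additive mixture model."""
    proportions = np.asarray(proportions)
    # The proportions must form a valid set of mixture weights
    assert np.isclose(proportions.sum(), 1.0), "proportions must sum to one"
    return sum(pi_k * p_k(x) for pi_k, p_k in zip(proportions, base_densities))

# A two-component example with Gaussian base distributions
p = mixture_density(
    x=1.5,
    proportions=[0.3, 0.7],
    base_densities=[norm(0.0, 1.0).pdf, norm(3.0, 0.5).pdf],
)
print(p)

The same function works unchanged with Bernoulli or Poisson base distributions, since only the list of density (or mass) functions changes.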

There are, theoretically, no constraints on the form of the base distribution and, depending on the chosen distribution, several types of model arise. In Machine Learning: A Probabilistic Perspective, the following taxonomy helps us to understand many popular models:

Name                          Base distribution   Latent var. distribution   Notes
Mixture of Gaussian           Gaussian            Discrete                   A Gaussian is chosen among K
Probabilistic PCA             Gaussian            Gaussian
Probabilistic ICA             Gaussian            Laplace                    Used for sparse coding
Latent Dirichlet Allocation   Discrete            Dirichlet                  Used for text analysis

These are just a few examples to show that many models are possible based on the same principle. However, this does not mean they are all easy to solve, and in many cases, advanced algorithms will be necessary.

For example, the mixture of Gaussians model is defined as follows: we consider that each submodel is a Gaussian distribution (the base distribution) and that the latent variable distribution is discrete. Each Gaussian has its own mean and variance.

Sampling from such a model could give the following data set, for example:

[Figure: a dataset sampled from a mixture of Gaussians, showing several distinct clusters of points]

The base density is:

p_k(x) = \mathcal{N}(x \mid \mu_k, \sigma_k^2)

And the latent variable distribution is a categorical distribution over the K components, with p(z = k) = π_k.

The model is therefore:

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2)

In the case of a multidimensional Gaussian, the variance σ_k² will be replaced by the covariance matrix Σ_k.
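To make the generative process concrete, here is a minimal sampling sketch (in Python with NumPy; the three-component parameters are illustrative assumptions): each point is generated by first drawing its component from the categorical distribution and then drawing the value from the selected Gaussian.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for a one-dimensional, 3-component Gaussian mixture
pi = np.array([0.5, 0.3, 0.2])     # mixing proportions pi_k, summing to 1
mu = np.array([-2.0, 0.0, 3.0])    # component means mu_k
sigma = np.array([0.5, 1.0, 0.8])  # component standard deviations sigma_k

def sample_mixture(n):
    # Draw the latent component z_i ~ Categorical(pi) for each point
    z = rng.choice(len(pi), size=n, p=pi)
    # Draw x_i ~ N(mu_{z_i}, sigma_{z_i}^2) from the selected Gaussian
    x = rng.normal(mu[z], sigma[z])
    return x, z

x, z = sample_mixture(1000)
print(x[:5], z[:5])

Plotting x would produce clusters similar to the dataset shown above; in the multidimensional case, the normal draw would use the mean vector μ_k and the covariance matrix Σ_k instead.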
