EM for mixture models

The standard way of fitting mixture models is the EM (Expectation Maximization) algorithm. This algorithm was the focus of Chapter 3, Learning Parameters, so here we only recall its basic principles before presenting a Bernoulli mixture model later on.

A good R package for learning mixture models is mixtools. A thorough presentation of this package is given in the Journal of Statistical Software, October 2009, Vol. 32, Issue 6, mixtools: An R Package for Analyzing Finite Mixture Models.

The EM algorithm is a good choice for learning a mixture model. Indeed, in Chapter 3, Learning Parameters, we saw that when data is missing, or even when variables are hidden (that is, all their respective data is missing), the EM algorithm proceeds in two steps: it first computes the expected value of the missing variables, so as to proceed as if the data were fully observed, and then maximizes an objective function, usually the likelihood. Given the new set of parameters, the process is iterated until some convergence criterion is reached.

(Figure: the graphical model of the mixture, with the latent variable z as the parent of the observed variables x_i)

In the mixture model represented by the preceding graphical model, it is clear that the variable z is latent and the variables x_i are observed. We usually adopt plate notation to give a compact view of the data generation process as a graphical model, as in the following graph:

(Figure: the mixture model in plate notation)

As in many probabilistic models, fitting the parameters can be done by finding the set of parameters that maximizes the probability of generating the data. In other words, we aim at maximizing the log-likelihood of the model:

$$\ell(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta) = \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \, p(x_i \mid \theta_k) \right)$$

And that is where the problem lies: the sum inside the logarithm is hard to simplify and, in many cases, intractable. You will also note that this likelihood is written in terms of the observed data only. So, following the previous graphical model, we can write the complete log-likelihood by introducing the latent variables as follows:

$$\ell_c(\theta) = \sum_{i=1}^{N} \log p(x_i, z_i \mid \theta)$$

The EM algorithm will solve the problem of computing this likelihood, in which z is completely hidden, by performing the following two steps.

First of all, we define the expected complete data log likelihood by:

$$Q(\theta, \theta^{(t-1)}) = \mathbb{E}\left[\, \ell_c(\theta) \mid \mathcal{D}, \theta^{(t-1)} \,\right]$$

This expectation is the expected complete data log likelihood, computed using the parameters found in the previous step. Of course, at the beginning of the algorithm, the parameters are initialized to some arbitrary value. We saw in Chapter 3, Learning Parameters, that this value can be anything, but choosing values at random can lead to a very long convergence time. Nevertheless, it has been proved that the EM algorithm converges (to a local optimum) for any initial value.

The E-step of the EM algorithm computes this expected value, that is, the expectation of the complete data log likelihood given the observed data and the previous set of parameters.

Then the M-step maximizes this expectation with respect to the parameters θ, which gives the new set of parameters:

$$\theta^{(t)} = \arg\max_{\theta} \; Q(\theta, \theta^{(t-1)})$$
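
To make these two steps concrete before turning to mixtools, here is a minimal sketch of EM for a two-component univariate Gaussian mixture, written directly in R. The function name em_gauss2, the initialization, and the stopping rule are ours and purely illustrative; this is a sketch of the E and M steps, not a reference implementation:

em_gauss2 <- function(x, max_iter = 100, tol = 1e-6) {
  n <- length(x)
  # arbitrary initialization of the parameters
  lambda <- c(0.5, 0.5)
  mu     <- c(min(x), max(x))
  sigma  <- c(sd(x), sd(x))
  loglik_old <- -Inf
  for (iter in 1:max_iter) {
    # E-step: responsibility of each component for each data point
    d1 <- lambda[1] * dnorm(x, mu[1], sigma[1])
    d2 <- lambda[2] * dnorm(x, mu[2], sigma[2])
    r1 <- d1 / (d1 + d2)
    r2 <- 1 - r1
    # M-step: re-estimate the parameters from the weighted data
    n1 <- sum(r1); n2 <- sum(r2)
    lambda <- c(n1, n2) / n
    mu     <- c(sum(r1 * x) / n1, sum(r2 * x) / n2)
    sigma  <- c(sqrt(sum(r1 * (x - mu[1])^2) / n1),
                sqrt(sum(r2 * (x - mu[2])^2) / n2))
    # convergence check on the observed-data log-likelihood
    loglik <- sum(log(d1 + d2))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(lambda = lambda, mu = mu, sigma = sigma, loglik = loglik, iterations = iter)
}

The E-step fills in the hidden z through the responsibilities r1 and r2, and the M-step maximizes the expected complete data log likelihood, which has a closed form in the Gaussian case.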

In the mixtools package, it is possible to fit a mixture of Gaussians using the normalmixEM function. Here is how to do so.

First, we generate a simple data set from two univariate Gaussians:

x1 <- rnorm(1000, -3, 2)
x2 <- rnorm(850, 3, 1)
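
If you want to obtain the same numbers from one run to the next, you can fix the random seed before the two rnorm calls above; the seed value below is arbitrary and not part of the original example:

set.seed(42)  # arbitrary seed, only for reproducibility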

Then, we plot the data to see how it is empirically distributed, using the hist function:

hist(c(x1,x2), breaks=100, col='red')

And we obtain the following figure, in which we can easily identify the two clusters of data and see that each is approximately normally distributed. Given that we use random number generators, your result might be slightly different from what is shown in this book:

(Figure: histogram of the combined data set, showing the two clusters)

We then fit a two-component Gaussian mixture on the combined data set, after loading the mixtools package:

library(mixtools)
model <- normalmixEM(c(x1,x2), lambda=.5, k=2)

This model should take between 30 and 40 iterations to converge. We give an initial proportion of 50% to each class and set the number of clusters to 2.
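
Since normalmixEM returns its estimates as components of the fitted object, we can read them directly; summary also prints a compact overview of the fit:

# mixing proportions, component means, and standard deviations
model$lambda
model$mu
model$sigma
# compact overview of the fit, including the final log-likelihood
summary(model)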

In our results, we obtain the following parameters:

  • The mixing proportions are 54.9% and 45.1%. This closely corresponds to the proportions of our data sets x1 and x2 (1,000 and 850 points, that is, roughly 54% and 46%).
  • The µ parameters are -2.85 and 3.01, which are very close to the means (-3 and 3) we used to generate the data.

We can plot this histogram again with the Gaussian distributions superimposed on it:

# grid of values over which to evaluate the densities (not defined in the original snippet)
x <- seq(min(c(x1,x2)), max(c(x1,x2)), length.out = 500)
hist(c(x1,x2), breaks=100, col='red', freq=F, ylim=c(0,0.4))
lines(x, dnorm(x, model$mu[1], model$sigma[1]), col='blue')
lines(x, dnorm(x, model$mu[2], model$sigma[2]), col='green')
(Figure: histogram with the two unweighted Gaussian densities superimposed)

The result is obviously off in terms of proportions, because each density is plotted with full weight. If we redraw it, this time weighting each component by its mixing proportion, we obtain the expected mixture distribution:

hist(c(x1,x2), breaks=100, col='red', freq=F)
lines(x,
      model$lambda[1]*dnorm(x, model$mu[1], model$sigma[1]) +
      model$lambda[2]*dnorm(x, model$mu[2], model$sigma[2]),
      lwd=3)
(Figure: histogram with the fitted mixture density superimposed)
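
The fitted object also exposes the posterior probabilities computed in the E-step, with one row per data point and one column per component. As a quick sketch, we can use them to assign each point to its most probable component and check the resulting cluster sizes (the variable name cluster is ours):

# posterior responsibilities from the E-step
head(model$posterior)
# hard assignment to the most probable component
cluster <- apply(model$posterior, 1, which.max)
table(cluster)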

The number of clusters is very important and can change the results dramatically when not chosen properly. For example, if we try the following values, we end up with different results:

model <- normalmixEM(c(x1,x2), lambda=.5, k=3)
number of iterations= 774
model <- normalmixEM(c(x1,x2), lambda=.5, k=4)
WARNING! NOT CONVERGENT!
number of iterations= 1000

With three clusters, the EM algorithm still converges. With four clusters, we need to increase the number of iterations for the algorithm to converge. In fact, even with three clusters the result is interesting, if not surprising. Plotting the density function, we obtain:

(Figure: fitted density with three clusters superimposed on the histogram)

In this plot, we see that the second cluster is somehow attached to the first one. If you look carefully at the thick black line in the middle, you will see that the left-hand distribution is not totally Gaussian. Inspecting the model parameters, we see that the means of the Gaussians are -3.74, -1.08, and 2.9. The middle one is indeed closer to the first cluster. The proportions are also interesting: 38%, 15.4%, and 46.5%. So it seems the EM algorithm split the biggest cluster (1,000 points against 850) into two Gaussians to respect our choice of three clusters.

A slow convergence can sometimes be an indication that our hyperparameters are not totally adequate and more values should be explored.
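
One way to explore these values, as a sketch rather than a definitive recipe, is to give the algorithm more room to converge and then compare the resulting fits. normalmixEM accepts maxit and epsilon arguments that control the maximum number of iterations and the convergence threshold, and every fitted object stores its final observed-data log-likelihood in loglik:

# allow more iterations for the four-cluster fit (values are illustrative, not tuned)
model4 <- normalmixEM(c(x1,x2), lambda=.5, k=4, maxit=5000)
model4$loglik   # final observed-data log-likelihood of this fit

Keep in mind that the log-likelihood alone always improves as more components are added, so it should be complemented by an information criterion or held-out data when choosing the number of clusters.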
