Mixture of experts

The idea behind mixture of experts is to fit a separate linear regression on each sub-space of the original data space and to combine them with weighting functions that determine how much each regression contributes at any given point.

Consider the following example data set, generated with this toy code:

# covariates for the two sub-regions of the data space
x1 = runif(40, 0, 10)
x2 = runif(40, 10, 20)

# Gaussian noise, one vector per sub-region
e1 = rnorm(40, 0, 2)
e2 = rnorm(40, 0, 3)

# two different linear relationships
y1 = 1 + 2.5 * x1 + e1
y2 = 35 - 1.5 * x2 + e2

# the full data set
xx = c(x1, x2)
yy = c(y1, y2)

Plotting the data and fitting a simple linear regression on it gives the following:

(Figure: the data set with a single linear regression line)
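As a quick aside, this plot and the single regression line can be reproduced with a sketch along these lines, using the xx and yy vectors defined above:

# fit a single linear regression on the whole data set
fit.all = lm(yy ~ xx)

# scatter plot of the data with the fitted line
plot(xx, yy, pch = 19, xlab = "x", ylab = "y")
abline(fit.all, col = "blue", lwd = 2)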

Obviously, a single linear regression does not capture the behavior of the data at all: it only picks up a general trend that more or less averages the data set.

The idea of mixture of experts is to have several sub-models within a bigger model, for example several regression lines, as in the following graph:

(Figure: the data set with two candidate regression lines, in red and green)

In this graph, the red and green lines seem to represent the data set much better. However, the model needs to decide when to use each one. Again, a mixture model could be a solution, except that, in this case, we want the mixture weights to depend on the data points. So the model will be a bit different:
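To make this concrete, the two lines in the previous graph could be obtained by fitting a separate regression on each half of the toy data. The split is known here only because we generated the data; learning it is precisely the job of the gating function introduced below:

# one regression per sub-region (x1/y1 and x2/y2 come from the toy code above)
fit.1 = lm(y1 ~ x1)
fit.2 = lm(y2 ~ x2)

plot(xx, yy, pch = 19, xlab = "x", ylab = "y")
abline(fit.1, col = "red", lwd = 2)
abline(fit.2, col = "green", lwd = 2)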

p(y_i | x_i, z_i = k, θ) = N(y_i | w_k^T x_i, σ_k^2)

This is the linear model as we know it. Next, we introduce the dependence of the latent variable on the data points with:

p(z_i = k | x_i, θ) = S(v_k^T x_i)

Here, S(.) is, for example, a sigmoid function. The function p(z_i | x_i, θ) is usually called the gating function.
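For a two-expert model, such a sigmoid gate can be written directly in R. This is only a sketch, and the parameter names v0 and v1 are ours:

# sigmoid gating function for two experts:
# gate1 gives p(z_i = 1 | x_i), and expert 2 gets the complement
sigmoid = function(t) 1 / (1 + exp(-t))
gate1 = function(x, v0, v1) sigmoid(v0 + v1 * x)
gate2 = function(x, v0, v1) 1 - gate1(x, v0, v1)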

The graphical model associated with such a model is quite different now because it introduces a dependency between the latent variable and the observations:

(Figure: graphical model of the mixture of experts, with the latent variable depending on the observation)

In general, mixture of experts models use a softmax gating function, such that:

p(z_i = k | x_i, θ) = exp(v_k^T x_i) / Σ_j exp(v_j^T x_i)
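A direct transcription of this softmax gate in R, for K experts with linear score functions, could look like the following sketch; the matrix V of gating parameters, one row per expert with an intercept and a slope, is an assumption of ours:

# softmax gating: returns K probabilities that sum to one at point x
softmax.gate = function(x, V) {
  scores = V[, 1] + V[, 2] * x            # linear score of each expert
  exp(scores - max(scores)) / sum(exp(scores - max(scores)))
}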

The EM algorithm is usually a good way to fit such a model. For example, the mixtools package includes the function hmeEM to fit mixture of experts models. At the time of writing, this function is limited to two clusters.
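As an illustration, fitting the toy data with hmeEM might look like the sketch below; the argument order is assumed from the package documentation, so check ?hmeEM before relying on it:

library(mixtools)

# two-expert mixture of experts fitted by EM on the toy data
# (argument order assumed; see ?hmeEM for the exact interface)
fit.moe = hmeEM(yy, xx, k = 2)
str(fit.moe)   # inspect the estimated expert and gating parameters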

The gating functions must sum to one at each point; in our example we could, for instance, use two sigmoids with the following effect:

(Figure: two complementary sigmoid gating functions summing to one at every point)
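For instance, the previous figure can be mimicked by instantiating gate1 and gate2 from the earlier sketch with v0 = 20 and v1 = -2, an arbitrary choice that puts the transition around x = 10:

# two complementary gates, reusing gate1 and gate2 from the sketch above
curve(gate1(x, 20, -2), from = 0, to = 20, col = "red", ylab = "gating weight")
curve(gate2(x, 20, -2), from = 0, to = 20, col = "green", add = TRUE)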

Such a combination can give a final model that explains the initial data set much better, as in this graph:

(Figure: the final mixture of experts fit on the data set)
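Combining the pieces from the previous sketches (fit.1, fit.2, gate1, and gate2), the final prediction is simply the gate-weighted sum of the two expert regressions; again, this is only an illustrative sketch:

# final mixture of experts prediction: gate-weighted sum of the two experts
moe.pred = function(x) {
  y1.hat = predict(fit.1, newdata = data.frame(x1 = x))
  y2.hat = predict(fit.2, newdata = data.frame(x2 = x))
  gate1(x, 20, -2) * y1.hat + gate2(x, 20, -2) * y2.hat
}

plot(xx, yy, pch = 19, xlab = "x", ylab = "y")
curve(moe.pred, from = 0, to = 20, col = "purple", lwd = 2, add = TRUE)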

We recommend the reader develop his or her own EM algorithm to fit such models and try different types of gating functions.
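As a starting point for this exercise, a rough EM skeleton for a two-expert model with a sigmoid gate could look like the following; all names are ours, and many practical details (convergence checks, numerical safeguards, multiple restarts) are omitted:

sigmoid = function(t) 1 / (1 + exp(-t))

moe.em = function(x, y, n.iter = 50) {
  # initial guess: split the data at the median of x
  left = x < median(x)
  f1 = lm(y ~ x, subset = left)
  f2 = lm(y ~ x, subset = !left)
  s1 = sd(residuals(f1)); s2 = sd(residuals(f2))
  v = c(0, 0)                              # gating parameters (intercept, slope)

  for (it in 1:n.iter) {
    # E-step: responsibility of expert 1 for each point
    g1 = sigmoid(v[1] + v[2] * x)
    d1 = g1 * dnorm(y, predict(f1, data.frame(x = x)), s1)
    d2 = (1 - g1) * dnorm(y, predict(f2, data.frame(x = x)), s2)
    r1 = d1 / (d1 + d2)

    # M-step: weighted regression for each expert...
    f1 = lm(y ~ x, weights = r1)
    f2 = lm(y ~ x, weights = 1 - r1)
    s1 = sqrt(sum(r1 * residuals(f1)^2) / sum(r1))
    s2 = sqrt(sum((1 - r1) * residuals(f2)^2) / sum(1 - r1))
    # ...and a logistic fit of the responsibilities for the gate
    v = coef(glm(r1 ~ x, family = quasibinomial))
  }
  list(expert1 = f1, expert2 = f2, gate = v)
}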

Techniques such as shrinkage, or a Bayesian treatment of the parameters, can also help avoid over-fitting, which becomes a problem when the number of sub-models grows quickly.
