Using Gaussian processes for regression

In this recipe, we'll use Gaussian processes for regression. In the linear models section, we saw how we could represent prior information on the coefficients using Bayesian Ridge Regression.

With a Gaussian process, the focus is on the variance rather than the mean. In fact, we typically assume the mean is 0, so it's the covariance function we'll need to specify.

The basic setup is similar to putting a prior on the coefficients in a typical regression problem. With a GP, however, the prior is placed on the functional form of the data: it's the covariance between the data points that models the data, and it is this covariance that must be fit from the data.
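
To make the idea of a prior over functions concrete, the following is a minimal sketch (not from the recipe) that draws a few random functions from a zero-mean GP prior with a squared exponential covariance, using only NumPy; theta here is an assumed length-scale parameter:

>>> import numpy as np
>>> np.random.seed(0)
>>> x = np.linspace(0, 10, 100)[:, None]
>>> theta = .5

>>> # Squared exponential covariance between every pair of inputs
>>> K = np.exp(-theta * (x - x.T) ** 2)
>>> # A small jitter keeps the covariance numerically positive definite
>>> K += 1e-10 * np.eye(len(x))

>>> # Each row is one plausible function drawn from the prior
>>> prior_samples = np.random.multivariate_normal(np.zeros(len(x)), K, 
                                                  size=3)

Plotting the rows of prior_samples shows smooth curves; fitting a GP amounts to conditioning this prior on the observed data.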

Getting ready

So, let's use some regression data and walk through how Gaussian processes work in scikit-learn:

>>> import numpy as np
>>> from sklearn.datasets import load_boston
>>> boston = load_boston()

>>> boston_X = boston.data
>>> boston_y = boston.target

>>> train_set = np.random.choice([True, False], len(boston_y), 
                                 p=[.75, .25])

How to do it…

Now that we have the data, we'll create a scikit-learn GaussianProcess object. By default, it uses a constant regression function and squared exponential correlation, which is one of the more common choices:

>>> from sklearn.gaussian_process import GaussianProcess
>>> gp = GaussianProcess()
>>> gp.fit(boston_X[train_set], boston_y[train_set])
GaussianProcess(beta0=None, corr=<function squared_exponential 
                at 0x110809488>, normalize=True, 
                nugget=array(2.220446049250313e-15), 
                optimizer='fmin_cobyla', random_start=1, 
                random_state=<mtrand.RandomState object 
                at 0x10b9b58b8>, regr=<function constant 
                at 0x1108090c8>, storage_mode='full', 
                theta0=array([[ 0.1]]), thetaL=None, thetaU=None, 
                verbose=False)
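
As an aside, newer scikit-learn releases replaced the GaussianProcess estimator with GaussianProcessRegressor. The following is a rough, assumption-laden sketch of an equivalent fit under that API; the kernel choice (a scaled RBF, that is, squared exponential) stands in for the defaults above and is not something this recipe specifies:

>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import ConstantKernel, RBF

>>> # A scaled squared exponential (RBF) kernel; normalize_y centers the 
>>> # targets, loosely playing the role of the constant regression function
>>> kernel = ConstantKernel() * RBF(length_scale=1.0)
>>> gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
>>> gpr.fit(boston_X[train_set], boston_y[train_set])
>>> modern_preds = gpr.predict(boston_X[~train_set])

We'll stick with GaussianProcess for the rest of this recipe.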

That's a formidable object definition. The following are a few things to point out (a short sketch after this list shows how to set them explicitly):

  • beta0: This is the regression weight. By default, it's left unspecified, so the weights are estimated using MLE.
  • corr: This is the correlation function. There are several built-in correlation functions. We'll look at more of them in the following How it works... section.
  • regr: This is the regression function; by default, it's a constant.
  • nugget: This is the regularization parameter. It defaults to a very small number. You can either pass one value per data point or a single value that is applied uniformly to all of them.
  • normalize: This defaults to True, and it will center and scale the features. This is similar to scale in R.
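
As a quick illustration (the particular values here are arbitrary, not recommendations), these parameters can also be set explicitly when constructing the object; we'll keep using the default gp fit above for the rest of the recipe:

>>> gp_explicit = GaussianProcess(regr='constant', 
                                  corr='squared_exponential', 
                                  theta0=1e-1, nugget=1e-10, 
                                  normalize=True)
>>> gp_explicit.fit(boston_X[train_set], boston_y[train_set]);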

Okay, so now that we've fit the object, let's look at its performance on the test set:

>>> test_preds = gp.predict(boston_X[~train_set])

Let's plot the predicted values versus the actual values; then, because we're doing regression, it's probably a good idea to look at plotted residuals and a histogram of the residuals:

>>> from matplotlib import pyplot as plt
>>> f, ax = plt.subplots(figsize=(10, 7), nrows=3)
>>> f.tight_layout()

>>> ax[0].plot(range(len(test_preds)), test_preds, 
               label='Predicted Values');
>>> ax[0].plot(range(len(test_preds)), boston_y[~train_set], 
               label='Actual Values');
>>> ax[0].set_title("Predicted vs Actuals")
>>> ax[0].legend(loc='best')

>>> ax[1].plot(range(len(test_preds)), 
               test_preds - boston_y[~train_set]);
>>> ax[1].set_title("Plotted Residuals")

>>> ax[2].hist(test_preds - boston_y[~train_set]);
>>> ax[2].set_title("Histogram of Residuals")

The output is as follows:

[Figure: predicted versus actual values, the plotted residuals, and a histogram of the residuals]

How it works…

Now that we've worked through a very quick example, let's look a little more at what some of the parameters do and how we can tune them based on the model we're trying to fit.

First, let's try to understand what's going on with the corr function. This function describes the relationship between the different pairs of X. The following five different correlation functions are offered by scikit-learn:

  • absolute_exponential
  • squared_exponential
  • generalized_exponential
  • cubic
  • linear

For example, the squared exponential has the following form:

K(x, x') = exp(-θ‖x - x'‖²)

Linear, on the other hand, is just the dot product of the two points in question:

K(x, x') = x · x'
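
As a small illustration (not from the recipe), here is how these two forms might be computed for a pair of data points with NumPy; theta is an assumed scale parameter:

>>> x1, x2 = boston_X[0], boston_X[1]
>>> theta = 1e-1

>>> # Squared exponential correlation between the two points
>>> sq_exp_corr = np.exp(-theta * np.sum((x1 - x2) ** 2))
>>> # The dot product form mentioned above
>>> linear_corr = np.dot(x1, x2)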

Another parameter of interest is theta0. This represents the starting point for the estimation of the correlation parameters.

Once we have estimates of K and the mean, the process is fully specified, because it is a Gaussian process; the emphasis on Gaussian is one reason it's so popular for general machine learning work.
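
For reference, the standard predictive equations for a zero-mean GP (they aren't spelled out in this recipe) are, for a test point x* with training targets y and noise variance σ²:

μ* = k*ᵀ(K + σ²I)⁻¹ y

σ*² = k(x*, x*) - k*ᵀ(K + σ²I)⁻¹ k*

Here, K is the covariance matrix of the training points and k* is the vector of covariances between x* and each training point.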

Let's use a different regr function, apply a different theta0, and look at how the predictions differ:

>>> gp = GaussianProcess(regr='linear', theta0=5e-1)
>>> gp.fit(boston_X[train_set], boston_y[train_set]);
>>> linear_preds = gp.predict(boston_X[~train_set])
>>> f, ax = plt.subplots(figsize=(7, 5))

Let's have a look at the fit:

>>> f.tight_layout()

>>> ax.hist(test_preds - boston_y[~train_set], 
            label='Residuals Original', color='b', alpha=.5);
>>> ax.hist(linear_preds - boston_y[~train_set], 
            label='Residuals Linear', color='r', alpha=.5);
>>> ax.set_title("Residuals")
>>> ax.legend(loc='best')

The following is the output:

[Figure: overlaid histograms of the residuals from the original model and the linear regr model]

The second model's predictions are slightly better for the most part. If we want to summarize this, we can look at the MSE of the predictions:

>>> np.power(test_preds - boston_y[~train_set], 2).mean()
26.254844099612455
>>> np.power(linear_preds - boston_y[~train_set], 2).mean()
21.938924337056068
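
For what it's worth, scikit-learn's built-in metric computes exactly the same quantity, so the following gives the same values as the manual calculation above:

>>> from sklearn.metrics import mean_squared_error
>>> mse_original = mean_squared_error(boston_y[~train_set], test_preds)
>>> mse_linear = mean_squared_error(boston_y[~train_set], linear_preds)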

There's more…

We might want to understand the uncertainty in our estimates. When we make predictions, if we pass the eval_MSE argument as True, we'll get the MSE along with the predicted values; mechanically, a tuple of (predictions, MSE) is returned:

>>> test_preds, MSE = gp.predict(boston_X[~train_set], eval_MSE=True)
>>> MSE[:5]
array([ 11.95314572,   8.48397825,   6.0287539 ,  29.20844347, 
        0.36427829])
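
These values are variances for the individual predictions, so taking the square root puts them on the scale of the target; that's the quantity we'll want for 95 percent error bars in the plot below. A quick sketch:

>>> sigma = np.sqrt(MSE)
>>> lower = test_preds - 1.96 * sigma
>>> upper = test_preds + 1.96 * sigma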

So, now that we have error estimates for the predictions, let's plot the first few to get an indication of accuracy:

>>> f, ax = plt.subplots(figsize=(7, 5))

>>> n = 20
>>> rng = range(n)
>>> ax.scatter(rng, test_preds[:n])
>>> ax.errorbar(rng, test_preds[:n], yerr=1.96 * np.sqrt(MSE[:n]))

>>> ax.set_title("Predictions with Error Bars")

>>> ax.set_xlim((-1, 21));

The following is the output:

[Figure: predictions for the first 20 test points with 95 percent error bars]

As you can see, there's quite a bit of variance in the estimates for a lot of these points. On the other hand, our overall error wasn't too bad.
