Using boosting to learn from errors

Gradient boosting regression is a technique that learns from its mistakes. Essentially, it tries to fit a bunch of weak learners. There are two things to note:

  • Individually, each learner has poor accuracy, but combined they can be very accurate
  • They're fit sequentially, so each learner is trained to correct the mistakes of the learners that came before it (a rough sketch of this idea follows the list)

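Before turning to scikit-learn's implementation, here is a minimal hand-rolled sketch of that second point. This is only an illustration of the idea, not the library's actual algorithm, and the choices here (tree depth, 100 rounds, the 0.1 damping factor, the throwaway data) are arbitrary: start from a constant prediction, then repeatedly fit a shallow tree to the current residuals and add a damped version of its predictions to the running total.

>>> import numpy as np
>>> from sklearn.datasets import make_regression
>>> from sklearn.tree import DecisionTreeRegressor

>>> X_demo, y_demo = make_regression(n_samples=1000, n_features=2, noise=10)

>>> # Start from a constant prediction; each tree then fits the current residuals,
>>> # that is, the mistakes the running ensemble is still making.
>>> prediction = np.full(len(y_demo), y_demo.mean())
>>> for _ in range(100):
...     tree = DecisionTreeRegressor(max_depth=3)
...     tree.fit(X_demo, y_demo - prediction)
...     prediction += 0.1 * tree.predict(X_demo)  # damp each correction a little

Each pass through the loop shrinks the residuals a little more, which is what GradientBoostingRegressor automates (with a proper loss function and many more options).
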
Getting ready

Let's use some basic regression data and see how gradient boosting regression (henceforth, GBR) works:

>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=1000, n_features=2, noise=10)

How to do it...

GBR is part of the ensemble module because it's an ensemble learner, the term for models that combine many weak learners to approximate a single strong learner:

>>> from sklearn.ensemble import GradientBoostingRegressor as GBR
>>> gbr = GBR()
>>> gbr.fit(X, y)
>>> gbr_preds = gbr.predict(X)

There's more to fitting a usable model than this, but the fit/predict pattern should be familiar by now.

Now, let's fit a basic regression as well so that we can use it as the baseline:

>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()
>>> lr.fit(X, y)
>>> lr_preds = lr.predict(X)

Now that we have a baseline, let's see how well GBR performed against linear regression.

I'll leave it as an exercise for you to plot the residuals, but to get started, do the following:

>>> gbr_residuals = y - gbr_preds
>>> lr_residuals = y - lr_preds
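
One way to get that plot (a minimal sketch, assuming matplotlib is available; the bin count and figure size are arbitrary) is to overlay histograms of the two sets of residuals:

>>> import matplotlib.pyplot as plt

>>> f, ax = plt.subplots(figsize=(7, 5))
>>> ax.hist(gbr_residuals, bins=20, label='GBR residuals', alpha=.5)
>>> ax.hist(lr_residuals, bins=20, label='LR residuals', alpha=.5)
>>> ax.set_title("GBR vs. linear regression residuals")
>>> ax.legend(loc='best')
>>> plt.show()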

The following will be the output:

[Plot of the GBR and linear regression residuals]

It looks like GBR has a better fit, but it's a bit hard to tell. Let's compare the interval that contains the central 95 percent of each set of residuals:

>>> import numpy as np
>>> np.percentile(gbr_residuals, [2.5, 97.5])
array([-16.05443674,  17.53946294])

>>> np.percentile(lr_residuals, [2.5, 97.5])
array([-20.05434912,  19.80272884])

So, GBR fits the training data a bit better; we can also make several modifications to the GBR algorithm, which might improve performance further. I'll show an example here, then we'll walk through the different options in the How it works... section:

>>> n_estimators = np.arange(100, 1100, 350)
>>> gbrs = [GBR(n_estimators=n_estimator) for n_estimator in n_estimators]
>>> residuals = {}
>>> for gbr in gbrs:
...     gbr.fit(X, y)
...     residuals[gbr.n_estimators] = y - gbr.predict(X)

The following is the output:

[Plot of the residuals for each n_estimators value]

It's a bit muddled, but hopefully it's clear that as the number of estimators increases, the training error goes down. Sadly, this isn't a panacea: first, we aren't testing against a holdout set, so we can't tell whether the extra estimators are just overfitting; second, as the number of estimators goes up, training takes longer. That isn't a big deal on the dataset we use here, but imagine a dataset one or two orders of magnitude larger.
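
To address the first point, here's a rough sketch of what a holdout check could look like. Depending on your scikit-learn version, train_test_split lives in sklearn.model_selection (or sklearn.cross_validation in older releases); the estimator counts just reuse the values produced by np.arange(100, 1100, 350) above:

>>> from sklearn.model_selection import train_test_split

>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

>>> holdout_residuals = {}
>>> for n in (100, 450, 800):
...     gbr = GBR(n_estimators=n)
...     gbr.fit(X_train, y_train)
...     # measure the error on data the model never saw during training
...     holdout_residuals[n] = y_test - gbr.predict(X_test)

You can then compare the 2.5th and 97.5th percentiles of these holdout residuals exactly as we did before; any improvement that only shows up on the training residuals is overfitting.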

How it works...

The first parameter, and the one we already looked at, is n_estimators—the number of weak learners that are used in GBR. In general, if you can get away with more (that is, have enough computational power), it is probably better. There are more nuances to the other parameters.

You should tune the max_depth parameter before all others. Since the individual learners are trees, max_depth controls how deep each tree can grow, and therefore how many nodes it can have. There's a fine line between growing the trees deep enough to fit the data well and growing them so deep that they overfit.
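
Here is a quick sketch of one way to tune it with cross-validation. The grid values are arbitrary illustrations, and GridSearchCV lives in sklearn.model_selection (sklearn.grid_search in older releases):

>>> from sklearn.model_selection import GridSearchCV

>>> param_grid = {'max_depth': [1, 2, 3, 5, 7]}
>>> search = GridSearchCV(GBR(), param_grid, cv=5)
>>> search.fit(X, y)
>>> best_depth = search.best_params_['max_depth']  # depth that cross-validated best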

The loss parameter controls the loss function used to measure the error. The default is 'ls', which stands for least squares; least absolute deviation, Huber loss, and quantile loss are also available.
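
Switching the loss is just a keyword change. For example, the Huber loss is less sensitive to outliers than least squares; note that more recent scikit-learn releases rename some of these strings (for instance, 'ls' became 'squared_error'), so check the version you have installed:

>>> gbr_huber = GBR(loss='huber')
>>> gbr_huber.fit(X, y)
>>> huber_residuals = y - gbr_huber.predict(X)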
