Optimizing the ridge regression parameter

Once you start using ridge regression to make predictions or learn about relationships in the system you're modeling, you'll start thinking about the choice of alpha.

For example, OLS regression might show a relationship between two variables; however, when regularized by some alpha, that relationship may no longer be significant. This can determine whether or not a decision gets made.

Getting ready

This is the first recipe where we'll tune the parameters of a model, which is typically done by cross-validation. Later recipes lay out a more general way to do this, but here we'll walk through just enough to tune ridge regression.

If you remember, in ridge regression, the gamma parameter is typically represented as alpha in scikit-learn's Ridge class; so, the question that arises is what the best alpha is. Create a regression dataset, and then let's get started:

>>> import numpy as np
>>> from sklearn.datasets import make_regression
>>> reg_data, reg_target = make_regression(n_samples=100,
                           n_features=2, effective_rank=1, noise=10)

How to do it...

In the linear_model module, there is an object called RidgeCV, which stands for ridge cross-validation. This performs a cross-validation similar to leave-one-out cross-validation (LOOCV).

Under the hood, it's going to train the model for all samples except one. It'll then evaluate the error in predicting this one test case:

>>> from sklearn.linear_model import RidgeCV
>>> rcv = RidgeCV(alphas=np.array([.1, .2, .3, .4]))
>>> rcv.fit(reg_data, reg_target)
RidgeCV(alphas=array([ 0.1, 0.2, 0.3, 0.4]), cv=None, 
        fit_intercept=True, gcv_mode=None, loss_func=None, 
        normalize=False, score_func=None, scoring=None, 
        store_cv_values=False)

After we fit the regression, the alpha_ attribute will be the best alpha choice:

>>> rcv.alpha_
0.10000000000000001

In the previous example, it was the first choice. We might want to home in on something around .1:

>>> rcv2 = RidgeCV(alphas=np.array([.08, .09, .1, .11, .12]))
>>> rcv2.fit(reg_data, reg_target)
RidgeCV(alphas=array([ 0.08,  0.09,  0.1 ,  0.11,  0.12]), cv=None, 
                     fit_intercept=True, gcv_mode=None, 
                     loss_func=None, normalize=False, 
                     score_func=None, scoring=None, 
                     store_cv_values=False)

>>> rcv2.alpha_
0.08

We can continue this hunt, but hopefully, the mechanics are clear.
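
If you would rather not keep narrowing the grid by hand, the same idea can be automated. This is just a sketch; the grid endpoints, the number of points, and the names best, finer_alphas, and rcv_fine are illustrative choices rather than anything RidgeCV requires:

>>> # Build a finer grid centered on the current best alpha and refit
>>> best = rcv2.alpha_
>>> finer_alphas = np.linspace(best * 0.5, best * 1.5, 20)
>>> rcv_fine = RidgeCV(alphas=finer_alphas)
>>> rcv_fine.fit(reg_data, reg_target)
>>> rcv_fine.alpha_

Repeat this once or twice and the estimate of the best alpha stops moving much.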

How it works...

The mechanics might be clear, but we should talk a little more about the why and define what we mean by "best". At each step in the cross-validation process, the model scores an error against the held-out test sample. By default, this is essentially a squared error. Check out the There's more... section for more details.
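
To make that concrete, here is a rough, hand-written equivalent: for a given alpha, fit a plain Ridge model on every leave-one-out split and average the squared errors on the held-out samples. This is only a sketch for intuition; the helper name loo_mse is ours, and RidgeCV actually uses a much faster closed-form shortcut rather than refitting the model 100 times, so treat this as an illustration rather than its implementation:

>>> import numpy as np
>>> from sklearn.linear_model import Ridge

>>> def loo_mse(alpha, X, y):
        # Average the squared error of predicting each held-out sample
        errors = []
        for i in range(len(y)):
            mask = np.arange(len(y)) != i   # leave sample i out
            model = Ridge(alpha=alpha).fit(X[mask], y[mask])
            pred = model.predict(X[~mask])[0]
            errors.append((y[i] - pred) ** 2)
        return np.mean(errors)

>>> loo_mse(.1, reg_data, reg_target)

A lower loo_mse means that alpha generalizes better on this dataset, which is essentially what RidgeCV is minimizing by default.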

We can force the RidgeCV object to store the cross-validation values; this will let us visualize what it's doing:

>>> alphas_to_test = np.linspace(0.01, 1)
>>> rcv3 = RidgeCV(alphas=alphas_to_test, store_cv_values=True)
>>> rcv3.fit(reg_data, reg_target)

As you can see, we test a bunch of points (50 in total) between 0.01 and 1. Since we passed store_cv_values=True, we can access these values:

>>> rcv3.cv_values_.shape
(100, 50)

So, we had 100 samples in the initial regression and tested 50 different alpha values; cv_values_ holds the error for each sample at each alpha. We can now find the smallest mean error and choose the corresponding alpha as our best choice:

>>> smallest_idx = rcv3.cv_values_.mean(axis=0).argmin()
>>> alphas_to_test[smallest_idx]

The question that arises is "Does RidgeCV agree with our choice?" Use the following command to find out:

>>> rcv3.alpha_
0.01

Beautiful!

It's also worthwhile to visualize what's going on. In order to do that, we'll plot the mean error for all 50 tested alphas.
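
A minimal sketch of that plot, assuming matplotlib is available (the figure size, colors, and labels here are arbitrary choices):

>>> import matplotlib.pyplot as plt

>>> # One mean error per candidate alpha, plus the alpha RidgeCV selected
>>> mean_errors = rcv3.cv_values_.mean(axis=0)

>>> f, ax = plt.subplots(figsize=(7, 5))
>>> ax.plot(alphas_to_test, mean_errors, label='Mean CV error')
>>> ax.axvline(rcv3.alpha_, color='r', linestyle='--', label='Chosen alpha')
>>> ax.set_xlabel('alpha')
>>> ax.set_ylabel('Mean error')
>>> ax.legend()
>>> plt.show()

The lowest point of the curve sits at the alpha reported by rcv3.alpha_, which is what RidgeCV picks.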


There's more...

If we want to use our own scoring function, we can do that as well. Since we looked at the mean absolute deviation (MAD) before, let's use it to score the differences. First, we need to define our loss function:

>>> def MAD(target, predictions):
       absolute_deviation = np.abs(target - predictions)
       return absolute_deviation.mean()

After we define the loss function, we can employ the make_scorer function in sklearn. This will take care of standardizing our function so that scikit's objects know how to use it. Also, because this is a loss function and not a score function, the lower the better; we therefore need to let sklearn flip the sign, turning this minimization problem into the maximization problem that scikit-learn expects:

>>> import sklearn
>>> MAD = sklearn.metrics.make_scorer(MAD, greater_is_better=False)
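>>> # Sanity check (illustration only): the wrapped scorer returns the
>>> # negated MAD of a fitted model's predictions, so it is always <= 0
>>> MAD(rcv3, reg_data, reg_target)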
>>> rcv4 = RidgeCV(alphas=alphas_to_test, store_cv_values=True, 
                   scoring=MAD)
>>> rcv4.fit(reg_data, reg_target)
>>> # With a custom scorer, cv_values_ holds the per-sample predictions
>>> # for each alpha (not errors), so recover the MAD per alpha first:
>>> mad_values = np.abs(rcv4.cv_values_ - reg_target[:, None]).mean(axis=0)
>>> alphas_to_test[mad_values.argmin()]

The alpha chosen this way should agree with the one RidgeCV itself selects, which is stored in rcv4.alpha_.