Automatic cross validation

We've looked at using the cross validation iterators that scikit-learn comes with, but we can also use a helper function to perform cross validation for us automatically. This is similar to how other objects in scikit-learn are wrapped by helper functions, pipeline for instance.

Getting ready

First, we'll need to create a sample estimator; this can really be anything: a decision tree, a random forest, whatever. For us, it'll be a random forest regressor. We'll then create a dataset and use the cross validation functions.

How to do it...

First, import the ensemble module and create the model:

>>> from sklearn import ensemble
>>> rf = ensemble.RandomForestRegressor(max_features='auto')

Okay, so now, let's create some regression data:

>>> from sklearn import datasets
>>> X, y = datasets.make_regression(10000, 10)

Now that we have the data, we can import the cross_validation module and get access to the functions we'll use:

>>> from sklearn import cross_validation
>>> scores = cross_validation.cross_val_score(rf, X, y)
>>> print scores
[ 0.86823874  0.86763225  0.86986129]

How it works...

For the most part, this will delegate to the cross validation objects. One nice thing is that the function can also run the cross validation in parallel: pass the n_jobs argument to spread the folds across several workers.
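As a rough sketch of what that delegation looks like, the following compares a hand-written loop over a KFold iterator with the helper function. It assumes a recent scikit-learn release, where these functions live in sklearn.model_selection rather than the cross_validation module used in this recipe. The small dataset, n_estimators, the fixed random_state values, and n_jobs=2 are illustrative choices, not part of the original recipe.

```python
# Assumption: a recent scikit-learn, where the helpers live in
# sklearn.model_selection instead of sklearn.cross_validation.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Small, seeded dataset and model so both runs are reproducible
X, y = make_regression(n_samples=500, n_features=10, random_state=0)
rf = RandomForestRegressor(n_estimators=20, random_state=0)
cv = KFold(n_splits=3)

# Roughly what the helper does: fit a fresh copy of the model on each
# training split and score it on the matching held-out split.
manual_scores = []
for train_idx, test_idx in cv.split(X):
    model = clone(rf)
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# The helper runs the same loop; n_jobs=2 asks for the folds to be
# evaluated by two workers.
helper_scores = cross_val_score(rf, X, y, cv=cv, n_jobs=2)

print(manual_scores)
print(helper_scores)
```

With the seeds fixed, the two arrays should agree, which is a handy sanity check that the helper is only automating the loop.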

We can activate verbose mode to get a play-by-play:

>>> scores = cross_validation.cross_val_score(rf, X, y, verbose=3, cv=4)

[CV] no parameters to be set 
[CV] no parameters to be set, score=0.872866 -   0.7s
[CV] no parameters to be set
[CV] no parameters to be set, score=0.873679 -   0.6s
[CV] no parameters to be set 
[CV] no parameters to be set, score=0.878018 -   0.7s
[CV] no parameters to be set
[CV] no parameters to be set, score=0.871598 -   0.6s

[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.7s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    2.6s finished

As we can see, the model was scored during each iteration. We also get an idea of how long each iteration takes.
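If the timings are needed as data rather than log output, newer scikit-learn releases offer cross_validate, which returns them alongside the scores. This is a minimal sketch under that assumption; cross_validate is not part of the original recipe, and the dataset size and model settings here are arbitrary.

```python
# Assumption: a recent scikit-learn that provides
# sklearn.model_selection.cross_validate.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
rf = RandomForestRegressor(n_estimators=20, random_state=0)

# Returns a dict of arrays with one entry per fold
results = cross_validate(rf, X, y, cv=4)

for fit_time, score in zip(results["fit_time"], results["test_score"]):
    print("score=%.6f - fit took %.2fs" % (score, fit_time))
```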

It's also worth knowing that we can choose the scoring function, via the scoring argument, based on the kind of model we're trying to fit. In other recipes, we've discussed how to create your own scoring function.
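To illustrate, here is a hedged sketch of both options: naming a built-in scorer and wrapping a metric function with make_scorer. It assumes a recent scikit-learn, where the helpers live in sklearn.model_selection and the built-in error scorers use the neg_ prefix; the choice of metrics is only an example.

```python
# Assumption: a recent scikit-learn (sklearn.model_selection and the
# "neg_" naming for error-based scorers).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer, median_absolute_error
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
rf = RandomForestRegressor(n_estimators=20, random_state=0)

# Built-in scorer selected by name; errors are negated so that
# larger values are always better.
mse_scores = cross_val_score(rf, X, y, cv=3,
                             scoring="neg_mean_squared_error")

# Custom scorer built from a metric function; greater_is_better=False
# tells scikit-learn to negate the metric in the same way.
mae_scorer = make_scorer(median_absolute_error, greater_is_better=False)
mae_scores = cross_val_score(rf, X, y, cv=3, scoring=mae_scorer)

print(mse_scores)
print(mae_scores)
```

Both arrays contain negative numbers because they are negated errors; values closer to zero indicate a better fit.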
