We've looked at using the cross validation iterators that scikit-learn comes with, but we can also use a helper function to perform cross validation for us automatically. This is similar to how other objects in scikit-learn are wrapped by helper functions, pipeline for instance.
First, we'll need to create a sample estimator; this can really be anything: a decision tree, a random forest, whatever. For us, it'll be a random forest regressor. We'll then create a dataset and use the cross validation functions.
First import the ensemble module and we'll get started:
>>> from sklearn import ensemble
>>> rf = ensemble.RandomForestRegressor(max_features='auto')
Okay, so now, let's create some regression data:
>>> from sklearn import datasets
>>> X, y = datasets.make_regression(10000, 10)
Now that we have the data, we can import the cross_validation module and get access to the functions we'll use:
>>> from sklearn import cross_validation
>>> scores = cross_validation.cross_val_score(rf, X, y)
>>> print scores
[ 0.86823874 0.86763225 0.86986129]
For the most part, this will delegate to the cross validation objects. One nice thing is that the function can run the folds in parallel for us; this is controlled by the n_jobs parameter, which defaults to a single job.
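As a minimal sketch of the parallel execution mentioned above, the following assumes a recent scikit-learn release, where cross_val_score lives in sklearn.model_selection rather than the cross_validation module used in this recipe. The dataset size, n_estimators, and random_state values are illustrative choices, not taken from the recipe.

```python
from sklearn import datasets, ensemble
from sklearn.model_selection import cross_val_score  # assumes a recent scikit-learn

# Small synthetic regression problem so the example runs quickly (illustrative sizes).
X, y = datasets.make_regression(n_samples=500, n_features=10, random_state=0)
rf = ensemble.RandomForestRegressor(n_estimators=20, random_state=0)

# n_jobs=2 requests two workers for the folds; n_jobs=-1 would use every core.
scores = cross_val_score(rf, X, y, cv=4, n_jobs=2)
print(scores)  # one score per fold
```

Whether parallel folds actually save time depends on the model and the data size; for small problems the overhead of starting workers can outweigh the gain.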
We can activate verbose mode to follow the run play by play:
>>> scores = cross_validation.cross_val_score(rf, X, y, verbose=3, cv=4)
[CV] no parameters to be set
[CV] no parameters to be set, score=0.872866 - 0.7s
[CV] no parameters to be set
[CV] no parameters to be set, score=0.873679 - 0.6s
[CV] no parameters to be set
[CV] no parameters to be set, score=0.878018 - 0.7s
[CV] no parameters to be set
[CV] no parameters to be set, score=0.871598 - 0.6s
[Parallel(n_jobs=1)]: Done 1 jobs | elapsed: 0.7s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 2.6s finished
As we can see, the model is scored on each iteration (one per fold). We also get an idea of how long each fit takes.
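If we want the per-fold timings as data rather than log lines, one option in recent scikit-learn releases is the cross_validate helper. This is a sketch under that assumption; cross_validate is not used elsewhere in this recipe, and the dataset and forest sizes are illustrative.

```python
from sklearn import datasets, ensemble
from sklearn.model_selection import cross_validate  # assumes a recent scikit-learn

# Illustrative small dataset and forest so the example runs quickly.
X, y = datasets.make_regression(n_samples=500, n_features=10, random_state=0)
rf = ensemble.RandomForestRegressor(n_estimators=20, random_state=0)

# cross_validate returns a dict with per-fold fit times, score times, and test scores.
results = cross_validate(rf, X, y, cv=4)
for fit_time, score in zip(results["fit_time"], results["test_score"]):
    print("fit took %.2fs, score=%.3f" % (fit_time, score))
```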
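It's also worth knowing that we can choose the scoring function to suit the kind of model we're trying to fit, by passing the scoring parameter; by default, the estimator's own score method is used, which is R² for a regressor. In other recipes, we've discussed how to create your own scoring function.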
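The following sketch shows two ways of choosing the scorer, again assuming a recent scikit-learn release (sklearn.model_selection and the "neg_mean_squared_error" scorer name). The choice of mean absolute error for the custom scorer is only an example.

```python
from sklearn import datasets, ensemble
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import cross_val_score  # assumes a recent scikit-learn

# Illustrative small dataset and forest so the example runs quickly.
X, y = datasets.make_regression(n_samples=500, n_features=10, random_state=0)
rf = ensemble.RandomForestRegressor(n_estimators=20, random_state=0)

# Built-in scorer selected by name; errors are negated so that higher is better.
neg_mse = cross_val_score(rf, X, y, cv=3, scoring="neg_mean_squared_error")

# Custom scorer built from a metric function; greater_is_better=False negates it too.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
neg_mae = cross_val_score(rf, X, y, cv=3, scoring=mae_scorer)

print(neg_mse)
print(neg_mae)
```

Because both scorers report negated errors, values closer to zero indicate a better fit.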