In this recipe, we're going to introduce grid search with basic Python, though we will use sklearn for the models and matplotlib for the visualization. The model we'll fit is a basic decision tree classifier, and our parameter space will be two-dimensional to help us with the visualization. The parameter space will then be the Cartesian product of two sets of parameter values; we'll see in a bit how we can iterate through this space with itertools.
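To make the Cartesian-product idea concrete before we build the real grid, here is a minimal sketch; the colors and sizes sets are made up purely for illustration:

```python
import itertools as it

# Two hypothetical parameter dimensions
colors = {'red', 'blue'}
sizes = {'small', 'medium', 'large'}

# it.product yields every (color, size) pair exactly once
grid = list(it.product(colors, sizes))
print(len(grid))  # 2 * 3 = 6 combinations
```

The number of candidate models is the product of the sizes of the individual dimensions, which is why grid search gets expensive quickly as you add parameters.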
Let's create the dataset and then get started:
>>> from sklearn import datasets
>>> X, y = datasets.make_classification(n_samples=2000, n_features=10)
Earlier we said that we'd use grid search to tune two parameters: criteria and max_features. We need to represent those as Python sets, and then use itertools.product to iterate through them:
>>> criteria = {'gini', 'entropy'}
>>> max_features = {'auto', 'log2', None}
>>> import itertools as it
>>> parameter_space = it.product(criteria, max_features)
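One caveat worth knowing: it.product returns a one-shot iterator, so parameter_space can only be looped over once; wrap it in list() if you need to reuse it. A small sketch with made-up sets:

```python
import itertools as it

pairs = it.product({'a', 'b'}, {1, 2})

first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # already exhausted, comes back empty

print(len(first_pass), len(second_pass))  # 4 0
```

In this recipe we only iterate once, so the iterator form is fine as-is.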
Great! So now that we have the parameter space, let's iterate through it and check the accuracy of each model as specified by the parameters. Then, we'll store that accuracy so that we can compare the different parameter settings. We'll also use a 50/50 train/test split:
>>> import numpy as np
>>> train_set = np.random.choice([True, False], size=len(y))
>>> from sklearn.tree import DecisionTreeClassifier
>>> accuracies = {}
>>> for criterion, max_feature in parameter_space:
...     dt = DecisionTreeClassifier(criterion=criterion,
...                                 max_features=max_feature)
...     dt.fit(X[train_set], y[train_set])
...     accuracies[(criterion, max_feature)] = (dt.predict(X[~train_set])
...                                             == y[~train_set]).mean()
>>> accuracies
{('entropy', None): 0.974609375, ('entropy', 'auto'): 0.9736328125, ('entropy', 'log2'): 0.962890625, ('gini', None): 0.9677734375, ('gini', 'auto'): 0.9638671875, ('gini', 'log2'): 0.96875}
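Because the accuracies live in a dict keyed by parameter tuple, pulling out the winning combination is a one-liner with max. Here's a sketch using a small made-up dict of the same shape as the one above:

```python
# Made-up accuracies, same shape as the dict built in the recipe
accuracies = {('entropy', None): 0.9746,
              ('entropy', 'log2'): 0.9629,
              ('gini', 'log2'): 0.9688}

# max over the keys, ranked by their accuracy values
best_params = max(accuracies, key=accuracies.get)
print(best_params)  # ('entropy', None)
```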
So we now have the accuracy for each parameter combination. Let's visualize the performance:
>>> from matplotlib import pyplot as plt
>>> from matplotlib import cm
>>> cmap = cm.RdBu_r
>>> f, ax = plt.subplots(figsize=(7, 4))
>>> ax.set_xticklabels([''] + list(criteria))
>>> ax.set_yticklabels([''] + list(max_features))
>>> plot_array = []
>>> for max_feature in max_features:
...     m = []
...     for criterion in criteria:
...         m.append(accuracies[(criterion, max_feature)])
...     plot_array.append(m)
>>> colors = ax.matshow(plot_array,
...                     vmin=np.min(list(accuracies.values())) - 0.001,
...                     vmax=np.max(list(accuracies.values())) + 0.001,
...                     cmap=cmap)
>>> f.colorbar(colors)
It's fairly easy to see which combination performed best here. Hopefully, you can see how this brute-force process can be extended to larger parameter spaces.
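For comparison, scikit-learn ships a built-in implementation of exactly this brute-force loop. The sketch below assumes a reasonably recent sklearn, where the class lives in sklearn.model_selection and the 'auto' value for max_features has been replaced by 'sqrt':

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.make_classification(n_samples=200, n_features=10,
                                    random_state=0)

# The same two-dimensional grid, expressed as a dict of lists
param_grid = {'criterion': ['gini', 'entropy'],
              'max_features': ['sqrt', 'log2', None]}

# GridSearchCV fits every combination with cross-validation
gs = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
gs.fit(X, y)
print(gs.best_params_)
```

Beyond saving us the bookkeeping, GridSearchCV scores each combination with cross-validation rather than a single random split, which gives a more stable estimate of each setting's accuracy.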