Grid search

To mitigate this problem, scikit-learn provides a very useful class named GridSearchCV in the sklearn.grid_search module. What we have been doing with our calc_params function is a kind of grid search in one dimension. With GridSearchCV, we can specify a grid over any number of parameters, each with its own list of candidate values. It trains the classifier once for every combination of values and computes a cross-validation accuracy to evaluate each one.

Let's use it to adjust the C and the gamma parameters at the same time.

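>>> from sklearn.grid_search import GridSearchCV
>>> # In scikit-learn 0.18 and later, import from sklearn.model_selection instead.

>>> parameters = {
>>>     'svc__gamma': np.logspace(-2, 1, 4),  # 0.01, 0.1, 1, 10
>>>     'svc__C': np.logspace(-1, 1, 3),      # 0.1, 1, 10
>>> }
>>> clf = Pipeline([
>>>     ('vect', TfidfVectorizer(
>>>                stop_words=stop_words,
>>>                token_pattern=r"[a-z0-9_\-\.]+[a-z][a-z0-9_\-\.]+",
>>>     )),
>>>     ('svc', SVC()),
>>> ])
>>> # cv=3 uses three-fold cross-validation; refit=False skips the final refit.
>>> gs = GridSearchCV(clf, parameters, verbose=2, refit=False, cv=3)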
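Conceptually, the grid above expands into every pairing of the listed gamma and C values, and each pairing is scored in turn. The following is a minimal pure-Python sketch of that traversal, not scikit-learn's actual implementation. The helper names (expand_grid, grid_search) are hypothetical, and toy_score is an artificial stand-in for the cross-validated accuracy that GridSearchCV would obtain by really training the pipeline.

```python
from itertools import product

# The same values that np.logspace(-2, 1, 4) and np.logspace(-1, 1, 3) produce.
param_grid = {
    'svc__gamma': [0.01, 0.1, 1.0, 10.0],
    'svc__C': [0.1, 1.0, 10.0],
}


def expand_grid(grid):
    """Return one dict per parameter combination in the grid."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


def toy_score(params):
    """Artificial score (assumption, not real data): peaks at C=10, gamma=0.1."""
    return -abs(params['svc__C'] - 10.0) - abs(params['svc__gamma'] - 0.1)


def grid_search(grid, score_fn):
    """Score every combination and return the best (params, score) pair."""
    scored = [(params, score_fn(params)) for params in expand_grid(grid)]
    return max(scored, key=lambda pair: pair[1])


combinations = expand_grid(param_grid)
best_params, best_score = grid_search(param_grid, toy_score)
print(len(combinations))  # 12 combinations: 4 gamma values x 3 C values
print(best_params)
```

In the real search, the scoring step is the expensive part: each of the 12 combinations is trained and evaluated once per cross-validation fold.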

Let's execute our grid search and print the best parameter values and scores.

>>> %time _ = gs.fit(X_train, y_train)
>>> gs.best_params_, gs.best_score_
CPU times: user 304.39 s, sys: 2.55 s, total: 306.94 s
Wall time: 306.56 s
({'svc__C': 10.0, 'svc__gamma': 0.10000000000000001}, 0.81166666666666665)

The grid search found a better combination of parameters: C = 10.0 and gamma = 0.1, with a three-fold cross-validation accuracy of 0.812. This is a clear improvement over the best value from the previous experiment (0.76), where we adjusted only gamma and kept C fixed at 1.0.

At this point, we could continue experimenting, adjusting not only other parameters of the SVC but also the parameters of TfidfVectorizer, which is also part of the estimator. Note that this further increases the complexity. As you might have noticed, the previous grid search took about five minutes to finish, and it covered only 12 combinations (four gamma values times three C values), each trained three times for cross-validation. Every new parameter multiplies the number of combinations by the number of values we try for it, so the running time grows exponentially with the number of parameters. As a result, these kinds of methods are very resource and time intensive; this is also the reason why we used only a subset of the total instances.
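To get a feel for how quickly the cost grows, the sketch below counts the model fits a grid requires and extrapolates from the wall time reported above. It is a rough estimate only: it assumes every fit takes the same time, which will not hold exactly. The count_fits helper is hypothetical, and the candidate values for the two TfidfVectorizer parameters (ngram_range and max_df) are illustrative, not taken from the experiment.

```python
from functools import reduce
from operator import mul


def count_fits(grid, cv):
    """Number of model fits: one per parameter combination per fold."""
    n_combinations = reduce(mul, (len(values) for values in grid.values()), 1)
    return n_combinations * cv


grid = {
    'svc__gamma': [0.01, 0.1, 1.0, 10.0],
    'svc__C': [0.1, 1.0, 10.0],
}
base_fits = count_fits(grid, cv=3)        # 4 * 3 combinations * 3 folds
seconds_per_fit = 306.56 / base_fits      # wall time of the search above

# Hypothetical extension: these candidate values are illustrative only.
bigger = dict(grid)
bigger['vect__ngram_range'] = [(1, 1), (1, 2)]
bigger['vect__max_df'] = [0.5, 0.75, 1.0]
bigger_fits = count_fits(bigger, cv=3)    # six times as many fits

estimated_seconds = bigger_fits * seconds_per_fit
print(base_fits, bigger_fits)
print(round(estimated_seconds / 60, 1), 'minutes (rough estimate)')
```

Adding just two parameters with two and three candidate values turns a five-minute search into one of roughly half an hour under these assumptions.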
