Reducing the grid search runtime

The GridSearchCV class can manage an extensive amount of work for you by checking all the combinations of parameters required by your grid specification. However, when the data or the grid search space is big, the procedure may take a long time to compute.

A potential remedy to this issue is RandomizedSearchCV, from the model_selection module, which offers a procedure that randomly draws a sample of combinations and reports the best one found.

This has some clear advantages:

  • You can limit the number of computations.
  • You can obtain a good result or, at worst, understand where to focus your efforts in the grid search.
  • RandomizedSearchCV has the same options as GridSearchCV, but:
    1. It has an n_iter parameter, which sets the number of random samples to draw.
    2. It includes param_distributions, which serves the same purpose as param_grid. However, it only accepts dictionaries, and it works even better if you assign distributions as values instead of lists of discrete values. For instance, instead of C: [1, 10, 100, 1000], you can assign a distribution such as C: scipy.stats.expon(scale=100) (see the sketch after this list).
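
As a quick sketch of such a distribution-based specification (the gamma distribution here is an illustrative assumption, not a value taken from our previous grid):

import scipy.stats

# Continuous distributions let RandomizedSearchCV sample a fresh value
# at every draw, instead of cycling through a fixed list
search_dist = {'kernel': ['linear', 'rbf'],       # discrete choices still work
               'C': scipy.stats.expon(scale=100),
               'gamma': scipy.stats.expon(scale=0.1)}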

Let's test this approach with our previous settings:

In: search_dict = {'kernel': ['linear', 'rbf'],
                   'C': [1, 10, 100, 1000],
                   'gamma': [0.001, 0.0001]}
    scorer = 'accuracy'
    # iid was removed in scikit-learn 0.24; drop that argument on recent versions
    search_func = model_selection.RandomizedSearchCV(estimator=h,
                                                     param_distributions=search_dict,
                                                     n_iter=7,
                                                     scoring=scorer,
                                                     n_jobs=-1,
                                                     iid=False,
                                                     refit=True,
                                                     cv=10,
                                                     return_train_score=False)
    %timeit search_func.fit(X, y)
    print(search_func.best_estimator_)
    print(search_func.best_params_)
    print(search_func.best_score_)

Out: 1.53 s ± 265 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
SVC(C=10, cache_size=200, class_weight=None, coef0=0.0, degree=3,
gamma=0.001, kernel='rbf', max_iter=-1, probability=False,
random_state=None, shrinking=True, tol=0.001, verbose=False)
{'kernel': 'rbf', 'C': 10, 'gamma': 0.001}
0.981081122784

Using just half of the computations (7 draws against 14 trials with the exhaustive grid search), it found an equivalent solution. Let's also have a look at the combinations that have been tested:

In: res = search_func.cv_results_
    for el in zip(res['mean_test_score'],
                  res['std_test_score'],
                  res['params']):
        print(el)

Out: (0.9610800248897716, 0.021913085707003094, {'kernel': 'linear',
'gamma': 0.001, 'C': 1000})
(0.9610800248897716, 0.021913085707003094, {'kernel': 'linear',
'gamma': 0.001, 'C': 1})
(0.9716408520553866, 0.02044204452092589, {'kernel': 'rbf',
'gamma': 0.0001, 'C': 1000})
(0.981081122784369, 0.015506818968315338, {'kernel': 'rbf',
'gamma': 0.001, 'C': 10})
(0.9610800248897716, 0.021913085707003094, {'kernel': 'linear',
'gamma': 0.001, 'C': 10})
(0.9610800248897716, 0.021913085707003094, {'kernel': 'linear',
'gamma': 0.0001, 'C': 1000})
(0.9694212166750269, 0.02517929728858225, {'kernel': 'rbf',
'gamma': 0.0001, 'C': 10})

Even without a complete overview of all the combinations, a good sample can prompt you to look only at the RBF kernel and at certain C and gamma ranges, confining the subsequent grid search to a small portion of the potential search space.
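
As a sketch of such a follow-up search (the narrowed ranges below are illustrative assumptions centered on the best sampled combination, not values from the original grid):

# Hypothetical refined grid: RBF kernel only, with C and gamma
# narrowed around the best sampled combination (C=10, gamma=0.001)
refined_grid = {'kernel': ['rbf'],
                'C': [5, 10, 50, 100],
                'gamma': [0.0005, 0.001, 0.005]}
refined_search = model_selection.GridSearchCV(estimator=h,
                                              param_grid=refined_grid,
                                              scoring='accuracy',
                                              cv=10,
                                              n_jobs=-1)

This exhaustive search now covers just 12 combinations, all of them in the promising region identified by the random sample.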

Resorting to optimization based on random draws may appear to rely on blind luck, but it is actually a very efficient way to explore the hyperparameter space, especially when that space is high-dimensional. If properly arranged, random search does not sacrifice the completeness of exploration for speed. In high-dimensional hyperparameter spaces, grid search tends to repeatedly test similar parameter combinations, which proves computationally very inefficient in those cases where some parameters are irrelevant or their effects are strongly correlated.
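
To see this concretely, here is a minimal sketch (assuming two parameters, only the first of which actually matters): a 3x3 grid tests just three distinct values of the relevant parameter, whereas nine random draws test nine.

import numpy as np

rng = np.random.default_rng(0)

# Nine trials over two parameters, where only the first one matters
grid = [(a, b) for a in (0.1, 1.0, 10.0) for b in (0.1, 1.0, 10.0)]
random_draws = [(10 ** rng.uniform(-1, 1), 10 ** rng.uniform(-1, 1))
                for _ in range(9)]

# Distinct values tried for the relevant first parameter
print(len({a for a, _ in grid}))          # 3
print(len({a for a, _ in random_draws}))  # 9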

Random search was devised by James Bergstra and Yoshua Bengio to make the search for optimal combinations of hyperparameters in deep learning more efficient. The original paper is a great source for further insight into this method: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf.

Statistical tests have demonstrated that for a randomized search to perform well, you should run from a minimum of 30 to a maximum of 60 trials (this rule of thumb assumes that the optimum covers between 5% and 10% of the hyperparameter space, and that a 95% success rate is acceptable). Consequently, it generally makes sense to resort to random search when the equivalent grid search would require a comparable number of experiments (so you can take advantage of random search's properties) or a larger one (allowing you to save on computations).
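
The arithmetic behind this rule of thumb is short: the probability that at least one of n independent draws lands in a region covering a fraction p of the space is 1 - (1 - p)^n, so solving for a 95% success rate gives the trial counts quoted above (a minimal sketch using only those figures):

from math import ceil, log

# 1 - (1 - p)**n >= 0.95  =>  n >= log(0.05) / log(1 - p)
for p in (0.05, 0.10):
    n = ceil(log(0.05) / log(1 - p))
    print(p, n)   # p=0.05 needs 59 trials, p=0.10 needs 29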