Hyperparameter optimization

A machine learning hypothesis is determined not only by the learning algorithm but also by its hyperparameters (the parameters of the algorithm that have to be fixed in advance and that cannot be learned during the training process) and by the selection of variables used to achieve the best learned parameters.

In this section, we will explore how to extend the cross-validation approach to find the best hyperparameters that are able to generalize to our test set. We will keep on using the handwritten digits dataset offered by the Scikit-learn package. Here's a useful reminder about how to load the dataset:

In: from sklearn.datasets import load_digits
digits = load_digits()
X, y = digits.data, digits.target

In addition, we will keep on using support vector machines as our learning algorithm:

In: from sklearn import svm
h = svm.SVC()
hp = svm.SVC(probability=True, random_state=1)

This time, we will work with two hypotheses. The first hypothesis is just the plain SVC that outputs a label as a prediction. The second hypothesis is SVC enhanced by the computation of label probabilities (the probability=True parameter) with the random_state fixed to the value 1 in order to guarantee the reproducibility of the results. SVC outputting probabilities can be evaluated by all of the loss metrics that require a probability and not a label prediction as a result, such as AUC.
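
As a quick, hypothetical check (not part of the original listings), the following sketch contrasts the two hypotheses: h outputs plain labels, whereas hp also exposes the predict_proba method that probability-based metrics such as AUC rely on.

In: # Hypothetical check: h returns labels, hp returns class probabilities
h.fit(X, y)
print(h.predict(X[:1]))         # a single predicted digit
hp.fit(X, y)
print(hp.predict_proba(X[:1]))  # ten probabilities, one per digit class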

After running the preceding code snippet, we are ready to import the model_selection module and set the list of hyperparameters that we want to test by cross-validation.

We are going to use the GridSearchCV function, which will automatically search for the best parameters according to a search schedule and score the results with respect to a predefined or custom scoring function:

In: from sklearn import model_selection
search_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001],
     'kernel': ['rbf']},
]
scorer = 'accuracy'

Now, we have imported the module, set the scorer variable using a string parameter ('accuracy'), and created a list made of two dictionaries.

The scorer is a string chosen from the range of possible values that you can find in the predefined values section of the Scikit-learn documentation, which can be viewed at Scikit-learn.org/stable/modules/model_evaluation.html.

Using predefined values just requires you to pick an evaluation metric from the list (there are some for classification and regression, and there are some for clustering) and use the string by plugging it directly, or by using a string variable, into the GridSearchCV function.
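
If none of the predefined strings suits your problem, Scikit-learn also lets you wrap any metric function into a scorer object by means of make_scorer. Here is a minimal sketch (the macro-averaged F1 choice is just an illustration, not the metric used in this chapter):

In: from sklearn.metrics import f1_score, make_scorer
# A custom scorer; average='macro' is needed because digits has ten classes
macro_f1 = make_scorer(f1_score, average='macro')
# macro_f1 can now be passed to GridSearchCV through its scoring parameter,
# exactly like the 'accuracy' string used in this section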

GridSearchCV also accepts a parameter called param_grid, which can be a dictionary containing, as keys, an indication of all the hyperparameters to be changed and, as values referring to the dictionary keys, lists of parameters to be tested. Therefore, if you want to test the performance of your hypothesis with respect to the hyperparameter C, you can create a dictionary like this:

{'C' : [1, 10, 100, 1000]}

Alternatively, depending on your preference, you can use a specialized NumPy function to generate numbers that are evenly spaced on a log scale (as we saw in the previous chapter):

{'C': np.logspace(start=-2, stop=3, num=6, base=10.0)}

You can, therefore, enumerate all of the possible parameters' values and test all of their combinations. However, you can also stack different dictionaries, having each dictionary containing only a portion of the parameters that should be tested together. For example, when working with SVC, the kernel set to linear automatically excludes the gamma parameter. Combining it with the linear kernel would be, in fact, a waste of computational power since it would not have any effect on the learning process.
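
To double-check how many candidates a stacked grid actually implies, you can enumerate it with the ParameterGrid utility from model_selection (a small sketch, not required by the procedure itself):

In: from sklearn.model_selection import ParameterGrid
# 4 linear-kernel candidates plus 4 x 2 rbf-kernel candidates = 12 in total
print(len(ParameterGrid(search_grid)))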

Now, let's proceed with the grid search, timing it (thanks to the %timeit magic command) to know how much time it will take to complete the entire procedure:

In: search_func = model_selection.GridSearchCV(estimator=h,
                   param_grid=search_grid, scoring=scorer,
                   n_jobs=-1, iid=False, refit=True, cv=10)
%timeit search_func.fit(X, y)
print(search_func.best_estimator_)
print(search_func.best_params_)
print(search_func.best_score_)

Out: 4.52 s ± 75.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
SVC(C=10, cache_size=200, class_weight=None, coef0=0.0, degree=3,
gamma=0.001,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
{'kernel': 'rbf', 'C': 10, 'gamma': 0.001}
0.981081122784

It took about 10 seconds to complete the search on our computer. The search found that the best solution is a support vector machine classifier with an rbf kernel, C=10, and gamma=0.001, with a cross-validated mean accuracy of 0.981.
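
Beyond the single best combination, the fitted search object also stores every tested candidate in its cv_results_ attribute; the following is just a sketch of how you might inspect it with pandas:

In: import pandas as pd
# One row per tested combination; sorting by mean test score ranks them
results = pd.DataFrame(search_func.cv_results_)
print(results[['params', 'mean_test_score', 'std_test_score']]
      .sort_values('mean_test_score', ascending=False).head())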

As for the GridSearchCV command, apart from our hypothesis (the estimator parameter), param_grid, and the scoring we just talked about, we decided to set other optional but useful parameters:

  1. First, we set n_jobs=-1. This forces the function to use all the processors available on the computer when we run the Jupyter cell.
  2. We then set refit=True so that the function fits the whole training set using the best estimator's parameters. Now, we just need to apply the search_func.predict() method to fresh data in order to obtain new predictions (see the sketch after this list).
  3. The cv parameter is set to 10 folds (however, you can go for a smaller number, trading off speed with the accuracy of testing).
  4. The iid parameter is set to False. This parameter decides how to compute the error measure with respect to the classes. If the classes are balanced (as in this case), setting iid won't have much effect. However, if they are unbalanced, by default, iid=True will make the classes with more examples weigh more on the computation of the global error. Instead, iid=False means that all the classes should be considered the same. Since we wanted SVC to recognize every handwritten number from 0 to 9, no matter how many examples were given for each of them, setting the iid parameter to False is the right choice. According to your data science project, you may decide that you actually prefer the default being set to True.
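
Since refit=True refits the best estimator on the data passed to fit, a simple way to see the effect is to run the search on a training portion and then call predict() on held-out examples. The split below is purely illustrative and not part of the original procedure:

In: from sklearn.model_selection import train_test_split
# Illustrative hold-out split; after fitting, predict() uses the refitted
# best estimator found by the grid search
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)
search_func.fit(X_train, y_train)
print(search_func.predict(X_test)[:10])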