Using Stochastic Gradient Descent for classification

As was discussed in Chapter 2, Working with Linear Models, Stochastic Gradient Descent is a fundamental technique for fitting a regression model. The same optimization technique carries over naturally to classification, as the shared name implies.

Getting ready

In regression, we minimized a cost function that penalized bad choices on a continuous scale; for classification, we'll minimize a cost function that penalizes wrong assignments among two (or more) discrete cases.

How to do it…

First, let's create some very basic data:

>>> from sklearn import datasets
>>> X, y = datasets.make_classification()
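
To get a feel for what this call returns, the following minimal sketch inspects the generated arrays. The random_state argument is an addition for repeatability and is not part of the recipe; with the default arguments, make_classification produces 100 samples with 20 features and two classes.

```python
import numpy as np
from sklearn import datasets

# random_state is an illustrative addition so the draw is repeatable
X, y = datasets.make_classification(random_state=0)

print(X.shape)         # (number of samples, number of features)
print(np.unique(y))    # the distinct class labels
print(np.bincount(y))  # how many samples fall in each class
```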

Next, we'll create an SGDClassifier instance:

>>> from sklearn import linear_model
>>> sgd_clf = linear_model.SGDClassifier()

As usual, we'll fit the model:

>>> sgd_clf.fit(X, y)
SGDClassifier(alpha=0.0001, class_weight=None, epsilon=0.1, eta0=0.0,
              fit_intercept=True, l1_ratio=0.15, 
              learning_rate='optimal', loss='hinge', n_iter=5, 
              n_jobs=1, penalty='l2', power_t=0.5, random_state=None, 
              shuffle=False, verbose=0, warm_start=False)
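
Once the model is fit, it can be used like any other scikit-learn classifier. The sketch below is one possible way to check the result, assuming the same default dataset; the random_state values are additions for repeatability, and the accuracy is measured on the training data, so it is an optimistic estimate.

```python
from sklearn import datasets, linear_model

# random_state values are illustrative additions, not part of the recipe
X, y = datasets.make_classification(random_state=0)
sgd_clf = linear_model.SGDClassifier(random_state=0)
sgd_clf.fit(X, y)

predictions = sgd_clf.predict(X)      # predicted class label for each sample
train_accuracy = sgd_clf.score(X, y)  # fraction of training samples classified correctly

print(predictions[:10])
print(train_accuracy)
print(sgd_clf.coef_.shape)            # one row of coefficients for the binary problem
```

For an honest estimate of performance, the same calls would be made on data held out from fitting.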

We can set the class_weight parameter to account for class imbalance in a dataset.
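
As a rough illustration of class_weight, the sketch below builds a deliberately unbalanced dataset and passes a dictionary that maps each class label to a weight. The 90/10 split, the 1-to-9 weights, and the random_state values are assumptions chosen for the example, not recommendations.

```python
import numpy as np
from sklearn import datasets, linear_model

# weights=[0.9, 0.1] asks for roughly 90% of the samples in class 0
X_unbalanced, y_unbalanced = datasets.make_classification(
    n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(np.bincount(y_unbalanced))  # sample count per class

# Errors on the rare class (1) are penalized nine times as heavily
weighted_clf = linear_model.SGDClassifier(class_weight={0: 1, 1: 9},
                                          random_state=0)
weighted_clf.fit(X_unbalanced, y_unbalanced)
weighted_predictions = weighted_clf.predict(X_unbalanced)
```

Weighting the rare class more heavily typically trades some accuracy on the common class for better recall on the rare one.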

The hinge loss function is defined as follows:

$$\ell(y) = \max(0,\ 1 - t \cdot y)$$

Here, t is the true classification, denoted as +1 for one case and -1 for the other, and y is the raw output of the fitted model for x, the value of interest. The vector of coefficients fit from the model is denoted by w. There is also an intercept, b, for good measure. To put it another way:

$$t \in \{-1, +1\}, \qquad y = w \cdot x + b$$
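
To make the definition concrete, here is a small NumPy sketch that evaluates the standard hinge loss, max(0, 1 - t * y), on three hand-picked points. The coefficients w, the intercept b, and the points themselves are hypothetical values chosen so the arithmetic is easy to follow; they are not taken from the fitted model.

```python
import numpy as np

def hinge_loss(t, y):
    """Elementwise hinge loss max(0, 1 - t * y) for labels t in {-1, +1}."""
    return np.maximum(0.0, 1.0 - t * y)

w = np.array([2.0, -1.0])     # hypothetical coefficient vector
b = 0.5                       # hypothetical intercept
x = np.array([[1.0, 1.0],     # y = 2.0 - 1.0 + 0.5 =  1.5
              [0.0, 1.0],     # y = 0.0 - 1.0 + 0.5 = -0.5
              [0.25, 0.5]])   # y = 0.5 - 0.5 + 0.5 =  0.5
t = np.array([1, 1, 1])       # all three points truly belong to the +1 class

y = x @ w + b                 # raw model output for each point
losses = hinge_loss(t, y)
print(losses)
```

The first point is on the correct side with a margin of at least 1, so its loss is 0. The second is on the wrong side and costs 1.5. The third is classified correctly but sits inside the margin, so it still incurs a loss of 0.5.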

