Maximum margin classification and support vectors

The following figure depicts instances from two linearly separable classes and three possible decision boundaries. All of the decision boundaries separate the training instances of the positive class from the training instances of the negative class, and a perceptron could learn any of them. Which of these decision boundaries is most likely to perform best on test data?

[Figure: Training instances from two linearly separable classes and three candidate decision boundaries (solid, dashed, and dotted)]

From this visualization, it is intuitive that the dotted decision boundary is the best. The solid decision boundary is near many of the positive instances. The test set could contain a positive instance that has a slightly smaller value for the first explanatory variable, $x_1$; this instance would be classified incorrectly. The dashed decision boundary is farther away from most of the training instances; however, it is near one of the positive instances and one of the negative instances. The following figure provides a different perspective on evaluating decision boundaries:

[Figure: A decision boundary with instances A, B, and C at decreasing distances from it]

Assume that the line plotted is the decision boundary for a logistic regression classifier. The instance labeled A is far from the decision boundary; it would be predicted to belong to the positive class with a high probability. The instance labeled B would still be predicted to belong to the positive class, but the probability would be lower as the instance is closer to the decision boundary. Finally, the instance labeled C would be predicted to belong to the positive class with a low probability; even a small change to the training data could change the class that is predicted. The most confident predictions are for the instances that are farthest from the decision boundary. We can estimate the confidence of the prediction using its functional margin. The functional margin of the training set is given by the following equations:

$$\hat{\gamma} = \min_{i} \; y_i f(x_i)$$
$$f(x) = \langle w, x \rangle + b$$
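
To make the definition concrete, the following minimal NumPy sketch computes the per-instance functional margins and the training set's functional margin. The toy data X, labels y, weight vector w, and intercept b are hypothetical values chosen only for illustration, not the result of any training procedure.

```python
import numpy as np

# Hypothetical toy data: rows of X are instances, y contains labels in {-1, +1}.
X = np.array([[2.0, 3.0], [1.0, 1.5], [3.5, 4.0], [0.5, 0.5]])
y = np.array([1, -1, 1, -1])

# Assumed decision boundary parameters (w, b); in practice these are learned.
w = np.array([1.0, 1.0])
b = -4.0

# Functional margin of each instance: y_i * (<w, x_i> + b).
functional_margins = y * (X @ w + b)

# The functional margin of the training set is the smallest per-instance margin.
print(functional_margins)        # per-instance functional margins
print(functional_margins.min())  # functional margin of the training set
```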

In the preceding formulae, $y_i$ is the true class of the instance, $w$ is the model's weight vector, and $b$ is the intercept term. The functional margin is large for instance A and small for instance C. If C were misclassified, the functional margin would be negative. The instances for which the functional margin is equal to one are called support vectors. These instances alone are sufficient to define the decision boundary; the other instances are not required to predict the class of a test instance. Related to the functional margin is the geometric margin, or the maximum width of the band that separates the support vectors. The geometric margin is equal to the normalized functional margin. It is necessary to normalize the functional margins because they can be scaled arbitrarily by rescaling $w$ and $b$, which is problematic for training. When $w$ is a unit vector, the geometric margin is equal to the functional margin. We can now formalize our definition of the best decision boundary as the one with the greatest geometric margin. The model parameters that maximize the geometric margin can be found by solving the following constrained optimization problem:

$$\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2$$
$$\text{subject to} \quad y_i \left( \langle w, x_i \rangle + b \right) \geq 1, \quad i = 1, \ldots, n$$
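
The following short sketch, reusing the same hypothetical data and parameters as before, illustrates why the normalization matters: rescaling $w$ and $b$ inflates the functional margin, but the geometric margin is unchanged.

```python
import numpy as np

# Hypothetical toy data and parameters, as in the earlier sketch.
X = np.array([[2.0, 3.0], [1.0, 1.5], [3.5, 4.0], [0.5, 0.5]])
y = np.array([1, -1, 1, -1])
w = np.array([1.0, 1.0])
b = -4.0

def functional_margin(w, b):
    # Smallest value of y_i * (<w, x_i> + b) over the training set.
    return (y * (X @ w + b)).min()

def geometric_margin(w, b):
    # Dividing by ||w|| makes the margin invariant to rescaling of w and b.
    return functional_margin(w, b) / np.linalg.norm(w)

# Scaling (w, b) by 10 multiplies the functional margin by 10,
# while the geometric margin stays the same.
print(functional_margin(w, b), functional_margin(10 * w, 10 * b))
print(geometric_margin(w, b), geometric_margin(10 * w, 10 * b))
```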

A useful property of support vector machines is that this optimization problem is convex; it has a single local minimum that is also the global minimum. While the proof is beyond the scope of this chapter, the previous optimization problem can be written using the dual form of the model to accommodate kernels as follows:

$$\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
$$\text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$
$$\alpha_i \geq 0, \quad i = 1, \ldots, n$$
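
As a rough sketch of how the dual objective is evaluated, the function below computes $W(\alpha)$ from a precomputed kernel matrix. The toy data and the example multipliers alpha are hypothetical; they satisfy the equality constraint but are not the solution of the optimization problem. Because the data enter only through the kernel matrix, swapping in a different kernel function is all that is needed to accommodate non-linear decision boundaries.

```python
import numpy as np

def dual_objective(alpha, y, K):
    """Evaluate the dual objective W(alpha) for a precomputed kernel matrix K.

    alpha : array of Lagrange multipliers, one per training instance
    y     : array of labels in {-1, +1}
    K     : kernel matrix with K[i, j] = K(x_i, x_j)
    """
    return alpha.sum() - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * K)

# Hypothetical toy data; a linear kernel is simply the dot product.
X = np.array([[2.0, 3.0], [1.0, 1.5], [3.5, 4.0], [0.5, 0.5]])
y = np.array([1, -1, 1, -1])
K = X @ X.T

# Example multipliers chosen so that sum(alpha * y) == 0.
alpha = np.array([0.5, 0.5, 0.0, 0.0])
print(dual_objective(alpha, y, K))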

Finding the parameters that maximize the geometric margin, subject to the constraint that every instance has a functional margin of at least 1 (that is, the decision function is at least 1 for positive instances and at most -1 for negative instances), is a quadratic programming problem. This problem is commonly solved using an algorithm called Sequential Minimal Optimization (SMO). The SMO algorithm breaks the optimization problem down into a series of the smallest possible subproblems, each involving just two Lagrange multipliers, which are then solved analytically.
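
In practice we rarely implement SMO ourselves; scikit-learn's SVC class, for example, is built on libsvm, which uses an SMO-style solver. The following sketch fits a linear SVC on hypothetical toy data and inspects the learned support vectors; the large value of C is an assumption used to approximate the hard-margin problem described above, since SVC implements the soft-margin formulation.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 3.0], [3.5, 4.0], [3.0, 3.0],
              [1.0, 1.5], [0.5, 0.5], [1.5, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A large C approximates the hard-margin formulation described above.
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

# Only the support vectors are needed to define the decision boundary.
print(clf.support_vectors_)
print(clf.coef_, clf.intercept_)  # w and b for the linear kernel
```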
