Maximum margin classification and support vectors

The following figure depicts instances from two linearly separable classes and three possible decision boundaries. All of the decision boundaries separate the training instances of the positive class from the training instances of the negative class, and a perceptron could learn any of them. Which of these decision boundaries is most likely to perform best on test data?

[Figure: Training instances from two linearly separable classes and three candidate decision boundaries (solid, dashed, and dotted)]

From this visualization, it is intuitive that the dotted decision boundary is the best. The solid decision boundary is near many of the positive instances. The test set could contain a positive instance that has a slightly smaller value for the first explanatory variable, $x_1$; this instance would be classified incorrectly. The dashed decision boundary is farther away from most of the training instances; however, it is near one of the positive instances and one of the negative instances. The following figure provides a different perspective on evaluating decision boundaries:

[Figure: A decision boundary with instances A, B, and C at decreasing distances from it]

Assume that the line plotted is the decision boundary for a logistic regression classifier. The instance labeled A is far from the decision boundary; it would be predicted to belong to the positive class with a high probability. The instance labeled B would still be predicted to belong to the positive class, but the probability would be lower as the instance is closer to the decision boundary. Finally, the instance labeled C would be predicted to belong to the positive class with a low probability; even a small change to the training data could change the class that is predicted. The most confident predictions are for the instances that are farthest from the decision boundary. We can estimate the confidence of the prediction using its functional margin. The functional margin of the training set is given by the following equations:

$$\hat{\gamma} = \min_{i} \; y_i f(x_i)$$
$$f(x) = \langle w, x \rangle + b$$
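
To make the definition concrete, the following minimal NumPy sketch computes the per-instance functional margins and the training set's functional margin. The toy data X, labels y, weight vector w, and intercept b are hypothetical values chosen only for illustration, not the result of any training procedure.

```python
import numpy as np

# Hypothetical toy data: rows of X are instances, y contains labels in {-1, +1}.
X = np.array([[2.0, 3.0], [1.0, 1.5], [3.5, 4.0], [0.5, 0.5]])
y = np.array([1, -1, 1, -1])

# Assumed decision boundary parameters (w, b); in practice these are learned.
w = np.array([1.0, 1.0])
b = -4.0

# Functional margin of each instance: y_i * (<w, x_i> + b).
functional_margins = y * (X @ w + b)

# The functional margin of the training set is the smallest per-instance margin.
print(functional_margins)        # per-instance functional margins
print(functional_margins.min())  # functional margin of the training set
```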

In the preceding formulae, $y_i$ is the true class of the instance, $w$ is the model's weight vector, and $b$ is the intercept term. The functional margin is large for instance A and small for instance C. If C were misclassified, the functional margin would be negative. The instances for which the functional margin is equal to one are called support vectors. These instances alone are sufficient to define the decision boundary; the other instances are not required to predict the class of a test instance. Related to the functional margin is the geometric margin, or the maximum width of the band that separates the support vectors. The geometric margin is equal to the normalized functional margin. It is necessary to normalize the functional margins because they can be scaled arbitrarily by rescaling $w$ and $b$, which is problematic for training. When $w$ is a unit vector, the geometric margin is equal to the functional margin. We can now formalize our definition of the best decision boundary as the one with the greatest geometric margin. The model parameters that maximize the geometric margin can be found by solving the following constrained optimization problem:

$$\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2$$
$$\text{subject to} \quad y_i \left( \langle w, x_i \rangle + b \right) \geq 1, \quad i = 1, \ldots, n$$
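
The following short sketch, reusing the same hypothetical data and parameters as before, illustrates why the normalization matters: rescaling $w$ and $b$ inflates the functional margin, but the geometric margin is unchanged.

```python
import numpy as np

# Hypothetical toy data and parameters, as in the earlier sketch.
X = np.array([[2.0, 3.0], [1.0, 1.5], [3.5, 4.0], [0.5, 0.5]])
y = np.array([1, -1, 1, -1])
w = np.array([1.0, 1.0])
b = -4.0

def functional_margin(w, b):
    # Smallest value of y_i * (<w, x_i> + b) over the training set.
    return (y * (X @ w + b)).min()

def geometric_margin(w, b):
    # Dividing by ||w|| makes the margin invariant to rescaling of w and b.
    return functional_margin(w, b) / np.linalg.norm(w)

# Scaling (w, b) by 10 multiplies the functional margin by 10,
# while the geometric margin stays the same.
print(functional_margin(w, b), functional_margin(10 * w, 10 * b))
print(geometric_margin(w, b), geometric_margin(10 * w, 10 * b))
```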

A useful property of support vector machines is that this optimization problem is convex; it has a single local minimum that is also the global minimum. While the proof is beyond the scope of this chapter, the previous optimization problem can be written using the dual form of the model to accommodate kernels as follows:

$$\max_{\alpha} \; W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
$$\text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$
$$\alpha_i \geq 0, \quad i = 1, \ldots, n$$
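
As a rough sketch of how the dual objective is evaluated, the function below computes $W(\alpha)$ from a precomputed kernel matrix. The toy data and the example multipliers alpha are hypothetical; they satisfy the equality constraint but are not the solution of the optimization problem. Because the data enter only through the kernel matrix, swapping in a different kernel function is all that is needed to accommodate non-linear decision boundaries.

```python
import numpy as np

def dual_objective(alpha, y, K):
    """Evaluate the dual objective W(alpha) for a precomputed kernel matrix K.

    alpha : array of Lagrange multipliers, one per training instance
    y     : array of labels in {-1, +1}
    K     : kernel matrix with K[i, j] = K(x_i, x_j)
    """
    return alpha.sum() - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * K)

# Hypothetical toy data; a linear kernel is simply the dot product.
X = np.array([[2.0, 3.0], [1.0, 1.5], [3.5, 4.0], [0.5, 0.5]])
y = np.array([1, -1, 1, -1])
K = X @ X.T

# Example multipliers chosen so that sum(alpha * y) == 0.
alpha = np.array([0.5, 0.5, 0.0, 0.0])
print(dual_objective(alpha, y, K))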

Finding the parameters that maximize the geometric margin, subject to the constraint that every instance has a functional margin of at least 1 (that is, the decision function is at least 1 for positive instances and at most -1 for negative instances), is a quadratic programming problem. This problem is commonly solved using an algorithm called Sequential Minimal Optimization (SMO). The SMO algorithm breaks the optimization problem down into a series of the smallest possible subproblems, each involving just two Lagrange multipliers, which are then solved analytically.
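
In practice we rarely implement SMO ourselves; scikit-learn's SVC class, for example, is built on libsvm, which uses an SMO-style solver. The following sketch fits a linear SVC on hypothetical toy data and inspects the learned support vectors; the large value of C is an assumption used to approximate the hard-margin problem described above, since SVC implements the soft-margin formulation.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 3.0], [3.5, 4.0], [3.0, 3.0],
              [1.0, 1.5], [0.5, 0.5], [1.5, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A large C approximates the hard-margin formulation described above.
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

# Only the support vectors are needed to define the decision boundary.
print(clf.support_vectors_)
print(clf.coef_, clf.intercept_)  # w and b for the linear kernel
```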
