Doing binary classification using SVM

Classification is a technique for putting data into different classes based on its utility. For example, an e-commerce company can apply one of two labels, "will buy" or "will not buy", to each potential visitor.

This classification is done by providing some already labeled data, called training data, to machine-learning algorithms, as you know already. The challenge is how to mark the boundary between the two classes. Let's take a simple example, as shown in the following figure:

In the preceding case, we assigned gray and black to the "will not buy" and "will buy" labels, respectively. Here, drawing a line between the two classes is easy, as follows:

Is this the best we can do? Not really. Let's try to do a better job. The black classifier line is not really equidistant from the "will buy" and "will not buy" carts. Let's make a better attempt:

This looks good, doesn't it? This, in fact, is what the SVM algorithm does: it looks for the separating line that stays as far as possible from the nearest carts of both classes. You can see in the preceding diagram that there are only three carts that decide the position and slope of the line: two black carts above the line and one gray cart below the line. These carts are called support vectors, and the rest of the carts, that is, the other vectors, play no part in determining the line.
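To make the idea of margins and support vectors concrete, here is a minimal Python sketch. The cart coordinates, the two candidate lines, and the helper names (`signed_distance`, `margin`, `support_vectors`) are all made up for illustration, and the lines are picked by hand rather than found by an SVM solver.

```python
import math

# Hypothetical toy data: (x, y) positions of the carts in the plane.
will_buy = [(1, 3), (3, 5), (1, 5), (2, 6)]        # black carts
will_not_buy = [(4, 2), (5, 1), (6, 2), (4, 1)]    # gray carts


def signed_distance(point, w, b):
    """Signed distance from point to the line w[0]*x + w[1]*y + b = 0."""
    return (w[0] * point[0] + w[1] * point[1] + b) / math.hypot(w[0], w[1])


def margin(w, b, positives, negatives):
    """Distance from the line to the nearest cart, or None if the line
    does not put the two classes on opposite sides."""
    nearest_pos = min(signed_distance(p, w, b) for p in positives)
    nearest_neg = max(signed_distance(p, w, b) for p in negatives)
    if nearest_pos <= 0 or nearest_neg >= 0:
        return None
    return min(nearest_pos, -nearest_neg)


def support_vectors(w, b, positives, negatives, tol=1e-9):
    """Carts whose distance to the line equals the margin."""
    m = margin(w, b, positives, negatives)
    if m is None:
        return []
    return [p for p in positives + negatives
            if abs(abs(signed_distance(p, w, b)) - m) <= tol]


# Two hand-picked separating lines, written as (w, b).
first_attempt = ((0, 1), -2.75)   # the horizontal line y = 2.75
better_attempt = ((-1, 1), 0)     # the diagonal line y = x

first_margin = margin(*first_attempt, will_buy, will_not_buy)
better_margin = margin(*better_attempt, will_buy, will_not_buy)
vectors = support_vectors(*better_attempt, will_buy, will_not_buy)

# Keep only the support vectors and measure the margin again.
sv_buy = [p for p in vectors if p in will_buy]
sv_not_buy = [p for p in vectors if p in will_not_buy]
margin_without_rest = margin(*better_attempt, sv_buy, sv_not_buy)

print("first attempt margin: ", first_margin)
print("better attempt margin:", better_margin)
print("support vectors:      ", vectors)
print("margin using only the support vectors:", margin_without_rest)
```

Both lines separate the toy carts, but the diagonal one keeps a wider gap to the nearest cart on each side. In this made-up data, two black carts and one gray cart sit exactly on that gap, and dropping every other cart leaves the margin of the line unchanged.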

Sometimes it's not easy to draw a line and a curve may be needed to separate two classes, such as the following:

Sometimes, even that is not enough. In such cases, we need more than two dimensions to resolve the problem. Rather than a classifier line, what we need is a hyperplane. In fact, when data is too cluttered, adding extra dimensions can help you find a hyperplane that separates the classes. The following diagram illustrates this:

This does not mean that adding extra dimensions is always a good idea. Most of the time, our goal is to reduce dimensions and keep only the relevant dimensions/features. A whole set of algorithms is dedicated to dimensionality reduction; we will cover them in later chapters.
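Going back to the idea of adding a dimension to untangle cluttered data, the following Python sketch shows it on a deliberately tiny case. It is a simplification: the data starts in one dimension and is lifted to two, rather than going beyond two dimensions as described earlier. The data values, the squared feature, and the helper names (`separable_by_threshold`, `lift`, `side`) are assumptions made for illustration only.

```python
# Hypothetical 1-D toy data: one class sits in the middle of the other.
inner = [-1.0, 0.0, 1.0]           # for example, "will buy"
outer = [-3.0, -2.0, 2.0, 3.0]     # for example, "will not buy"


def separable_by_threshold(a, b):
    """True if a single cut point on the number line splits the groups."""
    return max(a) < min(b) or max(b) < min(a)


def lift(x):
    """Add an extra dimension: map x to the point (x, x squared)."""
    return (x, x * x)


def side(point, w, b):
    """Sign tells which side of the line w[0]*p0 + w[1]*p1 + b = 0."""
    return w[0] * point[0] + w[1] * point[1] + b


# In the original single dimension, no cut point separates the classes.
print("separable in 1-D:", separable_by_threshold(inner, outer))

# In the lifted space, the hand-picked line "second coordinate = 2.5"
# does the job.
w, b = (0.0, 1.0), -2.5
inner_sides = [side(lift(x), w, b) for x in inner]
outer_sides = [side(lift(x), w, b) for x in outer]
print("inner class sides:", inner_sides)   # all negative
print("outer class sides:", outer_sides)   # all positive
```

The extra feature is chosen by hand here. It only illustrates why a higher-dimensional view can turn a tangled problem into one that a flat boundary can solve.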
