Binary classification with the perceptron

Let's work through a toy classification problem. Suppose that we wish to separate adult cats from kittens, and that only two explanatory variables are available in our dataset: the proportion of the day that the animal spent asleep and the proportion of the day that it spent being grumpy. Our training data consists of the following four instances:

| Instance | Proportion of the day spent sleeping | Proportion of the day spent being grumpy | Kitten or Adult? |
|----------|--------------------------------------|------------------------------------------|------------------|
| 1        | 0.2                                  | 0.1                                      | Kitten           |
| 2        | 0.4                                  | 0.6                                      | Kitten           |
| 3        | 0.5                                  | 0.2                                      | Kitten           |
| 4        | 0.7                                  | 0.9                                      | Adult            |

The following scatter plot of the instances confirms that they are linearly separable:

[Figure: scatter plot of the four training instances]

[Figure: network diagram of the perceptron with three input units]

Our goal is to train a perceptron that can classify animals using the two real-valued explanatory variables. We will represent kittens with the positive class and adult cats with the negative class. The preceding network diagram describes the perceptron that we will train.

Our perceptron has three input units: x0 is the input unit for the bias term, and x1 and x2 are the input units for the two features. Our perceptron's computational unit uses a Heaviside activation function. In this example, we will set the maximum number of training epochs to ten; if the algorithm does not converge within ten epochs, it will stop and return the current values of the weights. For simplicity, we will set the learning rate to one. Initially, we will set all of the weights to zero. Let's examine the first training epoch, which is shown in the following table:

Epoch 1

| Instance | Initial weights | x | Activation | Prediction, Target | Correct | Updated weights |
|----------|-----------------|---|------------|--------------------|---------|-----------------|
| 0 | 0, 0, 0 | 1.0, 0.2, 0.1 | 1.0*0 + 0.2*0 + 0.1*0 = 0.0 | 0, 1 | False | 1.0, 0.2, 0.1 |
| 1 | 1.0, 0.2, 0.1 | 1.0, 0.4, 0.6 | 1.0*1.0 + 0.4*0.2 + 0.6*0.1 = 1.14 | 1, 1 | True | 1.0, 0.2, 0.1 |
| 2 | 1.0, 0.2, 0.1 | 1.0, 0.5, 0.2 | 1.0*1.0 + 0.5*0.2 + 0.2*0.1 = 1.12 | 1, 1 | True | 1.0, 0.2, 0.1 |
| 3 | 1.0, 0.2, 0.1 | 1.0, 0.7, 0.9 | 1.0*1.0 + 0.7*0.2 + 0.9*0.1 = 1.23 | 1, 0 | False | 0, -0.5, -0.8 |

Initially, all of the weights are equal to zero. The weighted sum of the explanatory variables for the first instance is zero, the activation function outputs zero, and the perceptron incorrectly predicts that the kitten is an adult cat. As the prediction was incorrect, we update the weights according to the update rule: we increment each weight by the product of the learning rate, the difference between the true and predicted labels, and the value of the corresponding feature.
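To make the update rule concrete, here is a minimal sketch (not the book's code) that reproduces the first row of the table. With zero weights the activation is 0.0, the Heaviside function outputs 0, and adding eta * (target - prediction) * x to the weights yields (1.0, 0.2, 0.1):

eta = 1.0                  # learning rate
w = [0.0, 0.0, 0.0]        # initial weights
x = [1.0, 0.2, 0.1]        # bias input and the two features of instance 0
target = 1                 # kitten is the positive class
activation = sum(w_i * x_i for w_i, x_i in zip(w, x))  # 0.0
prediction = 1 if activation > 0 else 0                # Heaviside output: 0
w = [w_i + eta * (target - prediction) * x_i for w_i, x_i in zip(w, x)]
print(w)  # [1.0, 0.2, 0.1], matching the updated weights in the table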

We then continue to the second training instance and calculate the weighted sum of its features using the updated weights. This sum equals 1.14, so the activation function outputs one. This prediction is correct, so we continue to the third training instance without updating the weights. The prediction for the third instance is also correct, so we continue to the fourth training instance. The weighted sum of the features for the fourth instance is 1.23. The activation function outputs one, incorrectly predicting that this adult cat is a kitten. Since this prediction is incorrect, we again increment each weight by the product of the learning rate, the difference between the true and predicted labels, and its corresponding feature. Classifying all of the instances in the training set completes the first epoch. The perceptron did not converge; it classified half of the training instances incorrectly. The following figure depicts the decision boundary after the first epoch:

[Figure: decision boundary after the first epoch]

Note that the decision boundary moved throughout the epoch; the decision boundary formed by the weights at the end of the epoch would not necessarily have produced the same predictions seen earlier in the epoch. Since we have not exceeded the maximum number of training epochs, we will iterate through the instances again. The second training epoch is shown in the following table:

Epoch 2

| Instance | Initial weights | x | Activation | Prediction, Target | Correct | Updated weights |
|----------|-----------------|---|------------|--------------------|---------|-----------------|
| 0 | 0, -0.5, -0.8 | 1.0, 0.2, 0.1 | 1.0*0 + 0.2*-0.5 + 0.1*-0.8 = -0.18 | 0, 1 | False | 1, -0.3, -0.7 |
| 1 | 1, -0.3, -0.7 | 1.0, 0.4, 0.6 | 1.0*1.0 + 0.4*-0.3 + 0.6*-0.7 = 0.46 | 1, 1 | True | 1, -0.3, -0.7 |
| 2 | 1, -0.3, -0.7 | 1.0, 0.5, 0.2 | 1.0*1.0 + 0.5*-0.3 + 0.2*-0.7 = 0.71 | 1, 1 | True | 1, -0.3, -0.7 |
| 3 | 1, -0.3, -0.7 | 1.0, 0.7, 0.9 | 1.0*1.0 + 0.7*-0.3 + 0.9*-0.7 = 0.16 | 1, 0 | False | 0, -1, -1.6 |

The second epoch begins with the weight values from the end of the first epoch. Two training instances are classified incorrectly during this epoch. The weights are updated twice, but the decision boundary at the end of the second epoch is similar to the decision boundary at the end of the first epoch.

[Figure: decision boundary at the end of the second epoch]
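The similarity is not a coincidence. The decision boundary is the set of points where the activation w0*1.0 + w1*x1 + w2*x2 equals zero, and scaling the weights by a positive constant does not change that set. The weights at the end of the second epoch, (0, -1, -1.6), are exactly twice the weights at the end of the first epoch, (0, -0.5, -0.8), so the two epochs end with the same boundary line. A quick check (a sketch, not part of the book's code):

import numpy as np

w_epoch1 = np.array([0.0, -0.5, -0.8])  # weights at the end of epoch 1
w_epoch2 = np.array([0.0, -1.0, -1.6])  # weights at the end of epoch 2
# Scaling the weight vector by a positive constant leaves the set of
# points with zero activation, and hence the boundary line, unchanged.
print(np.allclose(w_epoch2, 2 * w_epoch1))  # True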

The algorithm failed to converge during this epoch, so we will continue training. The following table describes the third training epoch:

Epoch 3

| Instance | Initial weights | x | Activation | Prediction, Target | Correct | Updated weights |
|----------|-----------------|---|------------|--------------------|---------|-----------------|
| 0 | 0, -1, -1.6 | 1.0, 0.2, 0.1 | 1.0*0 + 0.2*-1.0 + 0.1*-1.6 = -0.36 | 0, 1 | False | 1, -0.8, -1.5 |
| 1 | 1, -0.8, -1.5 | 1.0, 0.4, 0.6 | 1.0*1.0 + 0.4*-0.8 + 0.6*-1.5 = -0.22 | 0, 1 | False | 2, -0.4, -0.9 |
| 2 | 2, -0.4, -0.9 | 1.0, 0.5, 0.2 | 1.0*2.0 + 0.5*-0.4 + 0.2*-0.9 = 1.62 | 1, 1 | True | 2, -0.4, -0.9 |
| 3 | 2, -0.4, -0.9 | 1.0, 0.7, 0.9 | 1.0*2.0 + 0.7*-0.4 + 0.9*-0.9 = 0.91 | 1, 0 | False | 1, -1.1, -1.8 |

The perceptron misclassified three instances during this epoch, more than in either of the previous epochs. The following figure depicts the decision boundary at the end of the third epoch:

[Figure: decision boundary at the end of the third epoch]

The perceptron continues to update its weights throughout the fourth and fifth training epochs, and it continues to classify some training instances incorrectly. During the sixth epoch, the perceptron classified all of the instances correctly; it converged on a set of weights that separates the two classes. The following table describes the sixth training epoch:

Epoch 6

| Instance | Initial weights | x | Activation | Prediction, Target | Correct | Updated weights |
|----------|-----------------|---|------------|--------------------|---------|-----------------|
| 0 | 2, -1, -1.5 | 1.0, 0.2, 0.1 | 1.0*2 + 0.2*-1 + 0.1*-1.5 = 1.65 | 1, 1 | True | 2, -1, -1.5 |
| 1 | 2, -1, -1.5 | 1.0, 0.4, 0.6 | 1.0*2 + 0.4*-1 + 0.6*-1.5 = 0.70 | 1, 1 | True | 2, -1, -1.5 |
| 2 | 2, -1, -1.5 | 1.0, 0.5, 0.2 | 1.0*2 + 0.5*-1 + 0.2*-1.5 = 1.20 | 1, 1 | True | 2, -1, -1.5 |
| 3 | 2, -1, -1.5 | 1.0, 0.7, 0.9 | 1.0*2 + 0.7*-1 + 0.9*-1.5 = -0.05 | 0, 0 | True | 2, -1, -1.5 |

The decision boundary at the end of the sixth training epoch is shown in the following figure:

[Figure: decision boundary at the end of the sixth epoch]
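We can verify that the converged weights, (2, -1, -1.5), classify every training instance correctly. The following is a quick sketch under the tables' conventions (bias input fixed at 1.0, Heaviside output of one only for positive activations):

w = [2.0, -1.0, -1.5]      # converged weights: bias, sleeping, grumpy
instances = [
    ([1.0, 0.2, 0.1], 1),  # kitten
    ([1.0, 0.4, 0.6], 1),  # kitten
    ([1.0, 0.5, 0.2], 1),  # kitten
    ([1.0, 0.7, 0.9], 0),  # adult
]
for x, target in instances:
    activation = sum(w_i * x_i for w_i, x_i in zip(w, x))
    prediction = 1 if activation > 0 else 0
    print(activation, prediction == target)  # all four instances print True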

The following figure shows the decision boundary throughout all the training epochs.

[Figure: decision boundaries throughout all of the training epochs]
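The entire walkthrough can be reproduced in a few lines of Python. The following is a minimal sketch, not the book's code, using the same conventions as the tables: a bias input fixed at 1.0, a Heaviside function that outputs one only for positive activations, a learning rate of one, weights initialized to zero, and a maximum of ten epochs. Its printed weights match the epoch tables, and it stops after the error-free sixth epoch:

import numpy as np

X = np.array([[1.0, 0.2, 0.1],   # bias input, sleeping, grumpy
              [1.0, 0.4, 0.6],
              [1.0, 0.5, 0.2],
              [1.0, 0.7, 0.9]])
y = np.array([1, 1, 1, 0])       # 1 = kitten, 0 = adult
w = np.zeros(3)
eta = 1.0                        # learning rate

for epoch in range(10):          # maximum of ten training epochs
    errors = 0
    for x, target in zip(X, y):
        prediction = 1 if np.dot(w, x) > 0 else 0
        if prediction != target:
            w += eta * (target - prediction) * x
            errors += 1
    print('Epoch %d: weights %s, %d errors' % (epoch + 1, w, errors))
    if errors == 0:              # converged: a full pass with no mistakes
        break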

Document classification with the perceptron

scikit-learn provides an implementation of the perceptron. As with the other implementations that we have used, the constructor for the Perceptron class accepts keyword arguments that set the algorithm's hyperparameters, and Perceptron exposes the familiar fit() and predict() methods. Perceptron also provides a partial_fit() method, which allows the classifier to be trained incrementally; this is useful for streaming data.

In this example, we train a perceptron to classify documents from the 20 newsgroups dataset. The dataset consists of approximately 20,000 documents sampled from 20 Usenet newsgroups. The dataset is commonly used in document classification and clustering experiments; scikit-learn provides a convenience function to download and read it. We will train a perceptron to classify documents from three newsgroups: rec.sport.hockey, rec.sport.baseball, and rec.autos. scikit-learn's Perceptron natively supports multiclass classification; it uses the one-versus-all strategy to train a classifier for each of the classes in the training data. We will represent the documents as TF-IDF-weighted bags of words. The partial_fit() method could be used in conjunction with HashingVectorizer to train from large or streaming data in a memory-constrained setting:

>>> from sklearn.datasets import fetch_20newsgroups
>>> from sklearn.metrics import classification_report
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from sklearn.linear_model import Perceptron

>>> categories = ['rec.sport.hockey', 'rec.sport.baseball', 'rec.autos']
>>> newsgroups_train = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))
>>> newsgroups_test = fetch_20newsgroups(subset='test', categories=categories, remove=('headers', 'footers', 'quotes'))

>>> vectorizer = TfidfVectorizer()
>>> X_train = vectorizer.fit_transform(newsgroups_train.data)
>>> X_test = vectorizer.transform(newsgroups_test.data)

>>> classifier = Perceptron(max_iter=100, eta0=0.1)  # max_iter was named n_iter in older scikit-learn versions
>>> classifier.fit(X_train, newsgroups_train.target)
>>> predictions = classifier.predict(X_test)
>>> print(classification_report(newsgroups_test.target, predictions))

The following is the output of the script:

             precision    recall  f1-score   support

          0       0.89      0.87      0.88       396
          1       0.87      0.78      0.82       397
          2       0.79      0.88      0.83       399

avg / total       0.85      0.85      0.85      1192

First, we download and read the dataset using the fetch_20newsgroups() function. Consistent with the other built-in datasets, the function returns an object with data, target, and target_names fields. We also specify that the documents' headers, footers, and quotes should be removed. Each of the newsgroups used different conventions in its headers and footers; retaining these explanatory variables makes classifying the documents artificially easy. We produce TF-IDF vectors using TfidfVectorizer, train the perceptron, and evaluate it on the test set. Without hyperparameter optimization, the perceptron's average precision, recall, and F1 score are 0.85.
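As mentioned earlier, partial_fit() can be combined with HashingVectorizer when the corpus is too large to vectorize in memory. The following is a minimal sketch of that pattern, not part of the original example; it reuses newsgroups_train from above, the batch size is an arbitrary choice, and the list of all class labels must be passed to the first partial_fit() call:

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import Perceptron

# HashingVectorizer is stateless, so it needs no fitting pass over the corpus.
vectorizer = HashingVectorizer(n_features=2 ** 18)
classifier = Perceptron()
batch_size = 1000
classes = [0, 1, 2]  # every class label must be declared on the first call
for i in range(0, len(newsgroups_train.data), batch_size):
    docs = newsgroups_train.data[i:i + batch_size]
    labels = newsgroups_train.target[i:i + batch_size]
    classifier.partial_fit(vectorizer.transform(docs), labels, classes=classes)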
