Chapter 8. The Perceptron

In previous chapters we discussed generalized linear models that relate a linear combination of explanatory variables and model parameters to a response variable using a link function. In this chapter, we will discuss another linear model called the perceptron. The perceptron is a binary classifier that can learn from individual training instances, which can be useful for training from large datasets. More importantly, the perceptron and its limitations inspire the models that we will discuss in the final chapters.

Frank Rosenblatt invented the perceptron at the Cornell Aeronautical Laboratory in the late 1950s; its development was originally motivated by efforts to simulate the human brain. A brain is composed of cells called neurons, which process information, and connections between neurons called synapses, through which information is transmitted. It is estimated that the human brain is composed of as many as 100 billion neurons and 100 trillion synapses. As shown in the following image, the main components of a neuron are dendrites, a body, and an axon. The dendrites receive electrical signals from other neurons. The signals are processed in the neuron's body, which then sends a signal through the axon to another neuron.

[Figure: a neuron and its main components: dendrites, body, and axon]

An individual neuron can be thought of as a computational unit that processes one or more inputs to produce an output. A perceptron functions analogously to a neuron: it accepts one or more inputs, processes them, and returns an output. It may seem that a model of just one of the many billions of neurons in the human brain will be of limited use. To an extent that is true; the perceptron cannot approximate some basic functions. However, we will still discuss perceptrons for two reasons. First, perceptrons are capable of online, error-driven learning; the learning algorithm can update the model's parameters using a single training instance rather than the entire batch of training instances. Online learning is useful for learning from training sets that are too large to fit in memory. Second, understanding how the perceptron works is necessary to understand some of the more powerful models that we will discuss in subsequent chapters, including support vector machines and artificial neural networks. Perceptrons are commonly visualized using a diagram like the following one:

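[Figure: diagram of a perceptron: input units connected by weighted edges to a computational unit with one output edge]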

The circles labeled $x_1$, $x_2$, and $x_3$ are input units. Each input unit represents one feature. Perceptrons frequently use an additional input unit that represents a constant bias term, but this input unit is usually omitted from diagrams. The circle in the center is a computational unit, analogous to the neuron's body. The edges connecting the input units to the computational unit are analogous to dendrites. Each edge is weighted, or associated with a parameter. The parameters can be interpreted easily: an explanatory variable that is correlated with the positive class will have a positive weight, and an explanatory variable that is correlated with the negative class will have a negative weight. The edge directed away from the computational unit returns the output and can be thought of as the axon.

Activation functions

The perceptron classifies instances by processing a linear combination of the explanatory variables and the model parameters using an activation function as shown in the following equation. The linear combination of the parameters and inputs is sometimes called the perceptron's preactivation.

$$y = \phi\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Here, $w_i$ are the model's parameters, $b$ is a constant bias term, and $\phi$ is the activation function. Several different activation functions are commonly used. Rosenblatt's original perceptron used the Heaviside step function. Also called the unit step function, the Heaviside step function is shown in the following equation, where $z$ is the weighted combination of the features:

$$H(z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{otherwise} \end{cases}$$

If the weighted sum of the explanatory variables and the bias term is greater than zero, the activation function returns one and the perceptron predicts that the instance is the positive class. Otherwise, the function returns zero and the perceptron predicts that the instance is the negative class. The Heaviside step activation function is plotted in the following figure:

[Figure: plot of the Heaviside step activation function]
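The prediction step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the names `heaviside` and `predict`, and the example weights, are assumptions chosen for this sketch.

```python
def heaviside(z):
    """Heaviside step function: 1 if z is greater than zero, otherwise 0."""
    return 1 if z > 0 else 0


def predict(weights, bias, x):
    """Classify one instance x using hypothetical weights and a bias term."""
    # Preactivation: the linear combination of parameters and inputs, plus the bias.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return heaviside(z)


# Illustrative parameters: the first feature pushes toward the positive class,
# the second toward the negative class.
weights = [0.5, -0.5]
bias = 0.0

positive = predict(weights, bias, [1.0, 0.0])  # preactivation is 0.5
negative = predict(weights, bias, [0.0, 1.0])  # preactivation is -0.5
boundary = predict(weights, bias, [0.0, 0.0])  # preactivation is exactly 0
```

Note that a preactivation of exactly zero is assigned to the negative class, because the step function returns one only for values strictly greater than zero.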

Another common activation function is the logistic sigmoid activation function. The gradients for this activation function can be calculated efficiently, which will be important in later chapters when we construct artificial neural networks. The logistic sigmoid activation function is given by the following equation, where $z$ is the sum of the weighted inputs:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

This model should seem familiar; it is a linear combination of the values of the explanatory variables and the model parameters processed through the logistic function. That is, this is identical to the model for logistic regression. While a perceptron with a logistic sigmoid activation function has the same model as logistic regression, it learns its parameters differently.
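As a small sketch of the logistic sigmoid, the function below maps any preactivation to a value between zero and one. The name `logistic` is an assumption for this example, and the two-branch form is a common numerical-stability choice made here, not something prescribed by the text.

```python
import math


def logistic(z):
    """Logistic sigmoid, 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, the equivalent form e^z / (1 + e^z) keeps exp() small.
    e = math.exp(z)
    return e / (1.0 + e)


midpoint = logistic(0.0)       # exactly 0.5, the usual decision threshold
confident = logistic(10.0)     # close to 1
very_negative = logistic(-1000.0)  # underflows to 0.0 rather than raising
```

Unlike the step function, the output changes smoothly with the preactivation, which is what makes its gradient usable by gradient-based learning methods.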

The perceptron learning algorithm

The perceptron learning algorithm begins by setting the weights to zero or to small random values. It then predicts the class for a training instance. The perceptron is an error-driven learning algorithm; if the prediction is correct, the algorithm continues to the next instance. If the prediction is incorrect, the algorithm updates the weights. More formally, the update rule is given by the following:

$$w_i(t+1) = w_i(t) + \alpha \left(d_j - y_j(t)\right) x_{j,i}$$

For each training instance, the value of the parameter for each explanatory variable is incremented by $\alpha \left(d_j - y_j(t)\right) x_{j,i}$, where $d_j$ is the true class for instance $j$, $y_j(t)$ is the predicted class for instance $j$, $x_{j,i}$ is the value of the $i$th explanatory variable for instance $j$, and $\alpha$ is a hyperparameter that controls the learning rate. If the prediction is correct, $d_j - y_j(t)$ equals zero, and the $\alpha \left(d_j - y_j(t)\right) x_{j,i}$ term equals zero. So, if the prediction is correct, the weight is not updated. If the prediction is incorrect, the weight is incremented by the product of the learning rate, $d_j - y_j(t)$, and the value of the feature.
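A single application of the update rule can be worked through in code. The sketch below assumes a Heaviside step activation with classes coded as 0 and 1, and treats the bias as a weight on a constant input of one; the function name `update` and the example values are illustrative assumptions.

```python
def heaviside(z):
    """Return 1 if z is greater than zero, otherwise 0."""
    return 1 if z > 0 else 0


def update(weights, bias, x, d, alpha):
    """Apply one error-driven update for a single training instance.

    x is the feature vector, d is the true class (0 or 1), and alpha is the
    learning rate. Returns the new weights and bias.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    y = heaviside(z)
    error = d - y  # 0 when correct; +1 or -1 when wrong, given 0/1 classes
    new_weights = [w + alpha * error * xi for w, xi in zip(weights, x)]
    # The bias behaves like a weight whose input is always 1.
    new_bias = bias + alpha * error
    return new_weights, new_bias


# Start from zero weights; the instance belongs to the positive class, but the
# zero preactivation is classified as negative, so the weights must change.
weights, bias = update([0.0, 0.0], 0.0, x=[1.0, 2.0], d=1, alpha=0.5)
# weights is now [0.5, 1.0] and bias is 0.5

# Presenting the same instance again yields a correct prediction, so nothing moves.
same_weights, same_bias = update(weights, bias, x=[1.0, 2.0], d=1, alpha=0.5)
```

The second call shows the error-driven property: once the instance is classified correctly, the error term is zero and the parameters are left unchanged.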

This update rule is similar to the update rule for gradient descent in that the weights are adjusted towards classifying the instance correctly and the size of the update is controlled by a learning rate. Each pass through the training instances is called an epoch. The learning algorithm has converged when it completes an epoch without misclassifying any of the instances. The learning algorithm is not guaranteed to converge; later in this chapter, we will discuss linearly inseparable datasets for which convergence is impossible. For this reason, the learning algorithm also requires a hyperparameter that specifies the maximum number of epochs that can be completed before the algorithm terminates.
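Putting the pieces together, the following sketch loops over epochs until one completes without a misclassification or a maximum number of epochs is reached. The function names, the default hyperparameter values, and the toy dataset (the logical AND of two binary features, which is linearly separable) are all assumptions made for illustration.

```python
def predict(weights, bias, x):
    """Heaviside step applied to the preactivation of instance x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train(instances, labels, alpha=1.0, max_epochs=100):
    """Train a perceptron; return (weights, bias, converged)."""
    weights = [0.0] * len(instances[0])  # initialize the weights to zero
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in zip(instances, labels):
            error = d - predict(weights, bias, x)
            if error != 0:
                # Update only on misclassified instances.
                mistakes += 1
                weights = [w + alpha * error * xi for w, xi in zip(weights, x)]
                bias += alpha * error
        if mistakes == 0:
            # A full epoch with no errors: the algorithm has converged.
            return weights, bias, True
    # The epoch limit was reached without an error-free pass.
    return weights, bias, False


# Toy data: the positive class is the logical AND of the two features.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
and_labels = [0, 0, 0, 1]

weights, bias, converged = train(X, and_labels)
predictions = [predict(weights, bias, x) for x in X]
```

The `max_epochs` limit is what keeps the loop from running forever when no error-free epoch is possible; in that case the function returns with `converged` set to `False`.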
