Chapter 10. From the Perceptron to Artificial Neural Networks

In Chapter 8, The Perceptron, we introduced the perceptron, which is a linear model for binary classification. You learned that the perceptron is not a universal function approximator; its decision boundary must be a hyperplane. In the previous chapter we introduced the support vector machine, which redresses some of the perceptron's limitations by using kernels to efficiently map the feature representations to a higher dimensional space in which the instances are linearly separable. In this chapter, we will discuss artificial neural networks, which are powerful nonlinear models for classification and regression that use a different strategy to overcome the perceptron's limitations.

If the perceptron is analogous to a neuron, an artificial neural network, or neural net, is analogous to a brain. Just as a human brain is composed of billions of neurons connected by trillions of synapses, an artificial neural network is a directed graph of perceptrons or other artificial neurons. The graph's edges are weighted; these weights are the parameters of the model that must be learned.

Entire books describe individual aspects of artificial neural networks; this chapter will provide an overview of their structure and training. At the time of writing, some artificial neural networks have been developed for scikit-learn, but they are not available in Version 0.15.2. Readers can follow the examples in this chapter by checking out a fork of scikit-learn 0.15.1 that includes the neural network module. The implementations in this fork are likely to be merged into future versions of scikit-learn without any changes to the API described in this chapter.

Nonlinear decision boundaries

Recall from Chapter 8, The Perceptron, that while some Boolean functions such as AND, OR, and NAND can be approximated by the perceptron, the linearly inseparable function XOR cannot, as shown in the following plots:

(Plots: perceptron decision boundaries for AND, OR, and NAND, and the linearly inseparable XOR)
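As a quick illustration, the following sketch fits scikit-learn's Perceptron class, with its default settings, to the four possible inputs of OR and of XOR. Constructor arguments vary across scikit-learn versions, so defaults are used here; the perceptron should fit OR perfectly, while its accuracy on XOR cannot reach 1.0:

from sklearn.linear_model import Perceptron

# The four possible inputs for two binary variables, with OR and XOR targets.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_or = [0, 1, 1, 1]
y_xor = [0, 1, 1, 0]

# OR is linearly separable; the perceptron should reach an accuracy of 1.0.
print(Perceptron().fit(X, y_or).score(X, y_or))
# XOR is not linearly separable; the perceptron's accuracy stays below 1.0.
print(Perceptron().fit(X, y_xor).score(X, y_xor))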

Let's review XOR in more detail to develop an intuition for the power of artificial neural networks. In contrast to AND, which outputs 1 when both of its inputs are equal to 1, and OR, which outputs 1 when at least one of its inputs is equal to 1, XOR outputs 1 when exactly one of its inputs is equal to 1. We could view XOR as outputting 1 when two conditions are true. The first condition is that at least one of the inputs must be equal to 1; this is the same condition that OR tests. The second condition is that not both of the inputs are equal to 1; NAND tests this condition. We can produce the same output as XOR by processing the input with both OR and NAND and then verifying that the outputs of both functions are equal to 1 using AND. That is, the functions OR, NAND, and AND can be composed to produce the same output as XOR.

The following tables provide the truth tables for XOR, OR, AND, and NAND for the inputs A and B. From these tables we can verify that inputting the outputs of OR and NAND to AND produces the same output as inputting A and B to XOR:

A    B    A AND B    A NAND B    A OR B    A XOR B
0    0    0          1           0         0
0    1    0          1           1         1
1    0    0          1           1         1
1    1    1          0           1         0

A    B    A OR B    A NAND B    (A OR B) AND (A NAND B)
0    0    0         1           0
0    1    1         1           1
1    0    1         1           1
1    1    1         0           0
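
We can also check this composition directly; the following short sketch enumerates the four inputs and confirms that (A OR B) AND (A NAND B) produces the same output as A XOR B:

# Enumerate all four inputs and compare the composed functions with XOR.
for a in (0, 1):
    for b in (0, 1):
        or_out = int(a or b)
        nand_out = int(not (a and b))
        composed = int(or_out and nand_out)
        assert composed == (a ^ b)
        print(a, b, or_out, nand_out, composed)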

Instead of trying to represent XOR with a single perceptron, we will build an artificial neural network from multiple artificial neurons that each approximate a linear function. Each instance's feature representation will be input to two neurons; one neuron will represent NAND and the other will represent OR. The outputs of these neurons will be received by a third neuron that represents AND, which tests whether both of XOR's conditions are true.
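
The following sketch illustrates this idea with three threshold neurons whose weights and biases are hand-picked for illustration rather than learned:

# A single artificial neuron: a unit step applied to a weighted sum plus a bias.
def neuron(inputs, weights, bias):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def xor_network(a, b):
    or_out = neuron([a, b], [1, 1], -0.5)      # fires if at least one input is 1
    nand_out = neuron([a, b], [-1, -1], 1.5)   # fires unless both inputs are 1
    return neuron([or_out, nand_out], [1, 1], -1.5)  # AND of the two hidden neurons

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))  # reproduces the XOR truth table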
