Feedforward and feedback artificial neural networks

Artificial neural networks are described by three components. The first is the model's architecture, or topology, which describes the layers of neurons and structure of the connections between them. The second component is the activation function used by the artificial neurons. The third component is the learning algorithm that finds the optimal values of the weights.

There are two main types of artificial neural networks. Feedforward neural networks are the most common type; their connections form a directed acyclic graph, so signals travel in only one direction, towards the output layer. Conversely, feedback neural networks, or recurrent neural networks, do contain cycles. These cycles give the network an internal state, which allows its behavior to change over time in response to its input. Feedforward neural networks are commonly used to learn a function that maps an input to an output. The temporal behavior of feedback neural networks makes them suitable for processing sequences of inputs. Because feedback neural networks are not implemented in scikit-learn, we will limit our discussion to feedforward neural networks.

Multilayer perceptrons

The multilayer perceptron (MLP) is one of the most commonly used artificial neural networks. The name is a slight misnomer; a multilayer perceptron is not a single perceptron with multiple layers, but rather multiple layers of artificial neurons that can be perceptrons. The layers of the MLP form a directed acyclic graph. Generally, each layer is fully connected to the subsequent layer; the output of each artificial neuron in a layer is an input to every artificial neuron in the next layer towards the output. MLPs have three or more layers of artificial neurons.

The input layer consists of simple input neurons. The input neurons are connected to at least one hidden layer of artificial neurons. The hidden layer represents latent variables; the input and output of this layer cannot be observed in the training data. Finally, the last hidden layer is connected to an output layer. The following diagram depicts the architecture of a multilayer perceptron with three layers. The neurons labeled +1 are bias neurons and are not depicted in most architecture diagrams.

[Figure: a multilayer perceptron with three layers; the units labeled +1 are bias neurons]

The artificial neurons, or units, in the hidden layer commonly use nonlinear activation functions such as the hyperbolic tangent function and the logistic function, which are given by the following equations:

g(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}

g(z) = \frac{1}{1 + e^{-z}}
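
As a quick numerical check, the following Python sketch (not from the original text) evaluates both activation functions with NumPy; the variable names are illustrative.

```python
import numpy as np

def tanh(z):
    """Hyperbolic tangent activation; squashes inputs to the range (-1, 1)."""
    return np.tanh(z)

def logistic(z):
    """Logistic (sigmoid) activation; squashes inputs to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(tanh(z))
print(logistic(z))
```

Both functions are smooth and differentiable, which is what backpropagation will require later in this section.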

As with other supervised models, our goal is to find the values of the weights that minimize the value of a cost function. The mean squared error cost function is commonly used with multilayer perceptrons. It is given by the following equation, where m is the number of training instances:

J = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
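
The following sketch (illustrative, not from the original text) computes this cost with NumPy. Note that some texts divide by 2m rather than m to simplify the gradient; here we match the equation above.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over m training instances."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

print(mse([1.0, 0.0, 1.0], [0.9, 0.2, 0.6]))  # 0.07
```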

Minimizing the cost function

The backpropagation algorithm is commonly used in conjunction with an optimization algorithm such as gradient descent to minimize the value of the cost function. The algorithm's name is a portmanteau of backward propagation, and refers to the direction in which errors flow through the layers of the network. Backpropagation can theoretically be used to train a feedforward network with any number of hidden units arranged in any number of layers, though in practice the available computational power constrains the size of the networks that can be trained.

Backpropagation is similar to gradient descent in that it uses the gradient of the cost function to update the values of the model parameters. Unlike the linear models we have previously seen, neural nets contain hidden units that represent latent variables; we can't tell what the hidden units should do from the training data. If we do not know what the hidden units should do, we cannot calculate their errors and we cannot calculate the gradient of the cost function with respect to their weights. A naive solution is to randomly perturb the weights of the hidden units: if a random change to one of the weights decreases the value of the cost function, we save the change and randomly change the value of another weight. An obvious problem with this solution is its prohibitive computational cost. Backpropagation provides a more efficient solution.

We will step through training a feedforward neural network using backpropagation. This network has two input units, two hidden layers that each have three hidden units, and two output units. The input units are both fully connected to the first hidden layer's units, called Hidden1, Hidden2, and Hidden3. The weights of the edges connecting the units are initialized to small random values.
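
A minimal sketch of this setup in Python, assuming NumPy and using hypothetical variable names, stores one weight matrix and one bias vector per pair of adjacent layers and initializes them with small random values drawn from a uniform distribution (the range is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)

# Layer sizes for the example network: 2 inputs, two hidden layers of 3 units, 2 outputs.
layer_sizes = [2, 3, 3, 2]

# One weight matrix and one bias vector for each pair of adjacent layers,
# initialized to small random values.
weights = [rng.uniform(-0.5, 0.5, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.uniform(-0.5, 0.5, size=n_out) for n_out in layer_sizes[1:]]

for W in weights:
    print(W.shape)  # (2, 3), (3, 3), (3, 2)
```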

Forward propagation

During the forward propagation stage, the features are input to the network and fed through the subsequent layers to produce the output activations. First, we compute the activation for the unit Hidden1. We find the weighted sum of the inputs to Hidden1, and then process the sum with the activation function. Note that, in addition to the inputs from the input units, Hidden1 receives a constant input from a bias unit that is not depicted in the diagram. The following diagram depicts the computation of Hidden1's activation:

[Figure: forward propagation, computing the activation of Hidden1]
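
To make the arithmetic concrete, the following sketch computes the activation of a single hidden unit with the logistic activation function. The input values and weights are made up for illustration; they are not the numbers shown in the figure.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values; the figure uses its own numbers.
x1, x2 = 0.8, 0.3               # inputs from Input1 and Input2
w1, w2, bias = 0.4, -0.2, 0.1   # weights into Hidden1 and its bias weight

preactivation = w1 * x1 + w2 * x2 + bias * 1.0
hidden1_activation = logistic(preactivation)
print(preactivation, hidden1_activation)
```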

Next, we compute the activation for the second hidden unit. Like the first hidden unit, it receives weighted inputs from both of the input units and a constant input from a bias unit. We then process the weighted sum of the inputs, or preactivation, with the activation function as shown in the following figure:

[Figure: forward propagation, computing the activation of Hidden2]

We then compute the activation for Hidden3 in the same manner:

[Figure: forward propagation, computing the activation of Hidden3]

Having computed the activations of all of the hidden units in the first layer, we proceed to the second hidden layer. In this network, the first hidden layer is fully connected to the second hidden layer. Similar to the units in the first hidden layer, the units in the second hidden layer receive a constant input from bias units that are not depicted in the diagram. We proceed to compute the activation of Hidden4:

[Figure: forward propagation, computing the activation of Hidden4]

We next compute the activations of Hidden5 and Hidden6. Having computed the activations of all of the hidden units in the second hidden layer, we proceed to the output layer in the following figure. The activation of Output1 is the weighted sum of the second hidden layer's activations processed through an activation function. Similar to the hidden units, the output units both receive a constant input from a bias unit:

[Figure: forward propagation, computing the activation of Output1]

We calculate the activation of Output2 in the same manner:

[Figure: forward propagation, computing the activation of Output2]

We have now computed the activations of all of the units in the network, which completes forward propagation. The network is unlikely to approximate the true function well using the initial random values of the weights. We must now update the values of the weights so that the network can better approximate our function.
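
The entire forward pass can be written compactly as a sequence of matrix multiplications, each followed by the activation function. The sketch below is illustrative: it reuses the hypothetical 2-3-3-2 initialization from earlier and the logistic activation.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate input x through the network, returning every layer's activations."""
    activations = [np.asarray(x, dtype=float)]
    for W, b in zip(weights, biases):
        preactivation = activations[-1] @ W + b
        activations.append(logistic(preactivation))
    return activations

# Hypothetical weights for a 2-3-3-2 network.
rng = np.random.RandomState(0)
layer_sizes = [2, 3, 3, 2]
weights = [rng.uniform(-0.5, 0.5, size=s) for s in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.uniform(-0.5, 0.5, size=n) for n in layer_sizes[1:]]

activations = forward([0.8, 0.3], weights, biases)
print(activations[-1])  # activations of the output layer
```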

Backpropagation

We can calculate the error of the network only at the output units. The hidden units represent latent variables; we cannot observe their true values in the training data and thus, we have nothing to compute their error against. In order to update their weights, we must propagate the network's errors backwards through its layers. We will begin with Output1. Its error is equal to the difference between the true and predicted outputs, multiplied by the partial derivative of the unit's activation:

[Figure: backpropagation, computing the error of Output1]
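
With the logistic activation, whose derivative can be written in terms of the unit's output as g(z)(1 - g(z)), the error term for an output unit can be computed as follows. The numbers are hypothetical.

```python
def logistic_derivative(activation):
    """Derivative of the logistic function expressed in terms of its output."""
    return activation * (1.0 - activation)

# Hypothetical values for Output1.
true_output = 1.0
predicted_output = 0.62   # activation produced by forward propagation

output1_error = (true_output - predicted_output) * logistic_derivative(predicted_output)
print(output1_error)  # approximately 0.0895
```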

We then calculate the error of the second output unit:

[Figure: backpropagation, computing the error of Output2]

We have computed the errors of the output layer. We can now propagate these errors backwards to the second hidden layer. First, we will compute the error of hidden unit Hidden4. We multiply the error of Output1 by the value of the weight connecting Hidden4 and Output1, and we similarly weight the error of Output2. We then add these weighted errors and multiply their sum by the partial derivative of Hidden4's activation:

[Figure: backpropagation, computing the error of Hidden4]
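
A sketch of the same computation in Python, again with hypothetical numbers: the errors of Output1 and Output2 are weighted by the edges leaving Hidden4, summed, and multiplied by the derivative of Hidden4's activation.

```python
def logistic_derivative(activation):
    return activation * (1.0 - activation)

# Hypothetical values.
hidden4_activation = 0.57
output1_error, output2_error = 0.0895, -0.0421
w_hidden4_output1, w_hidden4_output2 = 0.3, -0.1

weighted_error_sum = (output1_error * w_hidden4_output1
                      + output2_error * w_hidden4_output2)
hidden4_error = weighted_error_sum * logistic_derivative(hidden4_activation)
print(hidden4_error)
```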

We similarly compute the error of Hidden5:

[Figure: backpropagation, computing the error of Hidden5]

We then compute the Hidden6 error in the following figure:

[Figure: backpropagation, computing the error of Hidden6]

We have calculated the errors of the second hidden layer from the errors of the output layer. Next, we will continue to propagate the errors backwards towards the input layer. The error of the hidden unit Hidden1 is the product of its partial derivative and the weighted sum of the errors in the second hidden layer:

[Figure: backpropagation, computing the error of Hidden1]

We similarly compute the error for hidden unit Hidden2:

[Figure: backpropagation, computing the error of Hidden2]

We similarly compute the error for Hidden3:

[Figure: backpropagation, computing the error of Hidden3]
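
The whole backward pass over the errors can also be vectorized. The helper below is a sketch that assumes the forward function and weight matrices from the earlier forward-propagation sketch: each layer's error vector is the next layer's error vector multiplied by the transpose of the weight matrix between them, times the derivative of the layer's activations.

```python
import numpy as np

def backpropagate_errors(activations, weights, y_true):
    """Return the error term of every non-input layer, ordered from the first hidden layer to the output layer."""
    deltas = [None] * len(weights)
    output = activations[-1]
    # Output layer: (target - prediction) times the logistic derivative.
    deltas[-1] = (y_true - output) * output * (1.0 - output)
    # Hidden layers, walking backwards towards the input layer.
    for layer in range(len(weights) - 2, -1, -1):
        a = activations[layer + 1]
        deltas[layer] = (deltas[layer + 1] @ weights[layer + 1].T) * a * (1.0 - a)
    return deltas

# Example (assumes forward, weights, and biases from the earlier sketch):
# deltas = backpropagate_errors(forward([0.8, 0.3], weights, biases), weights, np.array([1.0, 0.0]))
```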

We computed the errors of the first hidden layer. We can now use these errors to update the values of the weights. We will first update the weights for the edges connecting the input units to Hidden1 as well as the weight for the edge connecting the bias unit to Hidden1. We will increment the value of the weight connecting Input1 and Hidden1 by the product of the learning rate, error of Hidden1, and the value of Input1.

We will similarly increment the value of Weight2 by the product of the learning rate, error of Hidden1, and the value of Input2. Finally, we will increment the value of the weight connecting the bias unit to Hidden1 by the product of the learning rate, error of Hidden1, and one.

[Figure: backpropagation, updating the weights connecting the input layer and the bias unit to Hidden1]
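
As a sketch with hypothetical numbers (continuing the Hidden1 example from forward propagation), the three updates look like this:

```python
# Hypothetical values, continuing the Hidden1 example.
learning_rate = 0.5
hidden1_error = 0.011
input1, input2 = 0.8, 0.3
weight1, weight2, bias_weight = 0.4, -0.2, 0.1

weight1 += learning_rate * hidden1_error * input1    # edge Input1 -> Hidden1
weight2 += learning_rate * hidden1_error * input2    # edge Input2 -> Hidden1
bias_weight += learning_rate * hidden1_error * 1.0   # edge bias unit -> Hidden1
print(weight1, weight2, bias_weight)
```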

We will then update the values of the weights connecting hidden unit Hidden2 to the input units and the bias unit using the same method:

[Figure: backpropagation, updating the weights connecting the input layer and the bias unit to Hidden2]

Next, we will update the values of the weights connecting the input layer to Hidden3:

[Figure: backpropagation, updating the weights connecting the input layer to Hidden3]

Now that the values of the weights connecting the input layer to the first hidden layer have been updated, we can continue to the weights connecting the first hidden layer to the second hidden layer. We will increment the value of Weight7 by the product of the learning rate, the error of Hidden4, and the output of Hidden1. We similarly update the values of weights Weight8 through Weight15:

[Figure: backpropagation, updating the weights connecting the first hidden layer to the second hidden layer]

The weights for Hidden5 and Hidden6 are updated in the same way. Having updated the values of the weights connecting the two hidden layers, we can now update the values of the weights connecting the second hidden layer and the output layer. We increment the values of weights Weight16 through Weight21 using the same method that we used for the weights in the previous layers:

[Figure: backpropagation, updating the weights connecting the second hidden layer to the output layer]
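
Every per-edge update described above is the product of the learning rate, the error of the edge's destination unit, and the output of its source unit, so the updates for a whole layer can be written as an outer product. The helper below is a sketch that assumes the activations and deltas produced by the earlier forward and backpropagate_errors sketches.

```python
import numpy as np

def update_weights(weights, biases, activations, deltas, learning_rate=0.5):
    """Increment each weight by learning_rate * error of the destination unit * output of the source unit."""
    for i in range(len(weights)):
        weights[i] += learning_rate * np.outer(activations[i], deltas[i])
        biases[i] += learning_rate * deltas[i]  # the bias units always output 1
    return weights, biases
```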

After incrementing the value of Weight21 by the product of the learning rate, error of Output2, and the activation of Hidden6, we have finished updating the values of the weights for the network. We can now perform another forward pass using the new values of the weights; the value of the cost function produced using the updated weights should be smaller. We will repeat this process until the model converges or another stopping criterion is satisfied. Unlike the linear models we have discussed, backpropagation does not optimize a convex function. It is possible that backpropagation will converge on parameter values that specify a local, rather than global, minimum. In practice, local optima are frequently adequate for many applications.
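
In practice we rarely implement backpropagation by hand. As an illustration, and assuming a version of scikit-learn that includes MLPClassifier (0.18 or later), the following sketch trains a multilayer perceptron with two hidden layers of three units each, the same architecture used in this walkthrough, on the XOR problem that a single perceptron cannot solve. With a small network like this, the solver usually, though not always, finds weights that reproduce the XOR labels; the result depends on the random initialization.

```python
from sklearn.neural_network import MLPClassifier

# XOR: not linearly separable, so a single perceptron cannot learn it.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

clf = MLPClassifier(hidden_layer_sizes=(3, 3), activation='logistic',
                    solver='lbfgs', max_iter=5000, random_state=3)
clf.fit(X, y)
print(clf.predict(X))  # predictions for the four training points
```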
