Forward propagation in ANN

In this section, we will see how an ANN learns when neurons are stacked up in layers. The number of layers in a network is equal to the number of hidden layers plus the number of output layers; we don't take the input layer into account when counting the layers in a network. Consider a two-layer neural network with one input layer, one hidden layer, and one output layer, as shown in the following diagram:

Let's consider that we have two inputs and that we have to predict the output, ŷ. Since we have two inputs, the number of neurons in the input layer will be two. We set the number of neurons in the hidden layer to four and the number of neurons in the output layer to one. Now, the inputs will be multiplied by weights, a bias will be added, and the resultant value will be propagated to the hidden layer, where an activation function is applied.

Before that, we need to initialize the weight matrix. In the real world, we don't know in advance which inputs are more important than others, so we can't hand-pick the weights that produce the right output. Therefore, we randomly initialize the weights and the bias value. The weight matrix and the bias between the input layer and the hidden layer are represented by Wxh and bh, respectively. What about the dimensions of the weight matrix? The dimensions of the weight matrix must be number of neurons in the current layer x number of neurons in the next layer. Why is that?

Because of the basic matrix multiplication rule: to multiply any two matrices, AB, the number of columns in matrix A must be equal to the number of rows in matrix B. So, the dimension of the weight matrix, Wxh, should be number of neurons in the input layer x number of neurons in the hidden layer, that is, 2 x 4.
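This dimension rule is easy to verify with NumPy. The following is a quick sketch for the 2 x 4 case described above (the values are random and purely illustrative):

```python
import numpy as np

# One training sample with two input features
X = np.random.randn(1, 2)

# Input-to-hidden weights and bias for a hidden layer of four neurons
Wxh = np.random.randn(2, 4)   # 2 x 4: columns of X match rows of Wxh
bh = np.zeros((1, 4))

z1 = np.dot(X, Wxh) + bh      # valid multiplication: (1, 2) x (2, 4)
print(z1.shape)               # prints (1, 4) -- one value per hidden neuron
```

If the inner dimensions did not match, np.dot would raise a ValueError instead of producing a result.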

Multiplying the input, X, by the weight matrix, Wxh, and adding the bias, bh, gives us the following:

z1 = X·Wxh + bh

Now, z1 is passed to the hidden layer. In the hidden layer, we apply an activation function to z1. Let's use the sigmoid activation function, σ. Then, we can write the following:

a1 = σ(z1)
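The sigmoid function takes any real value and squashes it into the range (0, 1). A minimal sketch in NumPy:

```python
import numpy as np

def sigmoid(z):
    # squashes any real-valued input into the range (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # prints 0.5
```

Large positive inputs map close to 1 and large negative inputs map close to 0, which is what makes sigmoid convenient as an activation function.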

After applying the activation function, we multiply the result, a1, by a new weight matrix and add a new bias value flowing between the hidden layer and the output layer. We can denote this weight matrix and bias by Why and by, respectively. The dimension of the weight matrix, Why, will be the number of neurons in the hidden layer x the number of neurons in the output layer. Since we have four neurons in the hidden layer and one neuron in the output layer, the matrix dimension will be 4 x 1. So, we multiply a1 by the weight matrix, Why, add the bias, by, and pass the result to the next layer, which is the output layer:

z2 = a1·Why + by

Now, in the output layer, we apply the sigmoid function to z2, which results in the output value:

ŷ = σ(z2)

This whole process from the input layer to the output layer is known as forward propagation. Thus, in order to predict the output value, the inputs are propagated from the input layer to the output layer. During this propagation, they are multiplied by their respective weights at each layer, and an activation function is applied on top of them. The complete forward propagation steps are given as follows:

z1 = X·Wxh + bh
a1 = σ(z1)
z2 = a1·Why + by
ŷ = σ(z2)

The preceding forward propagation steps can be implemented in Python as follows:

def forward_prop(X):
    z1 = np.dot(X, Wxh) + bh    # linear transform from input to hidden layer
    a1 = sigmoid(z1)            # sigmoid activation in the hidden layer
    z2 = np.dot(a1, Why) + by   # linear transform from hidden to output layer
    y_hat = sigmoid(z2)         # sigmoid activation in the output layer
    return y_hat
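Note that forward_prop relies on the weights and biases being defined beforehand. The following self-contained sketch repeats the definitions above and initializes the parameters randomly for the 2-4-1 network (the sample input values are purely illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_prop(X):
    z1 = np.dot(X, Wxh) + bh
    a1 = sigmoid(z1)
    z2 = np.dot(a1, Why) + by
    y_hat = sigmoid(z2)
    return y_hat

# Randomly initialize the weights and set the biases to zero
np.random.seed(0)
Wxh = np.random.randn(2, 4)   # input to hidden: 2 x 4
bh = np.zeros((1, 4))
Why = np.random.randn(4, 1)   # hidden to output: 4 x 1
by = np.zeros((1, 1))

X = np.array([[0.5, 0.8]])    # one sample with two inputs
y_hat = forward_prop(X)
print(y_hat.shape)            # prints (1, 1) -- a single predicted value
```

Since the weights are random at this point, the predicted value is not meaningful yet; it only becomes accurate once the network is trained.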

Forward propagation is cool, isn't it? But how do we know whether the output generated by the neural network is correct? We define a new function called the cost function (J), also known as the loss function (L), which tells us how well our neural network is performing. There are many different cost functions. We will use the mean squared error as our cost function, which can be defined as the mean of the squared difference between the actual output and the predicted output:

J = (1/n) Σᵢ (yᵢ − ŷᵢ)²

Here, n is the number of training samples, yᵢ is the actual output, and ŷᵢ is the predicted output.
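The mean squared error can be implemented in a single line with NumPy. A minimal sketch (the sample arrays are purely illustrative):

```python
import numpy as np

def cost_function(y, y_hat):
    # mean of the squared differences between actual and predicted outputs
    return np.mean((y - y_hat) ** 2)

y = np.array([1.0, 0.0, 1.0])      # actual outputs
y_hat = np.array([0.9, 0.2, 0.8])  # predicted outputs
print(cost_function(y, y_hat))     # approximately 0.03
```

The closer the predictions are to the actual outputs, the closer the cost gets to zero.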

Okay, so we've learned that a cost function is used for assessing our neural network; that is, it tells us how good our neural network is at predicting the output. But the question is, where does our network actually learn? In forward propagation, the network is just trying to predict the output. But how does it learn to predict the correct output? We will examine this in the next section.
