Working details of a simple neural network

In order to understand how neural networks work, we will build a very simple network. The input and the expected output are as follows:

import numpy as np
# Input dataset: two data points, each with two variables.
x = np.array([[1, 2], [3, 4]])
# Expected output for each of the two data points.
y = np.array([0, 1])

Note that x is the input dataset, with two variables in each of its two rows, and y is the expected output for each of the two inputs.

Essentially, we have the input and output layers in place.

As an example, for one of the preceding data points, the input and the output values of the network will look like this:

In traditional machine learning, you would find the relation directly between the input and output values. However, the neural network architecture works with the following intuition:

"The input values can be represented in a richer (higher) dimensional space. The more the dimensions in which the input values are represented, the more is the complexity in the input dataset captured."

With the preceding intuition, let's build a hidden layer with three units in a neural network:

Now that the layer is built, let's make connections between each unit, as follows:

Now that a connection between each unit is made, there is a weight associated with each connection. In the following diagram, we will initialize the weight that each connection represents:

Note that the weights W represent the strength of each connection.

Now we have built a simple neural network. Let's randomly initialize the weight values between the input and hidden layers to understand how the hidden layer values are computed:
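
As a minimal sketch of this step, continuing from the earlier snippet, the input-to-hidden weights can be held in a 2 x 3 matrix (two input units, three hidden units). The name W_hidden is ours, and the specific values below are not actually random; they are fixed so that they reproduce the arithmetic worked out next:

# Weights connecting the two input units to the three hidden units.
# In practice these would be drawn randomly, for example with
# np.random.randn(2, 3); here they are fixed to match the worked example.
W_hidden = np.array([[1.0, 0.5, -0.2],
                     [2.0, -1.0, 0.1]])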

The hidden layer values are computed by multiplying the input values with the weights associated with them and summing the results, as follows:

h1 = 1*1 + 2*2 = 5

h2 = 1*0.5 + 2*(-1) = -1.5

h3 = 1*(-0.2) + 2*0.1 = 0
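
The same computation for the first data point, x[0] = [1, 2], can be written as a dot product with the W_hidden matrix sketched above:

# Hidden layer values for the first data point: a weighted sum of the inputs.
h = np.dot(x[0], W_hidden)
print(h)  # [ 5.  -1.5  0. ]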

Now that the hidden values are calculated, we pass them through an activation function. The intuition for an activation function is as follows:

"The neural network in the state that we presented previously (without an activation function) is a big linear combination of input variables. Nonlinearity can only be obtained by performing an activation on top of the hidden layer values."

For simplicity, for now, we will assume that the nonlinearity we apply is the sigmoid function.

A sigmoid function works as follows:

  • It takes an input value, x, and transforms it into a new value, 1/(1 + exp(-x))

The nonlinearity of a sigmoid curve looks like this for various values of x:

Thus, the hidden layer values, which were 5, -1.5, and 0, are transformed to 0.99, 0.18, and 0.5:
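
A minimal sketch of this step, continuing the snippet above (sigmoid is our own helper name here):

def sigmoid(z):
    # Squashes any real value into the range (0, 1).
    return 1 / (1 + np.exp(-z))

# Apply the activation to the hidden layer values computed earlier.
h_activated = sigmoid(h)
print(np.round(h_activated, 2))  # [0.99 0.18 0.5 ]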

Now that the hidden layer values are computed, let's initialize the weights connecting the hidden layer to the output layer.

Note that again the weights are initialized randomly:
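
Continuing the sketch, the hidden-to-output weights form a vector of three values. Again, the name W_output and the specific values are assumptions chosen to match the calculation that follows, rather than truly random numbers:

# Weights connecting the three hidden units to the single output unit.
W_output = np.array([1.0, -1.0, 0.2])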

Now that the weights are initialized, let's calculate the value associated with the output layer:

0.99*1 + 0.18*(-1) + 0.5*0.2 = 0.91
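
The same calculation in code, assuming the sketches above:

# Output layer value: a weighted sum of the activated hidden values.
output = np.dot(h_activated, W_output)
print(round(output, 2))  # 0.91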

The value computed at the output layer is 0.91, while the expected value is 0.

Thus, the loss associated with this scenario is (0.91 - 0)^2 ≈ 0.83.
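
The same squared-error loss, continuing the sketch:

# Squared error between the computed output and the expected value y[0] = 0.
loss = (output - y[0]) ** 2
print(round(loss, 2))  # 0.83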

The process so far, in which we calculate the loss corresponding to a given set of weight values, is called the feedforward process.
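
The whole feedforward pass can be collected into one small function; this is a sketch under the same assumptions as above, and the helper name feed_forward is ours:

def feed_forward(inputs, target, w_hidden, w_output):
    # One feedforward pass: input -> hidden -> sigmoid -> output -> loss.
    hidden = sigmoid(np.dot(inputs, w_hidden))
    prediction = np.dot(hidden, w_output)
    return (prediction - target) ** 2

print(round(feed_forward(x[0], y[0], W_hidden, W_output), 2))  # 0.83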

So far, in this section, we have covered:

  • Weights
  • Activation function
  • Loss calculation

In the preceding scenario, while the loss function remains fixed for a given objective that we are trying to solve, the weight initialization and the activation function can vary across network architectures.

The objective for the problem just laid out is to minimize the loss of the network architecture by iteratively varying the weights.

For example, in the preceding architecture, the loss can be reduced by changing the weight of the connection from the last hidden unit to the output layer from 0.2 to 0.1. Once this weight is changed, the loss reduces from 0.83 to 0.74.
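
Re-running the feed_forward sketch with that single weight changed confirms the drop:

# Change the last hidden-to-output weight from 0.2 to 0.1 and recompute the loss.
W_output_updated = np.array([1.0, -1.0, 0.1])
print(round(feed_forward(x[0], y[0], W_hidden, W_output_updated), 2))  # 0.74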

The process by which weights are changed iteratively to minimize the loss value is called backpropagation.

Each full pass over the given dataset, in which the weights are updated, is called an epoch. Essentially, an epoch consists of a feedforward pass followed by backpropagation.

One of the techniques used to intelligently arrive at the optimal weight values is called gradient descent; we will cover various weight optimizers in a later section.
