Getting ready

The strategy that we'll adopt is as follows: our neural network will have one hidden layer (with neurons) connecting the input layer to the output layer. Note that we have more neurons in the hidden layer than in the input layer, as we want to enable the input layer to be represented in more dimensions:

Calculating the hidden layer unit values

We now assign weights to all of the connections. Note that these weights are selected randomly (based on Gaussian distribution) since it is the first time we're forward-propagating. In this specific case, let's start with initial weights that are between 0 and 1, but note that the final weights after the training process of a neural network don't need to be between a specific set of values:

In the next step, we perform the multiplication of the input with weights to calculate the values of hidden units in the hidden layer. 

The hidden layer's unit values are obtained as follows:

The hidden layer's unit values are also shown in the following diagram:

Note that in the preceding output we calculated the hidden values. For simplicity, we excluded the bias terms that need to be added at each unit of a hidden layer.

Now, we will pass the hidden layer values through an activation function so that we attain non-linearity in our output.

If we do not apply the activation function in the hidden layer, the neural network becomes a giant linear connection from input to output.

Applying the activation function

Activation functions are applied at multiple layers of a network. They are used so that we achieve high non-linearity in input, which can be useful in modeling complex relations between the input and output.

The different activation functions are as follows:

For our example, let’s use the sigmoid function for activation. The sigmoid function looks like this, graphically:

By applying sigmoid activation, S(x), to the three hidden=layer sums, we get the following:

final_h1 = S(1.0) = 0.73

final_h2 = S(1.3) = 0.78

final_h3 = S(0.8) = 0.69

Calculating the output layered values

Now that we have calculated the hidden layer values, we will be calculating the output layer value. In the following diagram, we have the hidden layer values connected to the output through the randomly-initialized weight values. Using the hidden layer values and the weight values, we will calculate the output values for the following network:

We perform the sum product of the hidden layer values and weight values to calculate the output value. For simplicity, we excluded the bias terms that need to be added at each unit of the hidden layer:

0.73 * 0.3 + 0.79 * 0.5 + 0.69 * 0.9 = 1.235

The values are shown in the following diagram:

Because we started with a random set of weights, the value of the output neuron is very different from the target, in this case by +1.235 (since the target is 0).

Calculating the loss values

Loss values (alternatively called cost functions) are values that we optimize in a neural network. In order to understand how loss values get calculated, let's look at two scenarios:

  • Continuous variable prediction
  • Categorical variable prediction

Calculating loss during continuous variable prediction

Typically, when the variable is a continuous one, the loss value is calculated as the squared error, that is, we try to minimize the mean squared error by varying the weight values associated with the neural network:

In the preceding equation, y(i) is the actual value of output, h(x) is the transformation that we apply on the input (x) to obtain a predicted value of y, and m is the number of rows in the dataset.

Calculating loss during categorical variable prediction

When the variable to predict is a discrete one (that is, there are only a few categories in the variable), we typically use a categorical cross-entropy loss function. When the variable to predict has two distinct values within it, the loss function is binary cross-entropy, and when the variable to predict has multiple distinct values within it, the loss function is a categorical cross-entropy.

Here is binary cross-entropy:

(ylog(p)+(1−y)log(1−p))

Here is categorical cross-entropy: 

y is the actual value of output p, is the predicted value of the output  and n is the total number of data points. For now, let's assume that the outcome that we are predicting in our toy example is continuous. In that case, the loss function value is the mean squared error, which is calculated as follows:

error = 1.2352 = 1.52

In the next step, we will try to minimize the loss function value using back-propagation (which we'll learn about in the next section), where we update the weight values (which were initialized randomly earlier) to minimize the loss (error).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset