Gradient descent

Up until now, we have covered the different kinds of neurons, distinguished by the activation functions they use, and the ways to quantify the inaccuracy of a neuron's output using cost functions. Now, we need a mechanism to take that inaccuracy and remedy it.

The mechanism through which the network learns to produce outputs closer to the expected (desired) output is called gradient descent. Gradient descent is a common iterative method in machine learning for finding the parameter values that make the cost as low as possible.

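To understand gradient descent, let's use the single neuron equation we have been using so far, in which the activation function σ is applied to the weighted input plus the bias to produce the output a:

$$a = \sigma(wx + b)$$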

Here, the following applies:

  • x is the input
  • w is the weight of the input
  • b is the bias of the neuron
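As a minimal sketch in Python, the output of this single neuron can be computed as follows. It assumes the sigmoid as the activation function (written σ above); the text does not fix a particular activation here, and the function names are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, assumed here as the activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x: float, w: float, b: float) -> float:
    """Output of a single neuron with one input: sigmoid(w * x + b)."""
    z = w * x + b  # weighted input plus bias
    return sigmoid(z)


print(neuron_output(x=2.0, w=0.5, b=-1.0))  # z = 0.0, so this prints 0.5
```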

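Gradient descent can be represented as the following pair of updates, applied repeatedly, where C is the cost and η is the learning rate:

$$w \leftarrow w - \eta \frac{\partial C}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial C}{\partial b}$$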

Initially, w and b are set to random values. From that point onward, the values of w and b are adjusted so that the error, or cost (here, the cross-entropy), decreases.
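A minimal sketch of this loop in Python, assuming a sigmoid neuron, the binary cross-entropy cost, and a single made-up training example (input x = 1.0, expected output y = 0.0). The gradient formulas (a − y)·x and (a − y) hold for exactly this sigmoid plus cross-entropy combination; the learning rate of 0.5, the 300 steps, and the seed are arbitrary illustrative choices.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(a: float, y: float) -> float:
    """Binary cross-entropy cost for one example (prediction a, target y)."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))


def train(x: float, y: float, learning_rate: float = 0.5, steps: int = 300, seed: int = 0):
    rng = random.Random(seed)
    # Start from random values for w and b.
    w = rng.uniform(-1.0, 1.0)
    b = rng.uniform(-1.0, 1.0)
    costs = []
    for _ in range(steps):
        a = sigmoid(w * x + b)  # forward pass
        costs.append(cross_entropy(a, y))
        # Gradients of the cross-entropy cost for a sigmoid neuron.
        dC_dw = (a - y) * x
        dC_db = a - y
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * dC_dw
        b -= learning_rate * dC_db
    return w, b, costs


w, b, costs = train(x=1.0, y=0.0)
print(f"cost: {costs[0]:.4f} -> {costs[-1]:.4f}")
```

Each pass through the loop is one gradient descent step: compute the output, measure the cost, then nudge w and b in the direction that lowers it.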

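Taking the partial derivatives of the cross-entropy cost with respect to w and b gives the gradient, which points in the direction in which the cost increases fastest. Gradient descent changes w and b step by step in the opposite direction, so each step moves toward a lower cost. In other words, gradient descent tries to shrink the gap between the network's output and the expected output.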
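For the single neuron, these derivatives can be worked out in a few lines with the chain rule. This assumes σ is the sigmoid, a = σ(z) with z = wx + b, and the cross-entropy cost for one example with expected output y:

```latex
C = -\left[\, y \ln a + (1 - y) \ln(1 - a) \,\right], \qquad a = \sigma(z), \quad z = wx + b

\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr) = a(1 - a)

\frac{\partial C}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)}

\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a}\,\sigma'(z)\,\frac{\partial z}{\partial w} = (a - y)\,x,
\qquad
\frac{\partial C}{\partial b} = \frac{\partial C}{\partial a}\,\sigma'(z)\,\frac{\partial z}{\partial b} = a - y
```

The σ'(z) factor cancels, so the size of each correction is proportional to the error a − y: the further the output is from the expected value, the larger the step.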

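How much the weights and bias change at each step is controlled by a parameter called the learning rate (often written η). The learning rate scales the gradient, setting the size of each step toward an output closer to the expected output. A learning rate that is too large can overshoot the minimum, while one that is too small makes learning slow.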
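The effect of the learning rate is easiest to see on a toy cost with a known minimum. This sketch uses the hypothetical one-parameter cost C(w) = w², whose derivative is 2w and whose minimum is at w = 0; it is not the cross-entropy, only an illustration of step size.

```python
def descend(learning_rate: float, w: float = 1.0, steps: int = 10) -> list[float]:
    """Run gradient descent on C(w) = w**2 and return every value of w visited."""
    history = [w]
    for _ in range(steps):
        gradient = 2.0 * w  # dC/dw for C(w) = w**2
        w = w - learning_rate * gradient
        history.append(w)
    return history


for lr in (0.01, 0.1, 1.1):
    print(lr, descend(lr)[-1])
```

Each step multiplies w by (1 − 2η). With η = 0.01 the weight barely moves (factor 0.98), with η = 0.1 it shrinks steadily toward the minimum (factor 0.8), and with η = 1.1 every step overshoots the minimum and lands further away on the other side (factor −1.2), so the cost grows instead of shrinking.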

Keep in mind that here, we have used a neuron with a single input (one weight and one bias) only to make things easier to understand. In practice, a network can have thousands to millions of parameters, or more, and gradient descent adjusts all of them together to lower the cost.
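The same update extends to any number of parameters: each weight gets its own partial derivative, and all of them are moved at every step. A minimal sketch, assuming a sigmoid neuron with several inputs, the cross-entropy cost averaged over a small made-up dataset (logical AND), and plain Python lists in place of the array libraries used in practice:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(inputs, targets, learning_rate=0.5, steps=1000):
    """Gradient descent for one sigmoid neuron with len(inputs[0]) weights plus a bias."""
    n_features = len(inputs[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(inputs)
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            a = sigmoid(sum(w_i * x_i for w_i, x_i in zip(weights, x)) + bias)
            error = a - y  # dC/dz for sigmoid + cross-entropy
            for i in range(n_features):
                grad_w[i] += error * x[i]
            grad_b += error
        # Update every parameter against its own gradient, averaged over the examples.
        weights = [w_i - learning_rate * g / m for w_i, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
    return weights, bias


# Made-up data: the target is 1 only when both inputs are 1 (logical AND).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [0.0, 0.0, 0.0, 1.0]
weights, bias = train(X, Y)
print([round(sigmoid(sum(w * x for w, x in zip(weights, row)) + bias), 2) for row in X])
```

Only the bookkeeping grows with the number of parameters; the rule applied to each one is the same as in the single-parameter case.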
