Gradients with respect to the hidden to output weight, V


First, let's recap the steps involved in the forward propagation:

$$h_j = \tanh(U x_j + W h_{j-1})$$

$$\hat{y}_j = \text{softmax}(V h_j) \tag{2}$$

Let's suppose that $z_j = V h_j$, and substituting this into equation (2), we can rewrite the preceding steps as follows:

$$h_j = \tanh(U x_j + W h_{j-1})$$

$$z_j = V h_j$$

$$\hat{y}_j = \text{softmax}(z_j)$$

After predicting the output $\hat{y}_j$, we are in the final layer of the network. Since we are backpropagating, that is, going from the output layer to the input layer, our first weight would be $V$, which is the hidden-to-output layer weight.
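To make the shapes concrete, here is a minimal NumPy sketch of a single forward step. The function name forward_step, the variable names, and the dimension conventions are illustrative assumptions rather than code from this chapter:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward_step(x_j, h_prev, U, W, V):
    # Assumed shapes: U is (hidden, input), W is (hidden, hidden), V is (output, hidden)
    h_j = np.tanh(U @ x_j + W @ h_prev)  # hidden state at time step j
    z_j = V @ h_j                        # pre-softmax scores, z_j = V h_j
    y_hat_j = softmax(z_j)               # predicted output, equation (2)
    return h_j, z_j, y_hat_j
```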

We have seen that the final loss is the sum of the loss over all the time steps, and similarly, the final gradient is the sum of the gradients over all the time steps:

$$L = \sum_{j=0}^{T} L_j$$

Hence, we can write:

$$\frac{\partial L}{\partial V} = \sum_{j=0}^{T} \frac{\partial L_j}{\partial V}$$
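As a small sketch of this summing-over-time idea, the total loss is just the sum of the per-step cross-entropies; the gradient with respect to $V$ accumulates the same way once the per-step gradient is derived below. The helper name sequence_loss and its inputs are illustrative assumptions:

```python
import numpy as np

def sequence_loss(ys, y_hats):
    # L = sum_j L_j, with the per-step cross-entropy L_j = -sum(y_j * log(y_hat_j)).
    # ys and y_hats are sequences of one-hot targets and softmax outputs per time step.
    return sum(-np.sum(y_j * np.log(y_hat_j)) for y_j, y_hat_j in zip(ys, y_hats))
```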

Recall our loss function:

$$L_j = -y_j \log(\hat{y}_j) \tag{1}$$

We cannot calculate the gradient with respect to $V$ directly from $L_j$, as there is no $V$ term in it. So, we apply the chain rule. Recall the forward propagation equations; there is a $V$ term in $\hat{y}_j$:

$$\hat{y}_j = \text{softmax}(z_j)$$

where

$$z_j = V h_j$$

First, we calculate the partial derivative of the loss with respect to $\hat{y}_j$; then, from $\hat{y}_j$, we calculate the partial derivative with respect to $z_j$; and from $z_j$, we can calculate the derivative with respect to $V$.

Thus, our equation becomes the following:

$$\frac{\partial L_j}{\partial V} = \frac{\partial L_j}{\partial \hat{y}_j} \frac{\partial \hat{y}_j}{\partial z_j} \frac{\partial z_j}{\partial V} \tag{3}$$

As we know that $z_j = V h_j$, the gradient of $z_j$ with respect to $V$ can be calculated as follows:

$$\frac{\partial z_j}{\partial V} = h_j \tag{4}$$

Substituting equation (4) in equation (3), we can write the following:

$$\frac{\partial L_j}{\partial V} = \frac{\partial L_j}{\partial \hat{y}_j} \frac{\partial \hat{y}_j}{\partial z_j} h_j \tag{5}$$

For better understanding, let's take each of the terms from the preceding equation and compute them one by one. First, consider the term:

$$\frac{\partial L_j}{\partial \hat{y}_j} \tag{6}$$

From equation (1), we can substitute the value of $L_j$ in the preceding equation (6) as follows:

$$\frac{\partial L_j}{\partial \hat{y}_j} = \frac{\partial \left(-y_j \log(\hat{y}_j)\right)}{\partial \hat{y}_j} = -\frac{y_j}{\hat{y}_j} \tag{7}$$
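This term is easy to check numerically. The following sketch compares $-y_j/\hat{y}_j$ against a finite-difference estimate; the values and the helper function cross_entropy are made up purely for illustration:

```python
import numpy as np

def cross_entropy(y, y_hat):
    return -np.sum(y * np.log(y_hat))

y = np.array([0.0, 1.0, 0.0])        # one-hot target
y_hat = np.array([0.2, 0.7, 0.1])    # some softmax output

analytic = -y / y_hat                # equation (7): dL/dy_hat = -y / y_hat

eps = 1e-6
numeric = np.zeros_like(y_hat)
for i in range(len(y_hat)):
    plus, minus = y_hat.copy(), y_hat.copy()
    plus[i] += eps
    minus[i] -= eps
    numeric[i] = (cross_entropy(y, plus) - cross_entropy(y, minus)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # expect: True
```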

Now, we will compute the term $\frac{\partial \hat{y}_j}{\partial z_j}$. Since we know that $\hat{y}_j = \text{softmax}(z_j)$, computing $\frac{\partial \hat{y}_j}{\partial z_j}$ gives us the derivative of the softmax function.

The derivative of the softmax function (considering each component's derivative with respect to its own pre-activation, that is, the diagonal of the softmax Jacobian) can be represented as follows:

$$\frac{\partial \hat{y}_j}{\partial z_j} = \hat{y}_j (1 - \hat{y}_j) \tag{8}$$

Substituting equations (7) and (8) into the first two terms of equation (5), we can write the following:

$$\frac{\partial L_j}{\partial \hat{y}_j} \frac{\partial \hat{y}_j}{\partial z_j} = -\frac{y_j}{\hat{y}_j} \cdot \hat{y}_j (1 - \hat{y}_j) = -y_j (1 - \hat{y}_j)$$
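The diagonal form used in equation (8) can also be verified numerically. The following sketch, with hypothetical values, compares $\hat{y}(1 - \hat{y})$ with the diagonal of the softmax Jacobian obtained by finite differences:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 0.5])
y_hat = softmax(z)

analytic_diag = y_hat * (1 - y_hat)   # equation (8): d y_hat_i / d z_i

eps = 1e-6
numeric_diag = np.zeros_like(z)
for i in range(len(z)):
    plus, minus = z.copy(), z.copy()
    plus[i] += eps
    minus[i] -= eps
    numeric_diag[i] = (softmax(plus)[i] - softmax(minus)[i]) / (2 * eps)

print(np.allclose(analytic_diag, numeric_diag, atol=1e-5))   # expect: True
```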

Thus, the final equation becomes:

$$\frac{\partial L_j}{\partial \hat{y}_j} \frac{\partial \hat{y}_j}{\partial z_j} = y_j \hat{y}_j - y_j \tag{9}$$

Now, we can substitute equation (9) into equation (5):

$$\frac{\partial L_j}{\partial V} = (y_j \hat{y}_j - y_j)\, h_j \tag{10}$$

Since we know that $y_j$ is a one-hot encoded vector, the expression $y_j \hat{y}_j - y_j$ reduces to the well-known softmax-with-cross-entropy gradient (using the full softmax Jacobian rather than only its diagonal gives the same result), so we can write:

$$\frac{\partial L_j}{\partial \hat{y}_j} \frac{\partial \hat{y}_j}{\partial z_j} = \hat{y}_j - y_j$$

Substituting the preceding equation into equation (10), we get our final equation, that is, the gradient of the loss function with respect to $V$, as follows:

$$\frac{\partial L_j}{\partial V} = (\hat{y}_j - y_j) \otimes h_j$$

Here, $\otimes$ denotes the outer product, which gives a matrix of the same shape as $V$. Summing over all the time steps, we get:

$$\frac{\partial L}{\partial V} = \sum_{j=0}^{T} (\hat{y}_j - y_j) \otimes h_j$$
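Putting it all together, here is a minimal NumPy sketch that accumulates $\frac{\partial L}{\partial V} = \sum_{j} (\hat{y}_j - y_j) \otimes h_j$ over the time steps and verifies it against a finite-difference gradient. The network sizes, the random data, and the helper names (softmax, total_loss, grad_V) are illustrative assumptions, not code from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 4, 5, 3, 6

U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
xs = [rng.normal(size=input_dim) for _ in range(T)]
ys = [np.eye(output_dim)[rng.integers(output_dim)] for _ in range(T)]  # one-hot targets

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def total_loss(V_mat):
    # Forward pass: L = sum_j -y_j . log(softmax(V_mat h_j))
    h, L = np.zeros(hidden_dim), 0.0
    for x_j, y_j in zip(xs, ys):
        h = np.tanh(U @ x_j + W @ h)
        y_hat = softmax(V_mat @ h)
        L += -np.sum(y_j * np.log(y_hat))
    return L

def grad_V():
    # Backward pass for V only: dL/dV = sum_j outer(y_hat_j - y_j, h_j)
    h, dV = np.zeros(hidden_dim), np.zeros_like(V)
    for x_j, y_j in zip(xs, ys):
        h = np.tanh(U @ x_j + W @ h)
        y_hat = softmax(V @ h)
        dV += np.outer(y_hat - y_j, h)
    return dV

analytic = grad_V()

# Finite-difference check of the analytic gradient
numeric = np.zeros_like(V)
eps = 1e-6
for i in range(V.shape[0]):
    for k in range(V.shape[1]):
        V_plus, V_minus = V.copy(), V.copy()
        V_plus[i, k] += eps
        V_minus[i, k] -= eps
        numeric[i, k] = (total_loss(V_plus) - total_loss(V_minus)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # expect: True
```

Note that $h_j$ does not depend on $V$, which is why the gradient with respect to $V$ needs only this simple sum over time steps; the recurrent weights $W$ and $U$, covered next, require backpropagating through time.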
