Now, we will see how to calculate the gradients of loss with respect to the hidden-to-hidden layer weights, $U$, for all the gates and the content state.

Let's calculate the gradients of loss with respect to $U_r$.
Recall the equation of the reset gate, which is given as follows:

$$r_t = \sigma(U_r h_{t-1} + W_r x_t)$$
Using the chain rule, we can write the following:

$$\frac{\partial L}{\partial U_r} = \frac{\partial L}{\partial r_t} \frac{\partial r_t}{\partial U_r}$$

Let's calculate each of the terms in the preceding equation. The first term, $\frac{\partial L}{\partial r_t}$, we already calculated in equation (11). The second term is calculated as follows:

$$\frac{\partial r_t}{\partial U_r} = \sigma'(U_r h_{t-1} + W_r x_t) \, h_{t-1} = r_t \odot (1 - r_t) \, h_{t-1}$$

Here, $\sigma'$ denotes the derivative of the sigmoid function. Thus, our final equation for calculating the gradient of loss with respect to $U_r$ becomes the following:

$$\frac{\partial L}{\partial U_r} = \frac{\partial L}{\partial r_t} \, r_t \odot (1 - r_t) \, h_{t-1}$$
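As a quick numerical sketch of this result (not the book's code; the variable names `h_prev`, `x_t`, and `dL_dr` are illustrative, and the upstream gradient from equation (11) is replaced by a random stand-in), we can compute the gradient with respect to $U_r$ for a single time step using NumPy:

```python
import numpy as np

# Minimal sketch: gradient of the loss w.r.t. the reset gate's
# hidden-to-hidden weight U_r at one time step. h_prev, x_t, and the
# upstream gradient dL_dr (equation (11) in the text) are random
# stand-ins for illustration only.
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
U_r = rng.standard_normal((hidden, hidden))
W_r = rng.standard_normal((hidden, inputs))
h_prev = rng.standard_normal(hidden)
x_t = rng.standard_normal(inputs)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

r_t = sigmoid(U_r @ h_prev + W_r @ x_t)   # forward pass: reset gate

dL_dr = rng.standard_normal(hidden)       # stand-in for equation (11)
# dr_t/dU_r involves sigma'(.) h_{t-1}; written out entrywise, the full
# gradient is the outer product of (dL/dr_t * r_t * (1 - r_t)) with h_{t-1}
dL_dU_r = np.outer(dL_dr * r_t * (1 - r_t), h_prev)
print(dL_dU_r.shape)  # → (4, 4), same shape as U_r
```

A finite-difference check of each entry of `U_r` confirms the outer-product form matches the analytical gradient.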
Now, let's move on to finding the gradients of loss with respect to $U_z$.

Recall the equation of the update gate, which is given as follows:

$$z_t = \sigma(U_z h_{t-1} + W_z x_t)$$

Using the chain rule, we can write the following:

$$\frac{\partial L}{\partial U_z} = \frac{\partial L}{\partial z_t} \frac{\partial z_t}{\partial U_z}$$

We have already computed the first term, $\frac{\partial L}{\partial z_t}$, in equation (12). The second term is computed as follows:

$$\frac{\partial z_t}{\partial U_z} = \sigma'(U_z h_{t-1} + W_z x_t) \, h_{t-1} = z_t \odot (1 - z_t) \, h_{t-1}$$

Thus, our final equation for calculating the gradient of loss with respect to $U_z$ becomes the following:

$$\frac{\partial L}{\partial U_z} = \frac{\partial L}{\partial z_t} \, z_t \odot (1 - z_t) \, h_{t-1}$$
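The update-gate gradient follows the same pattern as the reset gate, only the weights change. A hedged sketch (names such as `U_z` and `dL_dz` are illustrative; `dL_dz` stands in for the quantity from equation (12)):

```python
import numpy as np

# Sketch: gradient of the loss w.r.t. the update gate's hidden-to-hidden
# weight U_z. All inputs are random stand-ins; dL_dz plays the role of
# the upstream gradient from equation (12) in the text.
rng = np.random.default_rng(1)
hidden, inputs = 4, 3
U_z = rng.standard_normal((hidden, hidden))
W_z = rng.standard_normal((hidden, inputs))
h_prev = rng.standard_normal(hidden)
x_t = rng.standard_normal(inputs)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

z_t = sigmoid(U_z @ h_prev + W_z @ x_t)   # forward pass: update gate

dL_dz = rng.standard_normal(hidden)       # stand-in for equation (12)
# Final gradient: outer product of (dL/dz_t * z_t * (1 - z_t)) with h_{t-1}
dL_dU_z = np.outer(dL_dz * z_t * (1 - z_t), h_prev)
```

The only difference from the reset-gate code is which weight matrix and upstream gradient are used; the sigmoid-derivative-times-$h_{t-1}$ structure is identical.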
Now, we will find the gradients of loss with respect to $U_c$.

Recall the content state equation:

$$c_t = \tanh(U_c (r_t \odot h_{t-1}) + W_c x_t)$$

Using the chain rule, we can write the following:

$$\frac{\partial L}{\partial U_c} = \frac{\partial L}{\partial c_t} \frac{\partial c_t}{\partial U_c}$$

Refer to equation (10) for the first term, $\frac{\partial L}{\partial c_t}$. The second term is given as follows:

$$\frac{\partial c_t}{\partial U_c} = \tanh'(U_c (r_t \odot h_{t-1}) + W_c x_t) \, (r_t \odot h_{t-1}) = (1 - c_t^2) \, (r_t \odot h_{t-1})$$

Thus, our final equation for calculating the gradient of loss with respect to $U_c$ becomes the following:

$$\frac{\partial L}{\partial U_c} = \frac{\partial L}{\partial c_t} \, (1 - c_t^2) \, (r_t \odot h_{t-1})$$
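For the content state, the tanh derivative replaces the sigmoid derivative, and the gate-modulated hidden state $r_t \odot h_{t-1}$ replaces $h_{t-1}$. A sketch under the same assumptions as before (`r_t` would come from the forward pass and `dL_dc` from equation (10); here both are random stand-ins):

```python
import numpy as np

# Sketch: gradient of the loss w.r.t. the content state's
# hidden-to-hidden weight U_c, where
#   c_t = tanh(U_c (r_t ⊙ h_{t-1}) + W_c x_t).
# r_t and dL_dc (equation (10) in the text) are random stand-ins.
rng = np.random.default_rng(2)
hidden, inputs = 4, 3
U_c = rng.standard_normal((hidden, hidden))
W_c = rng.standard_normal((hidden, inputs))
h_prev = rng.standard_normal(hidden)
x_t = rng.standard_normal(inputs)
r_t = rng.uniform(0.0, 1.0, hidden)       # reset gate values in (0, 1)

c_t = np.tanh(U_c @ (r_t * h_prev) + W_c @ x_t)   # content state

dL_dc = rng.standard_normal(hidden)       # stand-in for equation (10)
# dc_t/dU_c = (1 - c_t^2) (r_t ⊙ h_{t-1}); as a matrix, the outer
# product of (dL/dc_t * (1 - c_t^2)) with (r_t * h_prev)
dL_dU_c = np.outer(dL_dc * (1 - c_t**2), r_t * h_prev)
```

Note that `r_t * h_prev` appears both inside the forward pass and in the gradient, reflecting that the reset gate scales which parts of the previous hidden state reach the content state.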