Gradients with respect to U

Let's calculate the gradients of loss with respect to the hidden-to-input layer weights, U, for all the gates and the candidate state. Computing gradients of loss with respect to U is exactly the same as the gradients we computed with respect to W, except that the last term will be x_t instead of h_{t-1}. Let's examine what we mean by that.

Let's find out the gradients of loss with respect to U_i.

The input gate equation is as follows:

    i_t = \sigma(U_i x_t + W_i h_{t-1})

Thus, using the chain rule, we can write the following:

    \frac{\partial L}{\partial U_i} = \frac{\partial L}{\partial i_t} \frac{\partial i_t}{\partial U_i}

Let's calculate each of the terms in the preceding equation. We already know the first term from equation (2). The second term can be computed as follows:

    \frac{\partial i_t}{\partial U_i} = \sigma'(U_i x_t + W_i h_{t-1}) \, x_t = i_t \odot (1 - i_t) \, x_t

Thus, our final equation for calculating the gradient of loss with respect to U_i becomes the following:

    \frac{\partial L}{\partial U_i} = \frac{\partial L}{\partial i_t} \odot i_t \odot (1 - i_t) \, x_t
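We can sanity-check the derived formula against a numerical finite-difference gradient. The sketch below uses made-up dimensions and a stand-in upstream gradient ∂L/∂i_t (in a full LSTM it would come from the cell-state gradient); the i_gate helper is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical small dimensions, just for illustration.
n_hidden, n_input = 3, 4
U_i = rng.normal(size=(n_hidden, n_input))   # input-to-hidden weights of the input gate
W_i = rng.normal(size=(n_hidden, n_hidden))  # hidden-to-hidden weights of the input gate
x_t = rng.normal(size=n_input)
h_prev = rng.normal(size=n_hidden)

# Stand-in upstream gradient dL/di_t (assumed already computed earlier
# in the backward pass).
dL_di = rng.normal(size=n_hidden)

def i_gate(U):
    # Input gate: i_t = sigmoid(U x_t + W_i h_{t-1})
    return sigmoid(U @ x_t + W_i @ h_prev)

# Analytic gradient from the derivation: (dL/di_t ⊙ i_t ⊙ (1 - i_t)) x_t^T
i_t = i_gate(U_i)
dU_analytic = np.outer(dL_di * i_t * (1.0 - i_t), x_t)

# Numerical check via central differences on the surrogate loss
# L(U) = dL_di · i_t, whose gradient w.r.t. i_t is exactly dL_di.
eps = 1e-6
dU_numeric = np.zeros_like(U_i)
for r in range(n_hidden):
    for c in range(n_input):
        U_plus, U_minus = U_i.copy(), U_i.copy()
        U_plus[r, c] += eps
        U_minus[r, c] -= eps
        dU_numeric[r, c] = (dL_di @ i_gate(U_plus) - dL_di @ i_gate(U_minus)) / (2 * eps)

print(np.allclose(dU_analytic, dU_numeric, atol=1e-6))  # → True
```

The two gradients agree to numerical precision, confirming that the last term of the derivative is indeed x_t (appearing here as the outer product with x_t).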

As you can see, the preceding equation is exactly the same as \frac{\partial L}{\partial W_i}, except that the last term is x_t instead of h_{t-1}. This applies to all the other weights, so we can directly write the equations as follows:

  • Gradients of loss with respect to U_f:

        \frac{\partial L}{\partial U_f} = \frac{\partial L}{\partial f_t} \odot f_t \odot (1 - f_t) \, x_t

  • Gradients of loss with respect to U_o:

        \frac{\partial L}{\partial U_o} = \frac{\partial L}{\partial o_t} \odot o_t \odot (1 - o_t) \, x_t

  • Gradients of loss with respect to U_g:

        \frac{\partial L}{\partial U_g} = \frac{\partial L}{\partial g_t} \odot (1 - g_t^2) \, x_t
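The four gate gradients differ only in the activation derivative: i_t(1 - i_t) for the sigmoid gates and (1 - g_t^2) for the tanh candidate. A minimal NumPy sketch, assuming the upstream gradients ∂L/∂(gate output) have already been computed (the function name, dict layout, and sizes are illustrative stand-ins, not the book's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def u_gradients(x_t, h_prev, U, W, upstream):
    """Per-time-step gradients of the loss w.r.t. U_i, U_f, U_o, and U_g.

    U, W, and upstream are dicts keyed by gate name ("i", "f", "o", "g");
    upstream[k] is the (assumed given) gradient dL/d(gate output).
    """
    grads = {}
    for k in ("i", "f", "o"):                    # sigmoid gates
        a = sigmoid(U[k] @ x_t + W[k] @ h_prev)
        grads[k] = np.outer(upstream[k] * a * (1 - a), x_t)
    g = np.tanh(U["g"] @ x_t + W["g"] @ h_prev)  # tanh candidate state
    grads["g"] = np.outer(upstream["g"] * (1 - g ** 2), x_t)
    return grads

# Hypothetical sizes, just for the demonstration.
rng = np.random.default_rng(0)
n_h, n_x = 3, 4
U = {k: rng.normal(size=(n_h, n_x)) for k in "ifog"}
W = {k: rng.normal(size=(n_h, n_h)) for k in "ifog"}
upstream = {k: rng.normal(size=n_h) for k in "ifog"}
grads = u_gradients(rng.normal(size=n_x), rng.normal(size=n_h), U, W, upstream)
print({k: v.shape for k, v in grads.items()})  # each gradient has U's shape
```

Note that each gradient ends in an outer product with x_t, which is the matrix form of the "last term is x_t" observation above.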

After computing the gradients with respect to all of these weights, we update them using the weight update rule and minimize the loss.
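Taking the weight update rule to be vanilla gradient descent, U ← U − α · ∂L/∂U with learning rate α, a minimal sketch (the function name and the 0.01 default are illustrative choices):

```python
import numpy as np

def update_weights(weights, grads, lr=0.01):
    # Vanilla gradient-descent step: U <- U - lr * dL/dU, applied to each
    # weight matrix in the dict (e.g. U_i, U_f, U_o, U_g).
    for name in weights:
        weights[name] = weights[name] - lr * grads[name]
    return weights

# Tiny illustration with made-up numbers.
weights = {"U_i": np.ones((2, 2))}
grads = {"U_i": np.full((2, 2), 2.0)}
weights = update_weights(weights, grads, lr=0.5)
print(weights["U_i"])  # each entry: 1 - 0.5 * 2 = 0.0
```

Repeating this step over many training iterations drives the loss toward a minimum.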
