The total loss, L, is the sum of the losses at all time steps, and can be given as follows:

L = Σ_t L_t
To minimize the loss using gradient descent, we compute the derivative of the loss with respect to all of the weights used in the GRU cell, as follows:
- We have three input-to-hidden layer weights, U_z, U_r, and U, which are the input-to-hidden layer weights of the update gate, reset gate, and content state, respectively
- We have three hidden-to-hidden layer weights, W_z, W_r, and W, which are the hidden-to-hidden layer weights of the update gate, reset gate, and content state, respectively
- We have one hidden-to-output layer weight, V
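To make the roles of these seven weight matrices concrete, the following is a minimal numpy sketch of a single GRU cell forward step. The dimensions, the random initialization, and the exact gating convention (new hidden state as a z-weighted blend of the previous hidden state and the content state) are illustrative assumptions, not the only possible choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes, chosen only for illustration
input_size, hidden_size = 3, 4
output_size = 2
rng = np.random.default_rng(0)

# Three input-to-hidden weights: update gate, reset gate, content state
U_z = rng.normal(scale=0.1, size=(hidden_size, input_size))
U_r = rng.normal(scale=0.1, size=(hidden_size, input_size))
U   = rng.normal(scale=0.1, size=(hidden_size, input_size))

# Three hidden-to-hidden weights: update gate, reset gate, content state
W_z = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_r = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W   = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

# One hidden-to-output weight
V = rng.normal(scale=0.1, size=(output_size, hidden_size))

def gru_step(x, h_prev):
    z = sigmoid(U_z @ x + W_z @ h_prev)    # update gate
    r = sigmoid(U_r @ x + W_r @ h_prev)    # reset gate
    c = np.tanh(U @ x + W @ (r * h_prev))  # content state
    h = (1 - z) * h_prev + z * c           # blend old state and content
    return h

x = rng.normal(size=(input_size,))
h = gru_step(x, np.zeros(hidden_size))  # hidden state at this time step
y_hat = V @ h                           # output at this time step
```

At each time step, the loss L_t is computed from y_hat, and the total loss summed over the sequence is what gradient descent differentiates with respect to each of the seven matrices above.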
We find the optimal values for all these weights through gradient descent and update the weights according to the weight update rule.
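The weight update rule itself is the same for every one of these matrices: each weight moves a small step against its gradient. A minimal sketch, with made-up gradient values standing in for the derivatives that backpropagation through time would actually produce:

```python
import numpy as np

def sgd_update(weights, grads, lr=0.01):
    # Weight update rule: theta <- theta - lr * dL/dtheta,
    # applied to every weight matrix in the GRU cell.
    return {name: w - lr * grads[name] for name, w in weights.items()}

# Toy example: two of the GRU weights with fabricated gradients
weights = {"U_z": np.ones((2, 2)), "V": np.full((1, 2), 0.5)}
grads   = {"U_z": np.full((2, 2), 0.1), "V": np.zeros((1, 2))}

new_weights = sgd_update(weights, grads, lr=0.1)
# U_z entries shrink by 0.1 * 0.1 = 0.01; V is unchanged (zero gradient)
```

In practice the same dictionary would hold all seven matrices (U_z, U_r, U, W_z, W_r, W, V), with gradients computed by backpropagation through time rather than supplied by hand.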