Backpropagation in a GRU cell

The total loss, $L$, is the sum of losses at all time steps, and can be given as follows:

$$L = \sum_{t=1}^{T} L_t$$
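The form of the per-step loss, $L_t$, depends on the task; assuming, for illustration, a softmax output with a cross-entropy loss at each step (an assumption, as the per-step loss is not restated here), it would be:

$$L_t = -\sum_{i} y_{t,i} \log \hat{y}_{t,i}$$

where $y_t$ is the actual output and $\hat{y}_t$ is the predicted output at time step $t$.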

To minimize the loss using gradient descent, we find the derivative of the loss with respect to all of the weights used in the GRU cell, as follows:

  • We have three input-to-hidden layer weights, $U_z$, $U_r$, and $U_c$, which are the input-to-hidden layer weights of the update gate, reset gate, and content state, respectively
  • We have three hidden-to-hidden layer weights, $W_z$, $W_r$, and $W_c$, which are the hidden-to-hidden layer weights of the update gate, reset gate, and content state, respectively
  • We have one hidden-to-output layer weight, $V$ (the sketch after this list shows where each of these weights appears)
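To make the role of each weight concrete, here is a minimal NumPy sketch of a single GRU forward step, assuming the standard GRU equations (sigmoid gates, a tanh content state, and a hidden state formed by interpolating between the previous hidden state and the content state; the exact gating convention varies slightly between sources). The function and variable names are illustrative, with $U$, $W$, and $V$ matching the notation above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_forward_step(x_t, h_prev, U_z, U_r, U_c, W_z, W_r, W_c, V):
    # Update gate: uses the input-to-hidden weight U_z and hidden-to-hidden weight W_z
    z_t = sigmoid(U_z @ x_t + W_z @ h_prev)

    # Reset gate: uses U_r and W_r
    r_t = sigmoid(U_r @ x_t + W_r @ h_prev)

    # Content state: uses U_c and W_c; the reset gate scales the previous hidden state
    c_t = np.tanh(U_c @ x_t + W_c @ (r_t * h_prev))

    # New hidden state: an interpolation between the previous hidden state
    # and the content state, controlled by the update gate
    h_t = (1.0 - z_t) * h_prev + z_t * c_t

    # Output: uses the single hidden-to-output weight V
    y_t = V @ h_t
    return h_t, y_t
```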

We find the optimal values for all these weights through gradient descent and update the weights according to the weight update rule.
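As a sketch of that update rule, $\theta = \theta - \alpha \frac{\partial L}{\partial \theta}$, applied to all seven weights (the dictionary layout, shapes, and learning rate below are hypothetical; in practice, the gradients come from backpropagation through time):

```python
import numpy as np

rng = np.random.default_rng(0)
names = ['U_z', 'U_r', 'U_c', 'W_z', 'W_r', 'W_c', 'V']

# Dummy parameters and gradients with hypothetical 4x4 shapes; in practice,
# grads holds dL/dtheta for each weight, computed by backpropagation through time.
params = {name: rng.normal(size=(4, 4)) for name in names}
grads = {name: rng.normal(size=(4, 4)) for name in names}

alpha = 0.01  # learning rate (hypothetical value)

# Weight update rule: theta <- theta - alpha * dL/dtheta, for every weight
for name in names:
    params[name] -= alpha * grads[name]
```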
