Now we will perform backpropagation through time (BPTT) with Adam as our optimizer. We will also apply gradient clipping to avoid the exploding gradients problem:
- Initialize the Adam optimizer:
minimizer = tf.train.AdamOptimizer()
- Compute the gradients of the loss with the Adam optimizer; compute_gradients returns a list of (gradient, variable) pairs:
gradients = minimizer.compute_gradients(loss)
- Set the threshold for the gradient clipping:
threshold = tf.constant(5.0, name="grad_clipping")
- Clip any gradient that exceeds the threshold, bringing it into the range [-threshold, threshold]:
clipped_gradients = []
for grad, var in gradients:
    clipped_grad = tf.clip_by_value(grad, -threshold, threshold)
    clipped_gradients.append((clipped_grad, var))
- Update the model parameters by applying the clipped gradients:
updated_gradients = minimizer.apply_gradients(clipped_gradients)
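To see what the clipping step does to each gradient, here is a minimal NumPy sketch of the element-wise operation performed by tf.clip_by_value; the gradient values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical gradients for two variables (illustrative values only).
gradients = [np.array([0.3, -7.2, 12.5]), np.array([4.9, -5.1])]

threshold = 5.0  # same clipping threshold as above

# Clip each gradient element-wise into [-threshold, threshold],
# mirroring tf.clip_by_value(grad, -threshold, threshold).
clipped = [np.clip(g, -threshold, threshold) for g in gradients]
```

Values already inside the range pass through unchanged; only the entries -7.2, 12.5, and -5.1 are pulled back to the boundary, which caps the size of each update step without changing the gradient's direction element-wise.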