- You can read more about gradient descent optimization algorithms and their variants in Sebastian Ruder's survey "An overview of gradient descent optimization algorithms": https://arxiv.org/pdf/1609.04747.pdf
- You can find a good article on vanishing gradients and choosing the right activation function here: https://blog.paperspace.com/vanishing-gradients-activation-function/