Similar to Adadelta, RMSProp was introduced to combat the decaying learning rate problem of Adagrad. In RMSProp, we compute an exponentially decaying running average of the squared gradients, as follows:
$E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2$
Instead of taking the sum of the squares of all the past gradients, we use this running average of the squared gradients. This means that our update equation becomes the following:
$\theta_{t+1} = \theta_t - \dfrac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t$
It is recommended to set the decay rate, gamma, to 0.9. Now, we will learn how to implement RMSProp in Python.
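The snippets that follow rely on NumPy and on a compute_gradients function that returns the gradients of the loss with respect to theta. The helper shown here is only a hypothetical stand-in, assuming data holds (x, y) pairs and theta holds the slope and intercept of a simple linear model trained with a mean squared error loss; if compute_gradients is already defined elsewhere, that version can be used instead:
import numpy as np

# Hypothetical stand-in for compute_gradients (an assumption for illustration).
# It assumes data is an array of (x, y) pairs and theta = [slope, intercept]
# of a linear model y_hat = slope * x + intercept, and it returns the gradient
# of the mean squared error loss with respect to theta.
def compute_gradients(data, theta):
    x, y = data[:, 0], data[:, 1]
    y_hat = theta[0] * x + theta[1]
    error = y_hat - y
    grad_slope = 2 * np.mean(error * x)
    grad_intercept = 2 * np.mean(error)
    return np.array([grad_slope, grad_intercept])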
First, we need to define the RMSProp function:
def RMSProp(data, theta, lr = 1e-2, gamma = 0.9, epsilon = 1e-6, num_iterations = 1000):
Now, we need to initialize the E_grad2 variable with zeros to store the running average of the squared gradients:
    E_grad2 = np.zeros(theta.shape[0])
For every iteration, we perform the following steps:
    for t in range(num_iterations):
Then, we compute the gradients with respect to theta:
        gradients = compute_gradients(data, theta)
Next, we compute the running average of the squared gradients, that is, $E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2$:
        E_grad2 = (gamma * E_grad2) + ((1. - gamma) * (gradients ** 2))
Now, we update the model parameter, theta, that is, $\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t$, where epsilon is a small constant added to avoid division by zero:
        theta = theta - (lr / (np.sqrt(E_grad2 + epsilon)) * gradients)

    return theta
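As a quick sanity check, here is a hypothetical usage example, assuming the stand-in compute_gradients defined earlier and some synthetic data generated from the line y = 2x + 1 with a little noise; the exact data, seed, and iteration count are assumptions for illustration:
# Synthetic data from y = 2x + 1 plus noise (assumed setup for illustration)
np.random.seed(0)
x = np.random.rand(500)
y = 2 * x + 1 + 0.1 * np.random.randn(500)
data = np.column_stack((x, y))

# Start from zero-initialized parameters and run RMSProp
theta = RMSProp(data, np.zeros(2), lr=1e-2, gamma=0.9, num_iterations=5000)
print(theta)  # the learned parameters should end up roughly near [2, 1]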