Overcoming the limitations of Adagrad using RMSProp

Similar to Adadelta, RMSProp was introduced to combat the decaying learning rate problem of Adagrad. So, in RMSProp, we compute the exponentially decaying running average of squared gradients as follows:

$E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2$

Instead of dividing by the sum of the squares of all the past gradients, we divide by this running average of squared gradients. This means that our update equation becomes the following:

$\theta_t = \theta_{t-1} - \dfrac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t$

It is recommended to set the decay rate, $\gamma$, to 0.9. Now, we will learn how to implement RMSProp in Python.
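The implementation that follows assumes that NumPy has been imported as np and that a compute_gradients(data, theta) function, which returns the gradient of the loss with respect to theta, is already defined (as for the other optimizers in this chapter). As a minimal, hypothetical sketch, for a linear model y_hat = theta[0] + theta[1] * x trained with a squared error loss, such a helper might look like the following:

import numpy as np

# Hypothetical helper, shown only for completeness: gradient of the loss
# (1 / (2 * N)) * sum((y_hat - y) ** 2) for the linear model
# y_hat = theta[0] + theta[1] * x, where data is an (N, 2) array of (x, y) pairs.
def compute_gradients(data, theta):
    x, y = data[:, 0], data[:, 1]
    N = float(data.shape[0])
    y_hat = theta[0] + theta[1] * x               # model predictions
    gradients = np.zeros(2)
    gradients[0] = np.sum(y_hat - y) / N          # d(loss)/d(theta[0])
    gradients[1] = np.sum((y_hat - y) * x) / N    # d(loss)/d(theta[1])
    return gradients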

First, we need to define the RMSProp function:

def RMSProp(data, theta, lr = 1e-2, gamma = 0.9, epsilon = 1e-6, num_iterations = 1000):

Now, we need to initialize the E_grad2 variable with zeros to store the running average of squared gradients:

    E_grad2 = np.zeros(theta.shape[0])

For every iteration, we perform the following steps:

    for t in range(num_iterations):

Then, we compute the gradients with respect to theta:

        gradients = compute_gradients(data, theta) 

Next, we compute the running average of the squared gradients, that is, $E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2$:

        E_grad2 = (gamma * E_grad2) + ((1. - gamma) * (gradients ** 2))

Now, we update the model parameter, theta, as $\theta_t = \theta_{t-1} - \dfrac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t$:

        theta = theta - (lr / (np.sqrt(E_grad2 + epsilon)) * gradients)

Finally, after all the iterations are complete, we return the updated parameter, theta:

    return theta
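As a quick, hypothetical usage example (the synthetic data and variable names below are illustrative, not from the text), we can fit the linear model sketched earlier with RMSProp and check that the learned parameters are sensible:

# Generate synthetic data for y = 2 * x + 1 with a little noise.
np.random.seed(0)
x = np.random.rand(500)
y = 2 * x + 1 + 0.1 * np.random.randn(500)
data = np.column_stack((x, y))

# Start from zero parameters and run RMSProp.
theta = np.zeros(2)
theta = RMSProp(data, theta, lr=1e-2, gamma=0.9, num_iterations=5000)
print(theta)    # should end up roughly [1.0, 2.0]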