Gradient descent algorithm

The gradient descent algorithm is an iterative algorithm that updates the parameters of a model to fit the data with the least error. More generally, it finds a minimum of a function.
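As a minimal sketch of the general idea before we turn to the weight-prediction example, consider gradient descent on a one-variable function; the function f(x) = (x - 3)^2 and the step size below are illustrative choices, not part of the example that follows:

```python
# Minimal gradient descent sketch on f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2 * (x - 3) and whose minimum is at x = 3.
def gradient_descent_1d(derivative, x0, learning_rate, iterations):
    x = x0
    for _ in range(iterations):
        # Step in the direction opposite to the derivative.
        x = x - learning_rate * derivative(x)
    return x

minimum = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0,
                              learning_rate=0.1, iterations=100)
print(minimum)  # converges close to 3
```

Each step moves x against the slope, so the iterates approach the minimum at x = 3.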

We would like to express the weight in terms of the height using a linear formula:

weight(height, p) = p1 * height + p0

We estimate the parameter p = (p0, p1) using n data samples (heighti, weighti) to minimize the following square error:

E(p) = Σ(i=1..n) (weighti - p1 * heighti - p0)^2
The gradient descent algorithm does this by updating the parameter pj in the direction opposite to the partial derivative (∂/∂pj) E(p); in particular:

pj ← pj - learning_rate * (∂/∂pj) E(p), for j = 0, 1
Here, learning_rate determines the speed of convergence of E(p) to the minimum. Provided that learning_rate is sufficiently small, updating the parameter p will make E(p) converge to a certain value. In the Python program, we use a learning_rate of 0.0000001. However, a drawback of this update rule is that, in general, the minimum of E(p) that gradient descent finds may be only a local minimum.
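To see why learning_rate must be sufficiently small, consider the same kind of iteration on the illustrative function f(x) = x^2 (again an assumption for demonstration, not part of the regression program): with a small step the distance to the minimum shrinks, while with a too-large step it grows without bound:

```python
# Illustrative sketch: gradient steps on f(x) = x^2 with f'(x) = 2 * x.
def run_steps(learning_rate, x0=1.0, iterations=50):
    x = x0
    for _ in range(iterations):
        # Each step scales x by the factor (1 - 2 * learning_rate).
        x = x - learning_rate * 2 * x
    return abs(x)

print(run_steps(0.1))  # shrinks toward the minimum at 0
print(run_steps(1.1))  # grows without bound: the step size is too large
```

With learning_rate = 0.1 the factor is 0.8 per step, so the iterate decays toward 0; with learning_rate = 1.1 the factor is -1.2, so the iterate oscillates and diverges.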

To update the parameter p programmatically, we need to unfold the partial derivatives of E(p). Since (∂/∂p0) E(p) = -2 * Σ(i=1..n) (weighti - p1 * heighti - p0) and (∂/∂p1) E(p) = -2 * Σ(i=1..n) (weighti - p1 * heighti - p0) * heighti, we update the parameter p as follows, absorbing the constant factor 2 into learning_rate:

p0 ← p0 + learning_rate * Σ(i=1..n) (weighti - p1 * heighti - p0)
p1 ← p1 + learning_rate * Σ(i=1..n) (weighti - p1 * heighti - p0) * heighti
We will keep updating the parameter p until it changes only a very little, that is, the change of both p0 and p1 is less than some constant acceptable_error. Once the parameter p stabilizes, we can use it to estimate the weight from the height.

Implementation:

# source_code/6/regression.py
# Linear regression program to learn a basic linear model.
import math
import sys
sys.path.append('../common')
import common # noqa

# Calculate the gradient by which the parameter should be updated.
def linear_gradient(data, old_parameter):
    gradient = [0.0, 0.0]
    for (x, y) in data:
        term = float(y) - old_parameter[0] - old_parameter[1] * float(x)
        gradient[0] += term
        gradient[1] += term * float(x)
    return gradient

# This function will apply the gradient descent algorithm
# to learn the linear model.
def learn_linear_parameter(data, learning_rate,
                           acceptable_error, LIMIT):
    parameter = [1.0, 1.0]
    old_parameter = [1.0, 1.0]
    for i in range(0, LIMIT):
        gradient = linear_gradient(data, old_parameter)
        # Update the parameter with the least mean squares rule.
        parameter[0] = old_parameter[0] + learning_rate * gradient[0]
        parameter[1] = old_parameter[1] + learning_rate * gradient[1]
        # Calculate the change between the old and the new parameter
        # and compare it with the acceptable error to determine
        # whether the calculation is sufficiently accurate.
        if (abs(parameter[0] - old_parameter[0]) <= acceptable_error
                and abs(parameter[1] - old_parameter[1]) <= acceptable_error):
            return parameter
        old_parameter[0] = parameter[0]
        old_parameter[1] = parameter[1]
    return parameter

# Calculate the y coordinate based on the linear model predicted.
def predict_unknown(data, linear_parameter):
    for (x, y) in data:
        print(x, linear_parameter[0] + linear_parameter[1] * float(x))

# Program start
csv_file_name = sys.argv[1]
# The maximum number of the iterations in the batch learning algorithm.
LIMIT = 100
# Suitable parameters chosen for the problem given.
learning_rate = 0.0000001
acceptable_error = 0.001

(heading, complete_data, incomplete_data,
 enquired_column) = common.csv_file_to_ordered_data(csv_file_name)
linear_parameter = learn_linear_parameter(
    complete_data, learning_rate, acceptable_error, LIMIT)
print("Linear model: (p0,p1)=" + str(linear_parameter) + " ")
print("Unknowns based on the linear model:")
predict_unknown(incomplete_data, linear_parameter)

Input:

We use the data from the table in example weight prediction from height and save it in a CSV file.

# source_code/6/height_weight.csv
height,weight
180,75
174,71
184,83
168,63
178,70
172,?

Output:

$ python regression.py height_weight.csv
Linear model: (p0,p1)=[0.9966468959362077, 0.4096393414704317]
Unknowns based on the linear model:
('172', 71.45461362885045)

The output for the linear model means that the weight can be expressed in terms of the height as follows:

weight = 0.4096393414704317 * height + 0.9966468959362077

Therefore, a man with a height of 172 cm is predicted to weigh 0.4096393414704317 * 172 + 0.9966468959362077 = 71.45461362885045, approximately 71.455 kg.
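This arithmetic can be checked directly from the learned parameters:

```python
# Check the prediction of the learned linear model for a height of 172 cm.
p0 = 0.9966468959362077
p1 = 0.4096393414704317
weight = p1 * 172 + p0
print(weight)  # approximately 71.455 kg
```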

Note that this prediction of 71.455 kg differs from the prediction of 67.016 kg obtained in R. Since the square error E(p) of a linear model is a convex function with a single minimum, the discrepancy is likely due to the Python program stopping early, after at most LIMIT = 100 iterations with a very small learning_rate, before fully converging, or to R using a different algorithm, data, or implementation.
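To estimate how much of the discrepancy comes from stopping early, we can compute the exact least squares solution for the five complete samples in the CSV file above, using the standard closed-form formulas for simple linear regression (a technique different from the gradient descent program above):

```python
# Exact least squares fit of p1 (slope) and p0 (intercept) for the five
# complete samples from height_weight.csv, via the normal equations
# for simple linear regression.
data = [(180, 75), (174, 71), (184, 83), (168, 63), (178, 70)]
n = len(data)
sum_x = sum(x for x, y in data)
sum_y = sum(y for x, y in data)
sum_xy = sum(x * y for x, y in data)
sum_xx = sum(x * x for x, y in data)

p1 = (n * sum_xy - sum_x * sum_y) / float(n * sum_xx - sum_x * sum_x)
p0 = (sum_y - p1 * sum_x) / float(n)
print(p0, p1)         # the exact minimizer of E(p)
print(p1 * 172 + p0)  # exact least squares prediction for 172 cm
```

The exact minimizer yields a prediction close to 67 kg for a height of 172 cm, which suggests the gradient descent run above had not yet fully converged within its iteration limit.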
