Gradient descent algorithm

The gradient descent algorithm is an iterative algorithm that updates the parameters of a model to fit the data with the least error. More generally, it finds a minimum of a function.
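As a minimal sketch of the general idea before we turn to the weight-prediction example, consider gradient descent on a one-variable function; the function f(x) = (x - 3)^2 and the step size below are illustrative choices, not part of the example that follows:

```python
# Minimal gradient descent sketch on f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2 * (x - 3) and whose minimum is at x = 3.
def gradient_descent_1d(derivative, x0, learning_rate, iterations):
    x = x0
    for _ in range(iterations):
        # Step in the direction opposite to the derivative.
        x = x - learning_rate * derivative(x)
    return x

minimum = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0,
                              learning_rate=0.1, iterations=100)
print(minimum)  # converges close to 3
```

Each step moves x against the slope, so the iterates approach the minimum at x = 3.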

We would like to express the weight in terms of the height using a linear formula:

weight(height, p) = p1 * height + p0

We estimate the parameter p = (p0, p1) using n data samples (heighti, weighti) to minimize the following square error:

E(p) = Σ(i=1..n) (weighti - p1 * heighti - p0)^2
The gradient descent algorithm does this by updating the parameter pj in the direction opposite to the partial derivative (∂/∂pj) E(p); in particular:

pj ← pj - learning_rate * (∂/∂pj) E(p), for j = 0, 1
Here, learning_rate determines the speed of convergence of E(p) to the minimum. Provided that learning_rate is sufficiently small, updating the parameter p will make E(p) converge to a certain value. In the Python program, we use a learning_rate of 0.0000001. However, a drawback of this update rule is that, in general, the minimum of E(p) that gradient descent finds may be only a local minimum.
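To see why learning_rate must be sufficiently small, consider the same kind of iteration on the illustrative function f(x) = x^2 (again an assumption for demonstration, not part of the regression program): with a small step the distance to the minimum shrinks, while with a too-large step it grows without bound:

```python
# Illustrative sketch: gradient steps on f(x) = x^2 with f'(x) = 2 * x.
def run_steps(learning_rate, x0=1.0, iterations=50):
    x = x0
    for _ in range(iterations):
        # Each step scales x by the factor (1 - 2 * learning_rate).
        x = x - learning_rate * 2 * x
    return abs(x)

print(run_steps(0.1))  # shrinks toward the minimum at 0
print(run_steps(1.1))  # grows without bound: the step size is too large
```

With learning_rate = 0.1 the factor is 0.8 per step, so the iterate decays toward 0; with learning_rate = 1.1 the factor is -1.2, so the iterate oscillates and diverges.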

To update the parameter p programmatically, we need to unfold the partial derivatives of E(p). Since (∂/∂p0) E(p) = -2 * Σ(i=1..n) (weighti - p1 * heighti - p0) and (∂/∂p1) E(p) = -2 * Σ(i=1..n) (weighti - p1 * heighti - p0) * heighti, we update the parameter p as follows, absorbing the constant factor 2 into learning_rate:

p0 ← p0 + learning_rate * Σ(i=1..n) (weighti - p1 * heighti - p0)
p1 ← p1 + learning_rate * Σ(i=1..n) (weighti - p1 * heighti - p0) * heighti
We will keep updating the parameter p until it changes only a very little, that is, the change of both p0 and p1 is less than some constant acceptable_error. Once the parameter p stabilizes, we can use it to estimate the weight from the height.

Implementation:

# source_code/6/regression.py
# Linear regression program to learn a basic linear model.
import math
import sys
sys.path.append('../common')
import common # noqa

# Calculate the gradient by which the parameter should be updated.
def linear_gradient(data, old_parameter):
    gradient = [0.0, 0.0]
    for (x, y) in data:
        term = float(y) - old_parameter[0] - old_parameter[1] * float(x)
        gradient[0] += term
        gradient[1] += term * float(x)
    return gradient

# This function will apply the gradient descent algorithm
# to learn the linear model.
def learn_linear_parameter(data, learning_rate,
                           acceptable_error, LIMIT):
    parameter = [1.0, 1.0]
    old_parameter = [1.0, 1.0]
    for i in range(0, LIMIT):
        gradient = linear_gradient(data, old_parameter)
        # Update the parameter with the least mean squares rule.
        parameter[0] = old_parameter[0] + learning_rate * gradient[0]
        parameter[1] = old_parameter[1] + learning_rate * gradient[1]
        # Calculate the change between the old and the new parameter
        # and compare it with the acceptable error to determine
        # whether the calculation is sufficiently accurate.
        if (abs(parameter[0] - old_parameter[0]) <= acceptable_error
                and abs(parameter[1] - old_parameter[1]) <= acceptable_error):
            return parameter
        old_parameter[0] = parameter[0]
        old_parameter[1] = parameter[1]
    return parameter

# Calculate the y coordinate based on the linear model predicted.
def predict_unknown(data, linear_parameter):
    for (x, y) in data:
        print(x, linear_parameter[0] + linear_parameter[1] * float(x))

# Program start
csv_file_name = sys.argv[1]
# The maximum number of the iterations in the batch learning algorithm.
LIMIT = 100
# Suitable parameters chosen for the problem given.
learning_rate = 0.0000001
acceptable_error = 0.001

(heading, complete_data, incomplete_data,
 enquired_column) = common.csv_file_to_ordered_data(csv_file_name)
linear_parameter = learn_linear_parameter(
    complete_data, learning_rate, acceptable_error, LIMIT)
print("Linear model: (p0,p1)=" + str(linear_parameter) + " ")
print("Unknowns based on the linear model:")
predict_unknown(incomplete_data, linear_parameter)

Input:

We use the data from the table in example weight prediction from height and save it in a CSV file.

# source_code/6/height_weight.csv
height,weight
180,75
174,71
184,83
168,63
178,70
172,?

Output:

$ python regression.py height_weight.csv
Linear model: (p0,p1)=[0.9966468959362077, 0.4096393414704317]
Unknowns based on the linear model:
('172', 71.45461362885045)

The output for the linear model means that the weight can be expressed in terms of the height as follows:

weight = 0.4096393414704317 * height + 0.9966468959362077

Therefore, a man with a height of 172 cm is predicted to weigh 0.4096393414704317 * 172 + 0.9966468959362077 = 71.45461362885045, approximately 71.455 kg.
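This arithmetic can be checked directly from the learned parameters:

```python
# Check the prediction of the learned linear model for a height of 172 cm.
p0 = 0.9966468959362077
p1 = 0.4096393414704317
weight = p1 * 172 + p0
print(weight)  # approximately 71.455 kg
```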

Note that this prediction of 71.455 kg differs from the prediction of 67.016 kg obtained in R. Since the square error E(p) of a linear model is a convex function with a single minimum, the discrepancy is likely due to the Python program stopping early, after at most LIMIT = 100 iterations with a very small learning_rate, before fully converging, or to R using a different algorithm, data, or implementation.
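To estimate how much of the discrepancy comes from stopping early, we can compute the exact least squares solution for the five complete samples in the CSV file above, using the standard closed-form formulas for simple linear regression (a technique different from the gradient descent program above):

```python
# Exact least squares fit of p1 (slope) and p0 (intercept) for the five
# complete samples from height_weight.csv, via the normal equations
# for simple linear regression.
data = [(180, 75), (174, 71), (184, 83), (168, 63), (178, 70)]
n = len(data)
sum_x = sum(x for x, y in data)
sum_y = sum(y for x, y in data)
sum_xy = sum(x * y for x, y in data)
sum_xx = sum(x * x for x, y in data)

p1 = (n * sum_xy - sum_x * sum_y) / float(n * sum_xx - sum_x * sum_x)
p0 = (sum_y - p1 * sum_x) / float(n)
print(p0, p1)         # the exact minimizer of E(p)
print(p1 * 172 + p0)  # exact least squares prediction for 172 cm
```

The exact minimizer yields a prediction close to 67 kg for a height of 172 cm, which suggests the gradient descent run above had not yet fully converged within its iteration limit.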
