The NumPy polyfit() function fits a set of data points to a polynomial, even if the underlying function is not continuous:
bhp = np.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=True)
vale = np.loadtxt('VALE.csv', delimiter=',', usecols=(6,), unpack=True)
t = np.arange(len(bhp))
poly = np.polyfit(t, bhp - vale, 3)
print("Polynomial fit", poly)
The polynomial fit (in this example, a cubic polynomial was chosen) is as follows:
Polynomial fit [ 1.11655581e-03 -5.28581762e-02 5.80684638e-01 5.79791202e+01]
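The order of the printed coefficients matters when you read the result: polyfit() returns them from the highest power down to the constant term. The following is a minimal sketch on synthetic data (the cubic y = 2x^3 - 3x + 1 is an assumption chosen for illustration, not the BHP and VALE prices), showing that the fit recovers the coefficients in that order:

```python
import numpy as np

# Synthetic data (an assumption for illustration): exact samples of
# y = 2x^3 - 3x + 1 at ten evenly spaced points.
x = np.arange(10)
y = 2 * x**3 - 3 * x + 1

# Fit a cubic; the result has degree + 1 = 4 coefficients.
coeffs = np.polyfit(x, y, 3)

# Coefficients are ordered from the highest power to the constant term,
# so this prints values close to 2, 0, -3, and 1.
print(np.round(coeffs, 6))
```

Read the same way, the first number in the printed fit above multiplies t cubed and the last one is the constant term.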
Extrapolate to the next value with the polyval() function and the polynomial coefficients that we got from the fit:
print("Next value", np.polyval(poly, t[-1] + 1))
The next value we predict will be this:
Next value 57.9743076081
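To see what polyval() computes, the sketch below evaluates the printed coefficients by hand using Horner's rule and compares the result with np.polyval(). The helper name horner and the evaluation point 30 are assumptions: 30 is the value of t[-1] + 1 that reproduces the printed prediction, which suggests the data holds 30 points.

```python
import numpy as np

# Coefficients copied from the printed fit, highest power first.
poly = np.array([1.11655581e-03, -5.28581762e-02,
                 5.80684638e-01, 5.79791202e+01])


def horner(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule (hypothetical helper)."""
    result = 0.0
    for c in coeffs:
        # Multiply the running value by x and add the next coefficient.
        result = result * x + c
    return result


next_t = 30  # assumed value of t[-1] + 1
manual = horner(poly, next_t)
library = np.polyval(poly, next_t)
print(manual, library)
```

Both values should agree with the printed prediction to within the rounding of the displayed coefficients.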
Find the roots of the polynomial, that is, the values at which the fit equals zero, with the roots() function:
print("Roots", np.roots(poly))
The roots of the polynomial are as follows:
Roots [ 35.48624287+30.62717062j 35.48624287-30.62717062j -23.63210575 +0.j ]
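A cubic with real coefficients has either three real roots or, as here, one real root and a pair of complex conjugates. As a quick sanity check, the sketch below recomputes the roots from the printed coefficients (rounded to the digits shown, so the last decimals may differ) and confirms that the polynomial evaluates to roughly zero at each one:

```python
import numpy as np

# Coefficients copied from the printed fit, highest power first.
poly = np.array([1.11655581e-03, -5.28581762e-02,
                 5.80684638e-01, 5.79791202e+01])

roots = np.roots(poly)

# polyval() accepts complex arguments, so every root can be checked.
residuals = np.polyval(poly, roots)
print("Residual magnitudes", np.abs(residuals))

# Keep only the roots whose imaginary part is negligible.
real_roots = roots[np.isclose(roots.imag, 0)].real
print("Real roots", real_roots)
```

The only real root is negative, so it lies before the first data point, and the fitted difference does not cross zero inside the sampled range.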
Differentiate the fitted polynomial with the polyder() function:
der = np.polyder(poly)
print("Derivative", der)
The coefficients of the derivative polynomial are as follows:
Derivative [ 0.00334967 -0.10571635 0.58068464]
The derivative is zero at the extrema of the fit, so find the roots of the derivative:
print("Extremas", np.roots(der))
The extrema that we get are as follows:
Extremas [ 24.47820054 7.08205278]
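The roots of the derivative tell you where the extrema are, but not which one is a maximum and which is a minimum. One way to check, sketched here with the printed coefficients, is the second-derivative test: a negative second derivative marks a maximum and a positive one marks a minimum.

```python
import numpy as np

# Coefficients copied from the printed fit, highest power first.
poly = np.array([1.11655581e-03, -5.28581762e-02,
                 5.80684638e-01, 5.79791202e+01])

der = np.polyder(poly)        # first derivative
second = np.polyder(poly, 2)  # second derivative

# Sort the stationary points so they are reported left to right.
extrema = np.sort(np.roots(der))

kinds = []
for x in extrema:
    kind = "maximum" if np.polyval(second, x) < 0 else "minimum"
    kinds.append(kind)
    print(x, kind)
```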
Let's double-check and compute the values of the fit with the polyval() function:
vals = np.polyval(poly, t)
Find the indices of the maximum and minimum values of the fit with the argmax() and argmin() functions:
vals = np.polyval(poly, t)
print(np.argmax(vals))
print(np.argmin(vals))
This gives us the expected results shown in the following output. OK, not quite the same results, but if we backtrack to the first step, we can see that t was defined with the arange() function, so it contains only integers and the indices land on whole numbers next to the extrema:
7
24
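The small mismatch can be reproduced without the CSV files. The sketch below evaluates the printed coefficients on an integer grid, assuming 30 data points (an inference from the printed prediction, not something the text states), and compares the argmax() and argmin() indices with the real-valued extrema:

```python
import numpy as np

# Coefficients copied from the printed fit, highest power first.
poly = np.array([1.11655581e-03, -5.28581762e-02,
                 5.80684638e-01, 5.79791202e+01])

t = np.arange(30)  # assumed number of data points
vals = np.polyval(poly, t)

# Indices of the largest and smallest fitted values on the integer grid.
i_max = np.argmax(vals)
i_min = np.argmin(vals)

# Real-valued stationary points of the continuous polynomial.
extrema = np.sort(np.roots(np.polyder(poly)))

print("Grid indices", i_max, i_min)
print("Continuous extrema", extrema)
```

Because t only takes integer values, the indices can only approximate the stationary points of the continuous curve.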
Plot the data and the fit. In the resulting plot, the smooth line is the fit and the jagged line is the underlying data. As it is not that good a fit, you might want to try a higher-order polynomial.
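One rough way to compare polynomial orders is to look at the sum of squared residuals for each. The sketch below does this on a synthetic random walk (an assumption standing in for the price difference, since the CSV files are not reproduced here). Keep in mind that a higher degree never increases the residual on the data being fitted, so a lower residual alone does not prove a better model, and high-order fits tend to extrapolate poorly.

```python
import numpy as np

# Synthetic series (an assumption): a seeded random walk around 58.
rng = np.random.default_rng(42)
t = np.arange(30)
y = 58 + np.cumsum(rng.normal(0, 0.5, size=t.size))

rss = {}
for degree in (1, 3, 5):
    coeffs = np.polyfit(t, y, degree)
    residuals = y - np.polyval(coeffs, t)
    # Sum of squared residuals for this degree.
    rss[degree] = np.sum(residuals ** 2)
    print(degree, rss[degree])
```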
We fit data to a polynomial with the polyfit() function. We learned about the polyval() function that computes the values of a polynomial, the roots() function that returns the roots of the polynomial, and the polyder() function that gives back the derivative of a polynomial (see polynomials.py):
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt

# Load column index 6 from each CSV file
bhp = np.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=True)
vale = np.loadtxt('VALE.csv', delimiter=',', usecols=(6,), unpack=True)

# Fit a cubic polynomial to the difference between the two series
t = np.arange(len(bhp))
poly = np.polyfit(t, bhp - vale, 3)
print("Polynomial fit", poly)

# Extrapolate one step past the last data point
print("Next value", np.polyval(poly, t[-1] + 1))

# Roots of the fit, and extrema from the roots of its derivative
print("Roots", np.roots(poly))
der = np.polyder(poly)
print("Derivative", der)
print("Extremas", np.roots(der))

# Indices of the largest and smallest fitted values
vals = np.polyval(poly, t)
print(np.argmax(vals))
print(np.argmin(vals))

# Plot the data as a solid line and the fit as a dashed line
plt.plot(t, bhp - vale, label='BHP - VALE')
plt.plot(t, vals, '--', label='Fit')
plt.title('Polynomial fit')
plt.xlabel('Days')
plt.ylabel('Difference ($)')
plt.grid()
plt.legend()
plt.show()
You could do a number of things to improve the fit. For example, try a different degree, since a cubic polynomial was chosen in this section. Consider smoothing the data before fitting it. One way you could smooth the data is with a moving average. You can find examples of simple and exponential moving average (EMA) calculations in Chapter 3, Getting Familiar with Commonly Used Functions.
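As a sketch of the smoothing idea, the following applies a simple moving average with np.convolve() before fitting. The synthetic series and the window length of 5 are assumptions for illustration; with real data you would substitute the bhp - vale difference.

```python
import numpy as np

# Synthetic series (an assumption): a seeded random walk around 58.
rng = np.random.default_rng(0)
t = np.arange(30)
y = 58 + np.cumsum(rng.normal(0, 0.5, size=t.size))

N = 5  # assumed window length
weights = np.ones(N) / N

# 'valid' keeps only the positions where the window fully overlaps the data,
# so the result has len(y) - N + 1 points.
smoothed = np.convolve(y, weights, mode='valid')

# Align each average with the last day of its window.
t_smoothed = t[N - 1:]

poly_raw = np.polyfit(t, y, 3)
poly_smooth = np.polyfit(t_smoothed, smoothed, 3)
print("Raw fit", poly_raw)
print("Smoothed fit", poly_smooth)
```

The smoothed series is shorter than the original by N - 1 points, so the time axis has to be trimmed to match before calling polyfit().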