Next, however, I want to introduce another model, which uses a completely different algorithm: dynamic time warping. It gives you a distance metric that represents how similar two time series are; the smaller the distance, the more alike the two series:
- To get started, we'll need to pip install the fastdtw library:
!pip install fastdtw
- Once that is installed, we'll import the additional libraries we'll need:
from scipy.spatial.distance import euclidean
from fastdtw import fastdtw
- Next, we'll create the function that will take in two series and return the distance between them:
def dtw_dist(x, y):
    distance, path = fastdtw(x, y, dist=euclidean)
    return distance
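To make the distance itself less of a black box, here is a minimal sketch of the classic, exact dynamic time warping recurrence in plain NumPy. It is not what `fastdtw` does internally (that library computes an approximation), and the function name `dtw_distance` and the toy sequences are assumptions made purely for illustration:

```python
import numpy as np

def dtw_distance(x, y):
    """Exact DTW distance between two 1-D sequences.

    Illustrative only: uses the absolute difference as the local cost and
    the full O(n*m) table, unlike fastdtw's approximate algorithm.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # cost[i, j] holds the cheapest alignment of x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return float(cost[n, m])

a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same shape as a, delayed by one step
c = [2.0, 0.0, 2.0, 0.0, 2.0]       # a different shape entirely

print(dtw_distance(a, b))
print(dtw_distance(a, c))
```

Because the alignment is allowed to stretch in time, the delayed copy `b` lines up with `a` at no cost, while `c` cannot be warped onto `a` and ends up with a positive distance.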
- Now, we'll split our 18 years' worth of time series data into distinct five-day periods. We'll pair each period with one additional point, the percentage change on the day that follows it. This will serve to create our x and y data, as follows:
tseries = []
tlen = 5
for i in range(tlen, len(sp), tlen):
    # Four daily percentage changes within the five-day window
    pctc = sp['Close'].iloc[i-tlen:i].pct_change()[1:].values * 100
    # Percentage change on the day immediately after the window
    res = sp['Close'].iloc[i-tlen:i+1].pct_change().iloc[-1] * 100
    tseries.append((pctc, res))
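To see what this windowing produces, the following sketch runs the same logic on a tiny, made-up price series. The `toy` DataFrame and its prices are hypothetical and stand in for `sp`; they are not real market data:

```python
import pandas as pd

# Hypothetical closing prices for 11 trading days (not real S&P data)
toy = pd.DataFrame({'Close': [100.0, 101.0, 102.01, 100.0, 99.0,
                              100.0, 102.0, 101.0, 101.0, 103.0,
                              104.0]})

tlen = 5
windows = []
for i in range(tlen, len(toy), tlen):
    # x: the four day-over-day percentage changes inside the window
    pattern = toy['Close'].iloc[i - tlen:i].pct_change().iloc[1:].values * 100
    # y: the percentage change on the first day after the window
    next_day = toy['Close'].iloc[i - tlen:i + 1].pct_change().iloc[-1] * 100
    windows.append((pattern, next_day))

for pattern, next_day in windows:
    print(pattern.round(2), round(next_day, 2))
```

Each five-day window yields only four values, since the first day has no prior close to compare against. A window is kept only if a following day exists to supply its return, which is why 11 rows give two complete pairs.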
- We can take a look at our first series to get an idea of what the data looks like:
tseries[0]
This generates the following output:
- Now that we have each series, we can run them all through our algorithm to get the distance metric for each series against every other series:
dist_pairs = []
for i in range(len(tseries)):
    for j in range(len(tseries)):
        dist = dtw_dist(tseries[i][0], tseries[j][0])
        dist_pairs.append((i,j,dist,tseries[i][1], tseries[j][1]))
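The double loop above compares every series with every other series in both directions, and with itself. If the distance is symmetric, a possible shortcut is to visit each unordered pair only once, roughly halving the work. The sketch below shows that idea; `pairwise_distances` is a hypothetical helper, and `abs_diff` is a simple stand-in for `dtw_dist` so the example runs without `fastdtw`:

```python
from itertools import combinations

import numpy as np

def pairwise_distances(series, dist_fn):
    """Return (i, j, dist, ret_i, ret_j) for every pair with i < j.

    `series` is a list of (pattern, next_day_return) tuples, in the same
    layout as tseries; `dist_fn` is any function of two patterns.
    """
    pairs = []
    for i, j in combinations(range(len(series)), 2):
        dist = dist_fn(series[i][0], series[j][0])
        pairs.append((i, j, dist, series[i][1], series[j][1]))
    return pairs

def abs_diff(x, y):
    # Stand-in distance (sum of absolute differences), NOT dynamic time warping
    return float(np.abs(np.asarray(x) - np.asarray(y)).sum())

# Made-up patterns and returns, for illustration only
toy_series = [
    (np.array([1.0, -0.5, 0.2, 0.3]), 0.4),
    (np.array([1.0, -0.4, 0.2, 0.3]), -0.1),
    (np.array([-2.0, 1.5, 0.0, 0.7]), 0.9),
]
toy_pairs = pairwise_distances(toy_series, abs_diff)
for pair in toy_pairs:
    print(pair)
```

With this approach, self-comparisons and reversed duplicates never enter the list, so three series give three pairs instead of nine. In the real workflow, `dtw_dist` would be passed in place of `abs_diff`.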
Once we have that, we can place it into a DataFrame. We'll drop pairs that have 0 distance, as they represent a series compared with itself. Because A and B are positions in the chronologically ordered list of periods, sorting on them orders the pairs by date. We'll then look only at pairs where the first series comes before the second, chronologically speaking:
dist_frame = pd.DataFrame(dist_pairs, columns=['A', 'B', 'Dist', 'A Ret', 'B Ret'])
# Drop zero-distance pairs, order by series index, and keep only A earlier than B
sf = dist_frame[dist_frame['Dist'] > 0].sort_values(['A', 'B']).reset_index(drop=True)
sfe = sf[sf['A'] < sf['B']]
And finally, we'll limit our trades to pairs where the distance is at most 1 and the first series was followed by a positive return:
winf = sfe[(sfe['Dist'] <= 1) & (sfe['A Ret'] > 0)]
winf
This generates the following output:
Let's see what one of our top patterns (A:6 and B:598) looks like when plotted:
plt.plot(np.arange(4), tseries[6][0]);
The preceding code generates the following output:
Now, we'll plot the second one:
plt.plot(np.arange(4), tseries[598][0])
The preceding code generates the following output:
As you can see, the curves are nearly identical, which is exactly what we want. We're going to try to find all curves that have positive next-day gains and then, once we have a curve that is highly similar to one of these profitable curves, we'll buy it in anticipation of another gain.
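The matching step described above could be sketched roughly as follows. This is only an illustration of the idea, not the strategy's actual implementation: `find_match` is a hypothetical helper, the patterns are made up, the threshold of 1 mirrors the distance cutoff used earlier, and `abs_diff` is a stand-in for `dtw_dist` so the snippet runs on its own:

```python
import numpy as np

def find_match(new_pattern, profitable_patterns, dist_fn, max_dist=1.0):
    """Return the index of the closest profitable pattern within max_dist.

    Returns None when no stored pattern is close enough, meaning no trade.
    """
    best_idx, best_dist = None, max_dist
    for idx, pattern in enumerate(profitable_patterns):
        d = dist_fn(new_pattern, pattern)
        if d <= best_dist:
            best_idx, best_dist = idx, d
    return best_idx

def abs_diff(x, y):
    # Stand-in distance (sum of absolute differences), NOT dynamic time warping
    return float(np.abs(np.asarray(x) - np.asarray(y)).sum())

# Made-up patterns that were each followed by a gain, for illustration only
profitable = [
    np.array([1.0, -0.5, 0.2, 0.3]),
    np.array([-2.0, 1.5, 0.0, 0.7]),
]

similar = np.array([1.0, -0.4, 0.2, 0.3])
unrelated = np.array([5.0, 5.0, 5.0, 5.0])

print(find_match(similar, profitable, abs_diff))
print(find_match(unrelated, profitable, abs_diff))
```

A returned index would be read as a buy signal for the next day, while `None` means the new window resembles none of the previously profitable curves.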