Dynamic time warping

Next, I want to introduce another model, one that uses a completely different algorithm: dynamic time warping. It produces a distance metric that measures how similar two time series are; the smaller the distance, the more alike the series:

  1. To get started, we'll need to pip install the fastdtw library:
!pip install fastdtw 
  1. Once that is installed, we'll import the additional libraries we'll need:
from scipy.spatial.distance import euclidean 
from fastdtw import fastdtw 
  1. Next, we'll create the function that will take in two series and return the distance between them:
def dtw_dist(x, y): 
    distance, path = fastdtw(x, y, dist=euclidean) 
    return distance 
  1. Now, we'll split our 18 years' worth of time series data into distinct five-day periods. For each period, we'll compute the four daily percentage changes and pair them with one additional point, the percentage change on the following day. These will serve as our x and y data, as follows:
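tseries = []
tlen = 5  # length of each period, in trading days
for i in range(tlen, len(sp), tlen):
    # daily percentage changes within the period (the first value is NaN, so drop it)
    pctc = sp['Close'].iloc[i-tlen:i].pct_change().iloc[1:].values * 100
    # percentage change on the day that follows the period
    res = sp['Close'].iloc[i-tlen:i+1].pct_change().iloc[-1] * 100
    tseries.append((pctc, res))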
  1. We can take a look at our first series, tseries[0], to get an idea of what the data looks like:

This generates the following output:

  1. Now that we have each series, we can run them all through our algorithm to get the distance metric for each series against every other series:
dist_pairs = [] 
for i in range(len(tseries)): 
    for j in range(len(tseries)): 
        dist = dtw_dist(tseries[i][0], tseries[j][0]) 
        dist_pairs.append((i,j,dist,tseries[i][1], tseries[j][1])) 
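
To make the steps above more concrete, here is a minimal, dependency-free sketch of the same idea on made-up prices. It is not the code used in this chapter: it swaps in the classic exact dynamic time warping recurrence in place of fastdtw (which computes an approximation), and the names dtw_distance, make_windows, and closes are hypothetical, introduced only for illustration.

```python
def dtw_distance(x, y):
    """Exact DTW distance between two 1-D sequences, using |a - b| as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three allowed moves
            cost[i][j] = step + min(cost[i - 1][j],      # repeat y[j-1]
                                    cost[i][j - 1],      # repeat x[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def make_windows(closes, tlen=5):
    """Pair each tlen-day window's daily % changes with the next day's % change."""
    windows = []
    for i in range(tlen, len(closes), tlen):
        block = closes[i - tlen:i]
        pctc = [(b / a - 1) * 100 for a, b in zip(block, block[1:])]
        res = (closes[i] / closes[i - 1] - 1) * 100
        windows.append((pctc, res))
    return windows


# Made-up closing prices, purely for illustration
closes = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0,
          105.0, 106.0, 105.0, 107.0, 108.0]
windows = make_windows(closes)
shape_distance = dtw_distance(windows[0][0], windows[1][0])
```

Because warping lets one point align with several, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is 0 even though the two sequences differ in length. That tolerance for stretched or shifted shapes is what makes the metric useful for comparing price patterns.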

Once we have the pairwise distances, we can place them into a DataFrame. We'll drop pairs that have 0 distance, since a series compared with itself has a distance of 0. We'll also sort by series index, which follows date order, and keep only the pairs where the first series comes before the second, chronologically speaking:

dist_frame = pd.DataFrame(dist_pairs, columns=['A','B','Dist', 'A Ret', 'B Ret']) 
sf = dist_frame[dist_frame['Dist']>0].sort_values(['A','B']).reset_index(drop=True)
sfe = sf[sf['A']<sf['B']] 

And finally, we'll limit our trades to those where the distance is 1 or less and the first series has a positive return:

winf = sfe[(sfe['Dist']<=1)&(sfe['A Ret']>0)] 
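
The effect of these filters is easier to follow on a tiny example. The sketch below applies the same three steps to a handful of made-up rows; the distances and returns in toy_pairs are invented for illustration and do not come from the S&P data.

```python
import pandas as pd

# Made-up (A, B, Dist, A Ret, B Ret) rows for three hypothetical series
toy_pairs = [
    (0, 0, 0.0, 0.5, 0.5), (0, 1, 0.8, 0.5, -0.2), (0, 2, 1.7, 0.5, 0.3),
    (1, 0, 0.8, -0.2, 0.5), (1, 1, 0.0, -0.2, -0.2), (1, 2, 0.4, -0.2, 0.3),
    (2, 0, 1.7, 0.3, 0.5), (2, 1, 0.4, 0.3, -0.2), (2, 2, 0.0, 0.3, 0.3),
]
dist_frame = pd.DataFrame(toy_pairs, columns=['A', 'B', 'Dist', 'A Ret', 'B Ret'])

# Drop zero-distance (self) pairs and keep each pair once, earlier series first
sf = dist_frame[dist_frame['Dist'] > 0].sort_values(['A', 'B']).reset_index(drop=True)
sfe = sf[sf['A'] < sf['B']]

# Keep close matches whose first series was followed by a gain
winf = sfe[(sfe['Dist'] <= 1) & (sfe['A Ret'] > 0)]
```

Of the three unique pairs, only (0, 1) survives: pair (0, 2) is too far apart, and pair (1, 2) is close but its first series was followed by a loss.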

This generates the following output:

Let's see what one of our top patterns (A:6 and B:598) looks like when plotted:

plt.plot(np.arange(4), tseries[6][0]); 

The preceding code generates the following output:

Now, we'll plot the second one:

plt.plot(np.arange(4), tseries[598][0]) 

The preceding code generates the following output:

As you can see, the curves are nearly identical, which is exactly what we want. We're going to try to find all curves that have positive next-day gains and then, once we have a curve that is highly similar to one of these profitable curves, we'll buy it in anticipation of another gain.
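
One way to picture that decision rule is the sketch below. It is an assumption about how the matching step might look, not the chapter's implementation: find_match, abs_diff, the 1.0 threshold default, and the sample numbers are all hypothetical. The distance function is passed in as a parameter, and a simple sum of absolute differences stands in for it here so that the example runs without fastdtw.

```python
def find_match(new_window, patterns, dist_fn, max_dist=1.0):
    """Return (index, distance) of the closest pattern within max_dist, or None."""
    best = None
    for idx, pattern in enumerate(patterns):
        d = dist_fn(new_window, pattern)
        if d <= max_dist and (best is None or d < best[1]):
            best = (idx, d)
    return best


def abs_diff(x, y):
    # Stand-in distance for this sketch only; it is not dynamic time warping
    return sum(abs(a - b) for a, b in zip(x, y))


# Made-up four-day percentage-change patterns that were followed by a gain
profitable = [[1.0, -0.5, 0.3, 0.8], [-1.2, 0.4, 0.9, -0.1]]
today = [0.9, -0.4, 0.3, 0.7]

match = find_match(today, profitable, abs_diff)      # closest pattern is index 0
no_match = find_match([5.0, 5.0, 5.0, 5.0], profitable, abs_diff)  # None
```

In the notebook, the dtw_dist function defined earlier could be passed as dist_fn, with a non-None result treated as a signal to buy.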

