Performance of the model

Let's now look at the performance of our model. The trading rule is simple: if the model predicts the next day's close will be higher than the next day's open, we buy at that open and sell at that same day's close. We'll need to add a few extra data points to our DataFrame to calculate our results, as follows:

# Current day close for the last 1,000 trading days
cdc = sp[['Close']].iloc[-1000:]

# Next day open: shift back one row so each row lines up
# with the following day's open
ndo = sp[['Open']].iloc[-1000:].shift(-1)

tf1 = pd.merge(tf, cdc, left_index=True, right_index=True)
tf2 = pd.merge(tf1, ndo, left_index=True, right_index=True)
tf2.columns = ['Next Day Close', 'Predicted Next Close', 'Current Day Close', 'Next Day Open']

tf2

This generates the following output:

Next, we'll add the following code to generate our trading signal and the profit and loss for each trade:

def get_signal(r):
    # Go long (1) when the predicted close is above the next day's open
    if r['Predicted Next Close'] > r['Next Day Open']:
        return 1
    else:
        return 0

def get_ret(r):
    # Open-to-close percentage return on days we traded; 0 otherwise
    if r['Signal'] == 1:
        return ((r['Next Day Close'] - r['Next Day Open']) / r['Next Day Open']) * 100
    else:
        return 0

tf2 = tf2.assign(Signal=tf2.apply(get_signal, axis=1))
tf2 = tf2.assign(PnL=tf2.apply(get_ret, axis=1))

tf2

This generates the following output:
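As an aside, the row-wise apply() calls above are easy to read but slow on large DataFrames. A vectorized sketch of the same logic using NumPy (assuming the tf2 columns shown above) would be:

import numpy as np

# Vectorized equivalent of the two apply() calls above
tf2['Signal'] = np.where(tf2['Predicted Next Close'] > tf2['Next Day Open'], 1, 0)
tf2['PnL'] = np.where(tf2['Signal'] == 1,
                      (tf2['Next Day Close'] - tf2['Next Day Open'])
                      / tf2['Next Day Open'] * 100,
                      0)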

Let's now see whether, using just the price history, we were able to successfully predict the next day's price. We'll start by calculating the points gained:

(tf2[tf2['Signal']==1]['Next Day Close'] - tf2[tf2['Signal']==1]['Next Day Open']).sum() 

This generates the following output:

Ouch! This looks bad. But what about the period we tested? We haven't evaluated it in isolation. How many points would a basic intraday strategy of buying every open and selling every close have generated over the last 2,000 days?

(sp['Close'].iloc[-2000:] - sp['Open'].iloc[-2000:]).sum() 

This generates the following output:

So it looks as if our strategy is abysmal. Let's compare the two.
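Before we do, a reminder: get_stats() is the summary helper we defined earlier in the chapter. If you're following along without it, a minimal stand-in might look like the following (a sketch only; the exact statistics the original reports may differ, and n=252 trading days per year is an assumption):

import numpy as np

def get_stats(s, n=252):
    # Summarize a series of daily percentage returns
    s = s.dropna()
    wins = len(s[s > 0])
    losses = len(s[s < 0])
    print('Trades:', len(s))
    print('Wins:', wins, 'Losses:', losses)
    print('Mean return:', round(s.mean(), 3))
    print('Max win:', round(s.max(), 3), 'Max loss:', round(s.min(), 3))
    print('Sharpe ratio:', round(s.mean() / s.std() * np.sqrt(n), 4))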

First, the basic intraday strategy for the period:

get_stats((sp['Close'].iloc[-2000:] - sp['Open'].iloc[-2000:])/sp['Open'].iloc[-2000:] * 100) 

This generates the following output:

And now the results for our model:

get_stats(tf2['PnL']) 

This generates the following output:

It's clear that this is not a strategy we would want to implement. How might we improve it? What if we modified our trading rule? Instead of taking any trade where the predicted close is above the open, what if we only took trades where the predicted close exceeded the open by a point or more? Would that help? Let's try it. We'll re-run our strategy with a modified signal, as demonstrated in the following code block:

def get_signal(r):
    # Only trade when the predicted close beats the open by a full point
    if r['Predicted Next Close'] > r['Next Day Open'] + 1:
        return 1
    else:
        return 0

def get_ret(r):
    # Open-to-close percentage return on days we traded; 0 otherwise
    if r['Signal'] == 1:
        return ((r['Next Day Close'] - r['Next Day Open']) / r['Next Day Open']) * 100
    else:
        return 0

tf2 = tf2.assign(Signal=tf2.apply(get_signal, axis=1))
tf2 = tf2.assign(PnL=tf2.apply(get_ret, axis=1))

(tf2[tf2['Signal'] == 1]['Next Day Close'] - tf2[tf2['Signal'] == 1]['Next Day Open']).sum()

This generates the following output:

And now the stats:

get_stats(tf2['PnL']) 

This generates the following output:

We have gone from bad to worse. It appears that, if past price history suggests good things to come, you can expect precisely the opposite. We seem to have developed a contrarian indicator with our model. Let's explore that. We'll see what our gains would look like if we flipped the model: when it predicts strong gains, we don't trade, but otherwise we do:

def get_signal(r):
    # Inverted rule: stand aside when strong gains are predicted,
    # trade otherwise
    if r['Predicted Next Close'] > r['Next Day Open'] + 1:
        return 0
    else:
        return 1

def get_ret(r):
    # Open-to-close percentage return on days we traded; 0 otherwise
    if r['Signal'] == 1:
        return ((r['Next Day Close'] - r['Next Day Open']) / r['Next Day Open']) * 100
    else:
        return 0

tf2 = tf2.assign(Signal=tf2.apply(get_signal, axis=1))
tf2 = tf2.assign(PnL=tf2.apply(get_ret, axis=1))

(tf2[tf2['Signal'] == 1]['Next Day Close'] - tf2[tf2['Signal'] == 1]['Next Day Open']).sum()

This generates the following output:

Let's get our stats:

get_stats(tf2['PnL']) 

This generates the following output:

It looks like we do have a contrarian indicator here. When our model predicts strong next-day gains, the market significantly underperforms, at least during our test period. Would this hold in most scenarios? Unlikely. Markets tend to flip between regimes of mean reversion and regimes of trend persistence.

At this point, there are a number of extensions we could make to this model. We haven't even touched on using technical indicators or fundamental data in our model, and we have limited our trades to one day. All of this could be tweaked and extended upon, but there is one important point we have not addressed that must be mentioned.
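For example, a simple moving-average feature could be added with a couple of lines of pandas (a sketch; the column names SMA20 and Close_vs_SMA20 are ones I'm introducing here, and the code assumes the sp DataFrame used throughout):

# 20-day simple moving average of the close as a candidate feature
sp['SMA20'] = sp['Close'].rolling(window=20).mean()

# Distance of the close from its moving average, in percent
sp['Close_vs_SMA20'] = (sp['Close'] - sp['SMA20']) / sp['SMA20'] * 100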

The data we are working with is of a special type called time series data. Time series data requires special treatment to model properly, as it typically violates the assumptions required by standard statistical models, such as a constant mean and variance.
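We can see this for ourselves by checking whether the mean and variance of the price series stay constant over time (a quick sketch, assuming the sp DataFrame from earlier):

# Rolling one-year mean and standard deviation of the close;
# for a trending index, both drift over time rather than staying constant
print(sp['Close'].rolling(window=252).mean().dropna().describe())
print(sp['Close'].rolling(window=252).std().dropna().describe())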

One consequence of improperly handling time series data is that error metrics give wildly inaccurate measures. Because of significant autocorrelation (in other words, the data in the next period is highly correlated with the data in the current period), it can appear that we have achieved much better predictions than we actually have.
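We can verify this with pandas' built-in autocorrelation helper (again assuming the sp DataFrame):

# First-order autocorrelation of the raw index level vs. its daily change;
# the level is typically very close to 1, the differenced series close to 0
print(sp['Close'].autocorr(lag=1))
print(sp['Close'].diff().autocorr(lag=1))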

To address these issues, time series data is often differenced (in the case of stock data, this means looking at the daily change rather than the absolute level of the index) to render it stationary; that is, to give it a constant mean and variance and remove significant autocorrelation.
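An augmented Dickey-Fuller test is the usual way to check for stationarity. A minimal sketch using statsmodels' adfuller function (the rest assumes the sp DataFrame from earlier):

from statsmodels.tsa.stattools import adfuller

# Test the differenced close for stationarity; a small p-value
# (for example, below 0.05) suggests the series is stationary
daily_change = sp['Close'].diff().dropna()
test_stat, p_value = adfuller(daily_change)[:2]
print('ADF statistic:', test_stat, 'p-value:', p_value)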

If you intend to pursue working with time series data, I implore you to research these concepts in more detail.
