Now that we have a baseline for comparison, let's build our first regression model. We'll start with a very basic model that uses only the stock's prior closing values to predict the next day's close, and we'll build it using support vector regression. With that, let's set up our model:
- The first step is to set up a DataFrame that contains a price history for each day. We're going to include the past 20 closes in our model:
    for i in range(1, 21, 1):
        sp.loc[:,'Close Minus ' + str(i)] = sp['Close'].shift(i)

    sp20 = sp[[x for x in sp.columns if 'Close Minus' in x or x == 'Close']].iloc[20:,]
    sp20
- This code gives us each day's closing price, along with the previous 20 closes, all in the same row. The result of our code is seen in the following output:
- This will form the basis of the X array we will feed our model. But before we're ready for that, there are a few additional steps.
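To see what the shifting step does without loading the full price history, here is a minimal sketch on a toy DataFrame. The prices, the dates, and the names `toy`, `n_lags`, and `lagged` are made up for illustration and are not part of the chapter's data; only three lags are used so the result stays readable.

```python
import pandas as pd

# Toy closing prices; the values and dates are invented for illustration.
toy = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)

n_lags = 3  # the chapter uses 20; 3 keeps the output small

# Each shifted column holds the close from i rows earlier.
for i in range(1, n_lags + 1):
    toy["Close Minus " + str(i)] = toy["Close"].shift(i)

# The first n_lags rows have incomplete history (NaN lags), so drop them.
lagged = toy.iloc[n_lags:]

print(lagged)
```

Each remaining row now carries its own close plus the three closes before it, which is the same layout the 20-lag version produces on the real data.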
- First, we'll reverse our columns so that time runs from left to right:
    sp20 = sp20.iloc[:,::-1]
    sp20
This generates the following output:
- Now, let's import our support vector machine and set up our training and test matrices and vectors:
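    from sklearn.svm import SVR

    clf = SVR(kernel='linear')

    # Train on everything except the last 2,000 rows; the target is the next day's close
    X_train = sp20[:-2000]
    y_train = sp20['Close'].shift(-1)[:-2000]

    # Hold out the last 2,000 rows; the final y_test value is NaN
    # because there is no following close to shift in
    X_test = sp20[-2000:]
    y_test = sp20['Close'].shift(-1)[-2000:]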
- We had just 5,000 data points to work with, so I chose to use the last 2,000 for testing. Let's now fit our model and use it to check out-of-sample data:
    model = clf.fit(X_train, y_train)
    preds = model.predict(X_test)
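The split-and-fit steps can also be tried end to end without the S&P data. The following is a minimal sketch on a synthetic random walk, not the chapter's dataset: the series, the seed, and the names `n_lags`, `n_test`, `frame`, and `target` are assumptions for illustration. It also departs from the code above in one respect, by dropping the final row, whose next-day target does not exist.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVR

# Synthetic random-walk "prices"; invented for illustration only.
rng = np.random.default_rng(0)
close = pd.Series(50 + rng.normal(0, 1, 300).cumsum(), name="Close")

n_lags = 5    # the chapter uses 20
n_test = 100  # the chapter holds out the last 2,000 rows

# Lag columns ordered oldest first, with the current close as the last feature.
frame = pd.DataFrame(
    {"Close Minus " + str(i): close.shift(i) for i in range(n_lags, 0, -1)}
)
frame["Close"] = close
frame = frame.iloc[n_lags:]  # drop rows with incomplete history

# Target: the next day's close. The final row has no next day, so drop it
# from both X and y (an assumption of this sketch, not the chapter's code).
target = frame["Close"].shift(-1)
X, y = frame.iloc[:-1], target.iloc[:-1]

# Chronological split: the most recent n_test rows are held out for testing.
X_train, X_test = X.iloc[:-n_test], X.iloc[-n_test:]
y_train, y_test = y.iloc[:-n_test], y.iloc[-n_test:]

model = SVR(kernel="linear").fit(X_train, y_train)
preds = model.predict(X_test)

print(len(X_train), len(X_test), preds[:3])
```

Splitting by position rather than at random keeps every test row later in time than every training row, which matches how the model would be used on new data.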
- Now that we have our predictions, let's compare them to our actual data:
    tf = pd.DataFrame(list(zip(y_test, preds)),
                      columns=['Next Day Close', 'Predicted Next Close'],
                      index=y_test.index)
    tf
The preceding code generates the following output:
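Beyond eyeballing the table, one way to summarize how close the predictions are is to compute a few error statistics over the two columns. The sketch below is an illustration rather than part of the chapter's method: `summarize_errors` is a hypothetical helper, and `demo` holds made-up numbers standing in for the `tf` DataFrame.

```python
import numpy as np
import pandas as pd

def summarize_errors(tf):
    """Return simple error statistics for a two-column comparison frame."""
    # Rows without an actual value (such as the final day) cannot be scored.
    scored = tf.dropna()
    err = scored["Predicted Next Close"] - scored["Next Day Close"]
    return {
        "n": len(scored),                    # number of scored rows
        "mae": err.abs().mean(),             # mean absolute error, in price units
        "mean_pct_error": (err / scored["Next Day Close"]).mean() * 100,
    }

# Hypothetical values standing in for the chapter's tf DataFrame.
demo = pd.DataFrame({
    "Next Day Close": [100.0, 102.0, 101.0, np.nan],
    "Predicted Next Close": [101.0, 100.0, 101.0, 103.0],
})

stats = summarize_errors(demo)
print(stats)
```

Calling `summarize_errors(tf)` on the real comparison frame would work the same way, provided the column names match those used when `tf` was built.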