First, we will read the close price data. Second, we will split the prices according to the day of the week. Third, for each weekday, we will calculate the average price. Finally, we will find out which day of the week has the highest average and which has the lowest average. A word of warning before we commence: you might be tempted to use the result to buy stock on one day and sell on the other. However, we don't have enough data to make this kind of decisions.
Coders hate dates because they are so complicated! NumPy is very much oriented toward floating point operations. For this reason, we need to take extra effort to process dates. Try it out yourself; put the following code in a script or use the one that comes with this book:
dates, close=np.loadtxt('data.csv', delimiter=',', usecols=(1,6), unpack=True)
Execute the script and the following error will appear:
ValueError: invalid literal for float(): 28-01-2011
Now, perform the following steps to deal with dates:
loadtxt()
function has a special parameter for this purpose. The parameter is called converters
and is a dictionary that links columns with the so-called converter functions. It is our responsibility to write the converter function. Write the function down:# Monday 0 # Tuesday 1 # Wednesday 2 # Thursday 3 # Friday 4 # Saturday 5 # Sunday 6 def datestr2num(s): return datetime.datetime.strptime(s, "%d-%m-%Y").date().weekday()
We give the datestr2num()
function dates as a string, such as 28-01-2011. The string is first turned into a datetime
object, using a specified format %d-%m-%Y
. By the way, this is standard Python and is not related to NumPy itself (see https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior). Second, the
datetime
object is turned into a day. Finally, the weekday method is called on the date to return a number. As you can read in the comments, the number is between 0 and 6. 0 is, for instance, Monday, and 6 is Sunday. The actual number, of course, is not important for our algorithm; it is only used as identification.
dates, close=np.loadtxt('data.csv', delimiter=',', usecols=(1,6), converters={1: datestr2num}, unpack=True) print "Dates =", dates
This prints the following output:
Dates = [ 4. 0. 1. 2. 3. 4. 0. 1. 2. 3. 4. 0. 1. 2. 3. 4. 1. 2. 4. 0. 1. 2. 3. 4. 0. 1. 2. 3. 4.]
No Saturdays and Sundays, as you can see. Exchanges are closed over the weekend.
0
:averages = np.zeros(5)
This array will hold the averages for each weekday.
where()
function that returns indices of the array for elements that conform to a specified condition. The take()
function can use these indices and takes the values of the corresponding array items. We will use the take()
function to get the close prices for each weekday. In the following loop, we go through the date values 0 to 4, better known as Monday to Friday. We get the indices with the where()
function for each day and store it in the indices
array. Then, we retrieve the values corresponding to the indices, using the take()
function. Finally, compute an average for each weekday and store it in the averages
array, like this:for i in range(5): indices = np.where(dates == i) prices = np.take(close, indices) avg = np.mean(prices) print("Day", i, "prices", prices, "Average", avg) averages[i] = avg
The loop prints the following output:
Day 0 prices [[ 339.32 351.88 359.18 353.21 355.36]] Average 351.79 Day 1 prices [[ 345.03 355.2 359.9 338.61 349.31 355.76]] Average 350.635 Day 2 prices [[ 344.32 358.16 363.13 342.62 352.12 352.47]] Average 352.136666667 Day 3 prices [[ 343.44 354.54 358.3 342.88 359.56 346.67]] Average 350.898333333 Day 4 prices [[ 336.1 346.5 356.85 350.56 348.16 360. 351.99]] Average 350.022857143
max()
and min()
functions, as shown here:top = np.max(averages) print("Highest average", top) print("Top day of the week", np.argmax(averages)) bottom = np.min(averages) print("Lowest average", bottom) print("Bottom day of the week", np.argmin(averages))
The output is as follows:
Highest average 352.136666667 Top day of the week 2 Lowest average 350.022857143 Bottom day of the week 4
The argmin()
function returned the index of the lowest value in the averages
array. The index returned was 4
, which corresponds to Friday. The argmax()
function returned the index of the highest value in the averages
array. The index returned was 2
, which corresponds to Wednesday (see weekdays.py
):
from __future__ import print_function import numpy as np from datetime import datetime # Monday 0 # Tuesday 1 # Wednesday 2 # Thursday 3 # Friday 4 # Saturday 5 # Sunday 6 def datestr2num(s): return datetime.strptime(s, "%d-%m-%Y").date().weekday() dates, close=np.loadtxt('data.csv', delimiter=',', usecols=(1,6), converters={1: datestr2num}, unpack=True) print("Dates =", dates) averages = np.zeros(5) for i in range(5): indices = np.where(dates == i) prices = np.take(close, indices) avg = np.mean(prices) print("Day", i, "prices", prices, "Average", avg) averages[i] = avg top = np.max(averages) print("Highest average", top) print("Top day of the week", np.argmax(averages)) bottom = np.min(averages) print("Lowest average", bottom) print("Bottom day of the week", np.argmin(averages))
Hey, that was fun! For the sample data, it appears that Friday is the cheapest day and Wednesday is the day when your Apple stock will be worth the most. Ignoring the fact that we have very little data, is there a better method to compute the averages? Shouldn't we involve volume data as well? Maybe it makes more sense to you to do a time-weighted average. Give it a go! Calculate the VWAP and TWAP. You can find some hints on how to go about doing this at the beginning of this chapter.