The data we will summarize covers whole business weeks, from Monday to Friday. During the period covered by the data there was one holiday, Presidents' Day, on February 21st. It fell on a Monday, and the US stock exchanges were closed that day, so the sample has no entry for it. The first day in the sample is a Friday, which is inconvenient. Use the following instructions to summarize the data:

To keep things simple, look only at the first 16 days in the sample, which cover three full weeks. We will build on the code from the Time for action – dealing with dates tutorial.

    close = close[:16]
    dates = dates[:16]
Start by finding the first Monday in the sample. Mondays have the code 0, so use dates == 0 as the condition of the where function. Called with only a condition, where returns a tuple of index arrays. Flatten that with the ravel function, then extract the first element, which has index 0.

    # get first Monday
    first_monday = np.ravel(np.where(dates == 0))[0]
    print("The first Monday index is", first_monday)
This will print the following output:
The first Monday index is 1
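
As a minimal sketch of this lookup, the snippet below uses a made-up array of weekday codes (an assumption standing in for the dates column, not the real data). It shows what where returns when given only a condition, and that np.flatnonzero yields the same indices in one call.

```python
import numpy as np

# Hypothetical weekday codes (Monday=0 ... Friday=4); the sample starts on a Friday.
dates = np.array([4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

# With only a condition, np.where returns a tuple holding one index array per axis.
monday_positions = np.where(dates == 0)
print(monday_positions)

# ravel flattens the result to 1-D, so [0] selects the first Monday.
first_monday = np.ravel(monday_positions)[0]
print("The first Monday index is", first_monday)

# np.flatnonzero gives the same indices in a single call.
same_positions = np.flatnonzero(dates == 0)
print(same_positions)
```

Whether to keep the where and ravel pair or switch to flatnonzero is a matter of taste; both produce a flat array of matching positions.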
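Next, find the last Friday in the sample. The logic is the same as for the first Monday, except that the code for Friday is 4. Additionally, we are looking for the last element, which has index -1.

    # get last Friday
    last_friday = np.ravel(np.where(dates == 4))[-1]
    print("The last Friday index is", last_friday)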
This will give us the following output:
The last Friday index is 15
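
To see where the index 15 comes from, here is a small sketch with a hypothetical 16-day sample that starts on a Friday, as ours does. The weekday codes are made up for illustration; only the layout (one leading Friday followed by three full weeks) is taken from the text.

```python
import numpy as np

# Hypothetical 16-day sample of weekday codes starting on a Friday
# (Monday=0, Friday=4): one leading Friday, then three full weeks.
dates = np.array([4] + [0, 1, 2, 3, 4] * 3)

# Positions of every Friday in the sample.
friday_positions = np.ravel(np.where(dates == 4))
print(friday_positions)

# Negative indices count from the end of the array.
last_friday = friday_positions[-1]       # final Friday in the sample
previous_friday = friday_positions[-2]   # the Friday one week earlier
print("The last Friday index is", last_friday)
print("The Friday before that is at index", previous_friday)
```

With the sample cut to 16 days, the final Friday is also the end of the third week, so the last element is the one that matches the output above.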
Next, create an array with the indices of all the days in the three weeks:

    weeks_indices = np.arange(first_monday, last_friday + 1)
    print("Weeks indices initial", weeks_indices)
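Split the array into three pieces of size 5 with the split function. Note that an integer second argument gives the number of pieces, not their size.

    weeks_indices = np.split(weeks_indices, 3)
    print("Weeks indices after split", weeks_indices)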
It splits the array, as follows:
Weeks indices after split [array([1, 2, 3, 4, 5]), array([ 6, 7, 8, 9, 10]), array([11, 12, 13, 14, 15])]
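
The following sketch illustrates how the second argument of split is interpreted. It assumes the same 15 day indices as above; the uneven 14-element array at the end is a made-up case added only to contrast split with np.array_split.

```python
import numpy as np

weeks_indices = np.arange(1, 16)   # 15 trading days, indices 1..15

# An integer second argument is the number of equal sections, not their size.
three_weeks = np.split(weeks_indices, 3)
print([week.tolist() for week in three_weeks])

# A list of positions splits the array before each of those positions instead.
same_weeks = np.split(weeks_indices, [5, 10])
print([week.tolist() for week in same_weeks])

# np.split raises ValueError when the array cannot be divided evenly;
# np.array_split tolerates uneven pieces.
uneven = np.array_split(np.arange(1, 15), 3)
print([len(piece) for piece in uneven])
```

If a week in the sample were shortened by a holiday, an even split would no longer line up with the calendar, so splitting at explicit positions may be the safer choice in that situation.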
Call the apply_along_axis function to process the weeks. This function calls another function, which we will provide, on each one-dimensional slice of an array along the given axis. Currently, we have an array with three elements. Each array item corresponds to one week in our sample and contains the indices of the corresponding days. Call apply_along_axis by supplying the name of our function, summarize, which we will define shortly. Further specify the axis or dimension number (here 1), the array to operate on, and a variable number of arguments for the summarize function, if any.

    weeksummary = np.apply_along_axis(summarize, 1, weeks_indices, open, high, low, close)
    print("Week summary", weeksummary)
Now define the summarize function. It returns, for each week, a tuple that holds the open, high, low, and close prices for the week, similar to end-of-day data.

    def summarize(a, o, h, l, c):
        monday_open = o[a[0]]
        week_high = np.max(np.take(h, a))
        week_low = np.min(np.take(l, a))
        friday_close = c[a[-1]]
        return ("APPL", monday_open, week_high, week_low, friday_close)
Notice that we used the take function to get the actual values from indices. Calculating the high and low values of the week was easily done with the max and min functions. The open for the week is the open for the first day in the week, Monday. Likewise, the close is the close for the last day of the week, Friday. The week summary prints as follows:

    Week summary [['APPL' '335.8' '346.7' '334.3' '346.5']
     ['APPL' '347.89' '360.0' '347.64' '356.85']
     ['APPL' '356.79' '364.9' '349.52' '350.56']]
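
To make the mechanics easier to follow, here is a sketch of the same pattern on made-up prices (two weeks of invented values, not the contents of data.csv). The helper summarize_numeric is a hypothetical variant of summarize that leaves out the ticker string.

```python
import numpy as np

# Made-up prices for 10 trading days (two weeks); not the data.csv values.
opens = np.array([10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0, 12.8, 13.5, 14.0])
highs = opens + 1.0
lows = opens - 1.0
closes = opens + 0.5

def summarize_numeric(a, o, h, l, c):
    """Return (open, high, low, close) for the week whose day indices are in a."""
    week_open = o[a[0]]                 # open of the first day of the week
    week_high = np.max(np.take(h, a))   # take picks the values at the given indices
    week_low = np.min(np.take(l, a))
    week_close = c[a[-1]]               # close of the last day of the week
    return week_open, week_high, week_low, week_close

# Each row holds the day indices of one week.
weeks = np.arange(10).reshape(2, 5)

# Axis 1 means summarize_numeric receives one row (one week) at a time;
# the trailing arrays are forwarded to it unchanged.
summary = np.apply_along_axis(summarize_numeric, 1, weeks, opens, highs, lows, closes)
print(summary)
print(summary.dtype)
```

Leaving the ticker out keeps the result numeric. When a string is mixed into the returned tuple, as summarize does, NumPy stores every value as a string, which is why the prices in the week summary appear in quotes.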
Store the data in a file with the savetxt function.

    np.savetxt("weeksummary.csv", weeksummary, delimiter=",", fmt="%s")

As you can see, we specify a filename, the array we want to store, a delimiter (in this case a comma), and the format in which each value is written.
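
The round trip can be sketched as follows. The rows and the file name weeksummary_demo.csv are illustrative assumptions, and the file is written to a temporary directory so that nothing is left behind.

```python
import os
import tempfile

import numpy as np

# Hypothetical summary rows; a string in any cell makes this a string array.
rows = np.array([["APPL", "335.8", "346.7"],
                 ["APPL", "347.89", "360.0"]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weeksummary_demo.csv")  # illustrative file name
    # fmt="%s" writes each cell as it is; a numeric format such as "%.2f"
    # would raise an error here because the cells are strings, not floats.
    np.savetxt(path, rows, delimiter=",", fmt="%s")
    with open(path) as handle:
        text = handle.read()

print(text)
```

Reading the file back as plain text is enough to confirm that each row became one comma-separated line.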
The format string starts with a percent sign. Second is an optional flag: - means left justify, 0 means left pad with zeroes, and + means precede the number with + or -. Third is an optional width, which indicates the minimum number of characters. Fourth, a dot is followed by a number that sets the precision. Finally, there comes a character specifier; in our example, the character specifier is s, for a string.
Character code | Description |
---|---|
c | character |
d or i | signed decimal integer |
e or E | scientific notation with e or E |
f | decimal floating point |
g or G | use the shorter of e, E, or f |
o | signed octal |
s | string of characters |
u | unsigned decimal integer |
x or X | unsigned hexadecimal integer |
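
Python's % operator follows the same printf-style conventions, so it offers a quick way to try out flags, widths, precisions, and character codes before handing a format to savetxt. The values below are arbitrary examples chosen for illustration.

```python
# Each tuple pairs a format string with an arbitrary value to format.
examples = [
    ("%c", 65),            # character from its code point
    ("%d", -42),           # signed decimal integer
    ("%e", 1234.5),        # scientific notation
    ("%.2f", 3.14159),     # decimal floating point, precision of 2
    ("%g", 0.0001),        # shorter of the e and f representations
    ("%o", 8),             # octal
    ("%s", "APPL"),        # string of characters
    ("%x", 255),           # hexadecimal
    ("%-8.2f|", 3.14159),  # '-' flag: left justify within a width of 8
    ("%08.2f", 3.14159),   # '0' flag: left pad with zeroes to a width of 8
    ("%+d", 42),           # '+' flag: always show the sign
]

results = [fmt % value for fmt, value in examples]
for (fmt, _), text in zip(examples, results):
    print(repr(fmt), "->", repr(text))
```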
View the generated file in your favorite editor or type the following command in the command line:

    cat weeksummary.csv
    APPL,335.8,346.7,334.3,346.5
    APPL,347.89,360.0,347.64,356.85
    APPL,356.79,364.9,349.52,350.56
We did something that is not even possible in some programming languages: we defined a function and passed it as an argument to the apply_along_axis function. Arguments for the summarize function were neatly passed on by apply_along_axis (see weeksummary.py).

    import numpy as np
    from datetime import datetime

    # Monday 0
    # Tuesday 1
    # Wednesday 2
    # Thursday 3
    # Friday 4
    # Saturday 5
    # Sunday 6
    def datestr2num(s):
        # Some NumPy versions pass bytes rather than str to converters.
        if isinstance(s, bytes):
            s = s.decode("ascii")
        return datetime.strptime(s, "%d-%m-%Y").date().weekday()

    dates, open, high, low, close = np.loadtxt('data.csv', delimiter=',',
        usecols=(1, 3, 4, 5, 6), converters={1: datestr2num}, unpack=True)
    close = close[:16]
    dates = dates[:16]

    # get first Monday
    first_monday = np.ravel(np.where(dates == 0))[0]
    print("The first Monday index is", first_monday)

    # get last Friday
    last_friday = np.ravel(np.where(dates == 4))[-1]
    print("The last Friday index is", last_friday)

    weeks_indices = np.arange(first_monday, last_friday + 1)
    print("Weeks indices initial", weeks_indices)

    # split into three weeks of five days each
    weeks_indices = np.split(weeks_indices, 3)
    print("Weeks indices after split", weeks_indices)

    def summarize(a, o, h, l, c):
        monday_open = o[a[0]]
        week_high = np.max(np.take(h, a))
        week_low = np.min(np.take(l, a))
        friday_close = c[a[-1]]
        return ("APPL", monday_open, week_high, week_low, friday_close)

    weeksummary = np.apply_along_axis(summarize, 1, weeks_indices, open, high, low, close)
    print("Week summary", weeksummary)

    np.savetxt("weeksummary.csv", weeksummary, delimiter=",", fmt="%s")