Masked arrays are useful when we want to ignore certain values, for instance, negative values when taking the logarithm of an array. Another use case for masked arrays is excluding extreme values, which are defined by a lower and an upper bound.
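As a minimal sketch of the idea, the following example masks the negative entries of a small array and shows that later computations skip them. The sample values are hypothetical and chosen only for illustration.

```python
import numpy

# Hypothetical sample values; the negative entries are the ones we want to ignore.
values = numpy.array([4.0, -1.0, 9.0, -3.0, 16.0])

# masked_less masks every element that is strictly below the threshold.
masked = numpy.ma.masked_less(values, 0)

print(masked.mask)     # True marks an element that is ignored
print(masked.count())  # number of elements that are not masked
print(masked.mean())   # average over the unmasked elements only
```

The underlying data is left untouched; only the mask decides which elements take part in a computation.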
In this tutorial, we will apply these techniques to stock price data. We will skip the steps for downloading data, as they were covered in previous chapters.
We will take the logarithm of an array that contains negative numbers.
First, let's create an array containing the indices that are multiples of three:
triples = numpy.arange(0, len(close), 3)
print("Triples", triples[:10], "...")
Next, we will create an array of ones that has the same size as the price data array:
signs = numpy.ones(len(close))
print("Signs", signs[:10], "...")
We will set every third element to -1, with the help of the indexing tricks we learned about in Chapter 2, Advanced Indexing and Array Concepts:
signs[triples] = -1
print("Signs", signs[:10], "...")
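Finally, we will take the logarithm of the price array multiplied by these signs. The numpy.ma.log function masks the negative values instead of computing a logarithm for them:
ma_log = numpy.ma.log(close * signs)
print("Masked logs", ma_log[:10], "...")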
This should print the following output for AAPL:
Triples [ 0 3 6 9 12 15 18 21 24 27] ...
Signs [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] ...
Signs [-1. 1. 1. -1. 1. 1. -1. 1. 1. -1.] ...
Masked logs [-- 5.93655586575 5.95094223368 -- 5.97468290742 5.97510711452 -- 6.01674381162 5.97889061623 --] ...
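Because the download step is skipped here, the steps above can also be tried on a small stand-in array. The following is a sketch under that assumption: the prices are hypothetical, not AAPL data, and the name restored is introduced only for this example.

```python
import numpy

# Hypothetical closing prices standing in for the downloaded data.
close = numpy.array([380.0, 385.0, 390.0, 395.0, 400.0, 405.0, 410.0])

# Flip the sign of every third element, starting at index 0.
signs = numpy.ones(len(close))
signs[numpy.arange(0, len(close), 3)] = -1

# numpy.ma.log masks the entries that lie outside the domain of the logarithm.
ma_log = numpy.ma.log(close * signs)
print(ma_log.mask)

# Exponentiating keeps the mask and recovers the unmasked prices.
restored = numpy.exp(ma_log)
print(restored)
```

The mask is True at indices 0, 3, and 6, which are exactly the positions where the sign was flipped.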
Let's define extreme values as being more than one standard deviation below the mean, or more than one standard deviation above the mean. This definition leads us to write the following code, which will mask extreme values:
dev = close.std()
avg = close.mean()
inside = numpy.ma.masked_outside(close, avg - dev, avg + dev)
print("Inside", inside[:10], "...")
For the first ten elements, this code prints:
Inside [-- -- -- -- -- -- 409.429675172 410.240597855 -- --] ...
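To see the effect of the bounds on numbers that are easy to check by hand, here is a small sketch with hypothetical prices. Their mean is 50 and their standard deviation is roughly 23.1, so only the values 10 and 90 fall outside the interval.

```python
import numpy

# Hypothetical prices with one low and one high outlier.
close = numpy.array([10.0, 50.0, 52.0, 48.0, 50.0, 90.0])

avg = close.mean()
dev = close.std()

# Mask everything below avg - dev or above avg + dev.
inside = numpy.ma.masked_outside(close, avg - dev, avg + dev)

print(inside.mask)          # True for the two outliers
print(inside.compressed())  # only the values within one standard deviation
```

The compressed method returns the unmasked values as an ordinary array, which is convenient when the remaining data needs to be passed to code that does not understand masks.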
Let's plot the original price data, the data after taking the masked logarithm and exponentiating it back again, and finally the data after applying the standard deviation based mask. Matplotlib skips masked values, so they appear as gaps in the plotted lines. The result will be as shown in the following screenshot:
The complete program for this tutorial is as follows:
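from __future__ import print_function  # lets the print() calls also run on Python 2
import numpy
# Note: matplotlib.finance has been removed from recent matplotlib releases,
# so this import requires an older matplotlib version.
from matplotlib.finance import quotes_historical_yahoo
from datetime import date
import sys
import matplotlib.pyplot


def get_close(ticker):
    # Download one year of daily quotes for the given ticker symbol
    today = date.today()
    start = (today.year - 1, today.month, today.day)
    quotes = quotes_historical_yahoo(ticker, start, today)
    return numpy.array([q[4] for q in quotes])


# The ticker symbol is the first command-line argument
close = get_close(sys.argv[1])

# Indices that are multiples of three
triples = numpy.arange(0, len(close), 3)
print("Triples", triples[:10], "...")

# Array of ones with every third element set to -1
signs = numpy.ones(len(close))
print("Signs", signs[:10], "...")
signs[triples] = -1
print("Signs", signs[:10], "...")

# Masked logarithm: the negative values are masked
ma_log = numpy.ma.log(close * signs)
print("Masked logs", ma_log[:10], "...")

# Mask values more than one standard deviation away from the mean
dev = close.std()
avg = close.mean()
inside = numpy.ma.masked_outside(close, avg - dev, avg + dev)
print("Inside", inside[:10], "...")

# Plot the three series in stacked subplots
matplotlib.pyplot.subplot(311)
matplotlib.pyplot.title("Original")
matplotlib.pyplot.plot(close)

matplotlib.pyplot.subplot(312)
matplotlib.pyplot.title("Log Masked")
matplotlib.pyplot.plot(numpy.exp(ma_log))

matplotlib.pyplot.subplot(313)
matplotlib.pyplot.title("Not Extreme")
matplotlib.pyplot.plot(inside)

matplotlib.pyplot.show()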