Ignoring negative and extreme values

Masked arrays are useful when we want to ignore negative values, for instance, when taking the logarithm of array values. Another use case for masked arrays is excluding extreme values. Extreme values are identified by a lower and an upper bound: anything outside that range is masked.
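As a minimal sketch of both ideas, the following uses a small hand-made array rather than stock prices. The values and the bounds 0.0 and 10.0 are illustrative assumptions, not part of the recipe:

```python
import numpy

# Illustrative values only; not taken from real price data.
values = numpy.array([1.0, -2.0, 4.0, -8.0, 16.0])

# numpy.ma.log masks entries outside the domain of the logarithm
# instead of returning nan for them.
logs = numpy.ma.log(values)
print(logs.mask.tolist())

# numpy.ma.masked_outside masks everything outside [lower, upper].
bounded = numpy.ma.masked_outside(values, 0.0, 10.0)
print(bounded.mask.tolist())
```

In both results, a True entry in the mask marks an element that later computations will ignore.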

In this tutorial, we will apply these techniques to stock price data. We will skip the steps for downloading data, as they are covered in previous chapters.

How to do it...

We will take the logarithm of an array that contains negative numbers.

  1. Take the logarithm of negative numbers.

    First, let's create an array of indices that are multiples of three:

    triples = numpy.arange(0, len(close), 3)
    print "Triples", triples[:10], "..."

    Next, we will create an array of ones with the same size as the price data array:

    signs = numpy.ones(len(close))
    print "Signs", signs[:10], "..."

    We will set every third element to -1, with the help of the indexing tricks we learned about in Chapter 2, Advanced Indexing and Array Concepts:

    signs[triples] = -1
    print "Signs", signs[:10], "..."

    Finally, we will multiply the prices by these signs and take the masked logarithm of the result:

    ma_log = numpy.ma.log(close * signs)
    print "Masked logs", ma_log[:10], "..."

    This should print the following output for AAPL:

    Triples [ 0  3  6  9 12 15 18 21 24 27] ...
    Signs [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.] ...
    Signs [-1.  1.  1. -1.  1.  1. -1.  1.  1. -1.] ...
    Masked logs [-- 5.93655586575 5.95094223368 -- 5.97468290742 5.97510711452 --
     6.01674381162 5.97889061623 --] ...
    
  2. Ignore extreme values.

    Let's define extreme values as those lying more than one standard deviation below the mean, or more than one standard deviation above it. This definition leads us to write the following code, which will mask extreme values:

    dev = close.std()
    avg = close.mean()
    inside = numpy.ma.masked_outside(close, avg - dev, avg + dev)
    print "Inside", inside[:10], "..."

    For the first ten elements, this code prints:

    Inside [-- -- -- -- -- -- 409.429675172 410.240597855 -- --] ...

    Let's plot the original price data, the data after taking the logarithm and exponentiating back again, and finally the data after applying the standard-deviation-based mask. The result is three stacked plots:

    [Screenshot: three stacked plots titled Original, Log Masked, and Not Extreme]
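The two steps can also be tried without downloading anything. The following is a sketch under stated assumptions: the sine-shaped close array is a synthetic stand-in for the real closing prices, and the counting and round-trip checks are additions for illustration that are not part of the recipe's program:

```python
import numpy

# Synthetic stand-in for the downloaded closing prices (an assumption
# made so the sketch runs without network access). All values are positive.
close = 400.0 + 50.0 * numpy.sin(numpy.linspace(0.0, 6.0, 250))

# Flip the sign of every third price (equivalent to indexing with
# numpy.arange(0, len(close), 3)), then take the masked logarithm.
signs = numpy.ones(len(close))
signs[::3] = -1
ma_log = numpy.ma.log(close * signs)

# Exponentiating recovers the unmasked prices; masked entries stay masked.
restored = numpy.exp(ma_log)
print(numpy.ma.count_masked(restored))
print(numpy.allclose(restored.compressed(), close[signs > 0]))

# Mask prices further than one standard deviation from the mean.
dev = close.std()
avg = close.mean()
inside = numpy.ma.masked_outside(close, avg - dev, avg + dev)
print(inside.count())
```

Here count_masked() reports how many elements were hidden by the logarithm, compressed() returns only the unmasked values, and count() reports how many prices survive the standard deviation mask.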

The complete program for this tutorial is as follows:

import numpy
from matplotlib.finance import quotes_historical_yahoo
from datetime import date
import sys
import matplotlib.pyplot

def get_close(ticker):
    today = date.today()
    start = (today.year - 1, today.month, today.day)

    quotes = quotes_historical_yahoo(ticker, start, today)

    return numpy.array([q[4] for q in quotes])

close = get_close(sys.argv[1])

triples = numpy.arange(0, len(close), 3)
print "Triples", triples[:10], "..."

signs = numpy.ones(len(close))
print "Signs", signs[:10], "..."

signs[triples] = -1
print "Signs", signs[:10], "..."

ma_log = numpy.ma.log(close * signs)
print "Masked logs", ma_log[:10], "..."

dev = close.std()
avg = close.mean()
inside = numpy.ma.masked_outside(close, avg - dev, avg + dev)
print "Inside", inside[:10], "..."

matplotlib.pyplot.subplot(311)
matplotlib.pyplot.title("Original")
matplotlib.pyplot.plot(close)

matplotlib.pyplot.subplot(312)
matplotlib.pyplot.title("Log Masked")
matplotlib.pyplot.plot(numpy.exp(ma_log))

matplotlib.pyplot.subplot(313)
matplotlib.pyplot.title("Not Extreme")
matplotlib.pyplot.plot(inside)

matplotlib.pyplot.show()

How it works...

Functions in the numpy.ma module mask array elements that we regard as invalid. For instance, negative values are not allowed for the log and sqrt functions. A masked value is like a NULL value in databases and programming. Element-wise operations involving a masked value result in a masked value.
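The way masks carry through a computation can be sketched on a tiny array. The data values below are illustrative assumptions. Note that reductions such as mean() behave differently from element-wise operations: they skip masked entries instead of returning a masked result.

```python
import numpy

# Illustrative data with one entry that is invalid for sqrt.
data = numpy.array([4.0, -9.0, 16.0])
roots = numpy.ma.sqrt(data)
print(roots.mask.tolist())

# Element-wise arithmetic carries the mask along.
shifted = roots + 1.0
print(shifted.mask.tolist())

# A reduction is computed over the unmasked entries only.
print(roots.mean())
```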
