The scikits-statsmodels package provides many statistical tests. We will look at one of them: the Anderson-Darling test for normality (http://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test).
We will download price data as in the previous recipe, but this time for a single stock. Again, we will calculate the log returns of the close price of this stock and use them as input for the normality test function.
This function returns a tuple whose first element is the test statistic and whose second element is a p-value between zero and one. The complete code for this tutorial is as follows:
    import datetime
    import sys

    import numpy
    from matplotlib import finance
    from statsmodels.stats.adnorm import normal_ad

    # 1. Download price data for 2011 to 2012
    start = datetime.datetime(2011, 1, 1)
    end = datetime.datetime(2012, 1, 1)

    print "Retrieving data for", sys.argv[1]
    quotes = finance.quotes_historical_yahoo(sys.argv[1], start, end, asobject=True)

    # 2. Extract the close prices as a float array
    close = numpy.array(quotes.close).astype(numpy.float)
    print close.shape

    # 3. Run the Anderson-Darling test on the log returns
    print normal_ad(numpy.diff(numpy.log(close)))
The following shows the output of the script, with a p-value of about 0.137:
    Retrieving data for AAPL
    (252,)
    (0.57103805516803163, 0.13725944999430437)
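The log-return step that feeds the test can be checked in isolation. The following is a minimal sketch that uses a small, made-up price array (the values are hypothetical, not real market data) to show what numpy.diff(numpy.log(close)) computes:

```python
import numpy

# Hypothetical close prices, made up for illustration (not real market data).
close = numpy.array([100.0, 102.0, 101.0, 105.0])

# The log return for day t is log(close[t]) - log(close[t-1]),
# which is what numpy.diff(numpy.log(close)) computes.
log_returns = numpy.diff(numpy.log(close))

# Equivalent form: the log of the ratio of consecutive prices.
ratio_form = numpy.log(close[1:] / close[:-1])

print(log_returns.shape)                        # one element fewer than close
print(numpy.allclose(log_returns, ratio_form))  # both forms agree
```

Because differencing drops one element, the 252 close prices reported in the output above yield 251 log returns for the test.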
This recipe demonstrated the Anderson-Darling statistical test for normality, as found in scikits-statsmodels. We used the log returns of a stock's close price as input, rather than the raw price data, which does not have a normal distribution. For this data, we got a p-value of about 0.137. Because this is above the conventional 0.05 significance level, we cannot reject the null hypothesis that the log returns are normally distributed.
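One common way to read such a result is to compare the p-value with a chosen significance level. The sketch below assumes a level of 0.05, which is a widespread convention rather than something the recipe specifies. The helper name interpret_normality_test is hypothetical, and the second tuple is invented for contrast:

```python
def interpret_normality_test(result, alpha=0.05):
    """Turn a (statistic, p_value) tuple into a plain-language verdict.

    alpha is an assumed significance level; 0.05 is a common convention,
    not a value taken from the recipe.
    """
    statistic, p_value = result
    if p_value < alpha:
        return "reject normality"
    return "cannot reject normality"


# The tuple printed by the script for AAPL, rounded to four decimals.
aapl_result = (0.5710, 0.1373)

# A hypothetical result with a very small p-value, invented for contrast.
hypothetical_result = (3.2, 0.0004)

print(interpret_normality_test(aapl_result))
print(interpret_normality_test(hypothetical_result))
```

Under this assumed level, the AAPL log returns do not give enough evidence to reject normality, which is different from proving that they are normally distributed.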