A Pandas DataFrame is a matrix and dictionary-like data structure, similar to the data frames available in R. It is the central data structure in Pandas, and you can apply all kinds of operations to it. It is quite common, for instance, to look at the correlation matrix of a portfolio, so let's do that.
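As a minimal sketch of these two views, the following builds a small DataFrame from a dictionary and accesses it both by column label and as a two-dimensional array. The column names and values are made up for illustration.

```python
import pandas

# Hypothetical values, chosen only to illustrate the structure.
df = pandas.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]},
                      columns=["a", "b"])

# Dictionary-like: a column is looked up by its label.
column_a = df["a"]

# Matrix-like: the frame has a shape and converts to a 2-D NumPy array.
shape = df.shape
matrix = df.values

print(list(df.keys()))
print(shape)
print(matrix.sum())
```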
First, we will create a DataFrame with Pandas that holds each symbol's daily log returns. Then we will join these on the date. At the end, the correlation matrix will be printed, and a plot will be shown.
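One way to get a join on the date, not necessarily the approach taken by the listing in this section, is to give each return series a date index and let Pandas align them. The sketch below assumes two made-up symbols, X and Y, with different date coverage.

```python
import pandas

# Hypothetical return series for two made-up symbols.
x = pandas.Series(
    [0.01, -0.02, 0.03],
    index=pandas.to_datetime(["2011-01-03", "2011-01-04", "2011-01-05"]))
y = pandas.Series(
    [0.02, 0.01],
    index=pandas.to_datetime(["2011-01-04", "2011-01-05"]))

# A DataFrame built from a dict of Series aligns the rows on the index
# (the union of the dates); dates missing from a series become NaN.
joined = pandas.DataFrame({"X": x, "Y": y})
print(joined)
```

Here the first date exists only for X, so the Y column holds a missing value in that row.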
To create the data frame, we will create a dictionary containing stock symbols as keys, and the corresponding log returns as values. The data frame itself has the date as index and the stock symbols as column labels:
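    # Map each symbol to its daily log returns
    data = {}
    for i, symbol in enumerate(symbols):
        data[symbol] = numpy.diff(numpy.log(close[i]))

    # There is one fewer return than prices, so the last date is dropped
    df = pandas.DataFrame(data, index=dates[0][:-1], columns=symbols)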
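To see what this step produces without downloading anything, here is a small sketch of the same construction on hypothetical prices. The symbols XX and YY, the prices, and the dates are all invented for illustration.

```python
import numpy
import pandas

# Hypothetical closing prices for two made-up symbols over four consecutive days.
symbols = ["XX", "YY"]
close = numpy.array([[10.0, 11.0, 12.1, 11.0],
                     [20.0, 19.0, 19.0, 20.9]])
dates = pandas.date_range("2011-01-03", periods=4, freq="D")

# The log return between consecutive days is log(p[t+1]) - log(p[t]),
# so n prices give n - 1 returns.
data = {}
for i, symbol in enumerate(symbols):
    data[symbol] = numpy.diff(numpy.log(close[i]))

# Drop the last date so the index length matches the number of returns.
df = pandas.DataFrame(data, index=dates[:-1], columns=symbols)
print(df)
```

Because log returns add up over time, the sum of a column equals the log of the last price divided by the first price.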
We can now perform operations on the data frame, such as calculating a correlation matrix or plotting:
    print(df.corr())
    df.plot()
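As a rough check on what the correlation call computes, the sketch below compares it with NumPy's correlation coefficients on synthetic returns. The data come from a seeded random generator and the column names are made up. With no missing values, the two results should agree.

```python
import numpy
import pandas

# Synthetic returns from a seeded generator; the symbols are made up.
rng = numpy.random.RandomState(42)
returns = rng.normal(0.0, 0.01, size=(250, 3))
df = pandas.DataFrame(returns, columns=["XX", "YY", "ZZ"])

# Pair-wise correlation of the columns, labeled by column name.
corr = df.corr()

# numpy.corrcoef treats rows as variables, hence the transpose.
expected = numpy.corrcoef(returns.T)

print(numpy.allclose(corr.values, expected))
```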
The complete source code that also downloads the price data is as follows:
    from datetime import datetime

    import numpy
    import pandas
    from matplotlib import finance  # available only in older Matplotlib releases
    from matplotlib.pyplot import show, legend

    # 2011 to 2012
    start = datetime(2011, 1, 1)
    end = datetime(2012, 1, 1)

    symbols = ["AA", "AXP", "BA", "BAC", "CAT"]

    # Download one year of daily quotes for each symbol
    quotes = [finance.quotes_historical_yahoo(symbol, start, end, asobject=True)
              for symbol in symbols]

    close = numpy.array([q.close for q in quotes]).astype(float)
    dates = numpy.array([q.date for q in quotes])

    # Map each symbol to its daily log returns
    data = {}
    for i, symbol in enumerate(symbols):
        data[symbol] = numpy.diff(numpy.log(close[i]))

    # There is one fewer return than prices, so the last date is dropped
    df = pandas.DataFrame(data, index=dates[0][:-1], columns=symbols)

    print(df.corr())
    df.plot()
    legend(symbols)
    show()
Output for the correlation matrix:
               AA       AXP        BA       BAC       CAT
    AA   1.000000  0.768484  0.758264  0.737625  0.837643
    AXP  0.768484  1.000000  0.746898  0.760043  0.736337
    BA   0.758264  0.746898  1.000000  0.657075  0.770696
    BAC  0.737625  0.760043  0.657075  1.000000  0.657113
    CAT  0.837643  0.736337  0.770696  0.657113  1.000000
The following image shows the plot for the log returns of the five stocks:
We used the following DataFrame methods:
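Method | Description |
---|---|
pandas.DataFrame | Constructs a DataFrame from the given data, index (row labels), and column labels. |
pandas.DataFrame.corr | Computes pair-wise correlation of columns, ignoring the missing values. By default, Pearson correlation is used. |
pandas.DataFrame.plot | Plots the data frame with Matplotlib. |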
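The two properties of the correlation method noted in the table can be sketched on a small made-up frame. Column b has one missing value, and column c is a monotonic but nonlinear function of column a. The `method` argument selects the correlation type; all data here are invented for illustration.

```python
import numpy
import pandas

# Made-up data; one value in "b" is missing.
df = pandas.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, numpy.nan, 8.0, 10.0],
    "c": [1.0, 4.0, 9.0, 16.0, 25.0],
}, columns=["a", "b", "c"])

pearson = df.corr()                    # default: method="pearson"
spearman = df.corr(method="spearman")  # rank correlation

# "a" and "b" are perfectly linear on the four rows where both are present;
# the row with the missing value is skipped for this pair only.
print(pearson.loc["a", "b"])

# "c" is the square of "a": the rank correlation is perfect,
# while the Pearson correlation is slightly below one.
print(pearson.loc["a", "c"], spearman.loc["a", "c"])
```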