Estimating stock returns correlation with Pandas

A Pandas DataFrame is a labeled, two-dimensional data structure that behaves both like a matrix and like a dictionary of columns, similar to the data frame in R. It is the central data structure in Pandas, and you can apply all kinds of operations to it. A common one is computing the correlation matrix of the returns in a portfolio, so let's do that.
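To make the "matrix and dictionary-like" description concrete, here is a minimal sketch. The column names and values are hypothetical toy data, not real returns; the point is only to show both access styles on the same object.

```python
import pandas

# Hypothetical toy values, not real market data.
df = pandas.DataFrame({"AA": [0.01, -0.02, 0.03],
                       "BA": [0.00, 0.01, -0.01]},
                      index=["day1", "day2", "day3"])

# Dictionary-like: columns are looked up by key.
aa = df["AA"]
has_ba = "BA" in df

# Matrix-like: shape, positional indexing, and transposition.
shape = df.shape
first_cell = df.iloc[0, 0]
transposed_shape = df.T.shape

print(list(df.keys()), shape, first_cell)
```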

How to do it...

First, we will create a Pandas DataFrame that holds the daily log returns of each symbol, with one column per symbol and the date as the shared index. At the end, the correlation matrix will be printed and a plot of the returns will be shown.

  1. Creating the data frame.

    To create the data frame, we will create a dictionary containing stock symbols as keys, and the corresponding log returns as values. The data frame itself has the date as index and the stock symbols as column labels:

    data = {}
    
    for i in range(len(symbols)):
      data[symbols[i]] = numpy.diff(numpy.log(close[i]))
    
    df = pandas.DataFrame(data, index=dates[0][:-1], columns=symbols)

  2. Operating on the data frame.

    We can now perform operations on the data frame, such as calculating a correlation matrix or plotting:

    print(df.corr())
    df.plot()
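The two steps above can be tried without downloading any quotes. The following is a sketch under stated assumptions: the prices are synthetic, generated from a seeded random walk, and the names `rng`, `n_days`, and the date range are illustrative choices that are not part of the recipe. Only the log-return and DataFrame construction mirror the steps.

```python
import numpy
import pandas

symbols = ["AA", "AXP", "BA"]

# Assumption: synthetic prices from a seeded random walk stand in for real quotes.
rng = numpy.random.RandomState(42)
n_days = 250
steps = rng.normal(0, 0.01, size=(len(symbols), n_days))
close = 50.0 * numpy.exp(numpy.cumsum(steps, axis=1))
dates = pandas.date_range("2011-01-03", periods=n_days, freq="B")

# Step 1: one column of daily log returns per symbol.
data = {}
for i in range(len(symbols)):
    data[symbols[i]] = numpy.diff(numpy.log(close[i]))

# diff() yields one value fewer than the prices, hence dates[:-1].
df = pandas.DataFrame(data, index=dates[:-1], columns=symbols)

# Step 2: operate on the data frame.
corr = df.corr()
print(corr)
```

Because the synthetic series are generated independently, the off-diagonal correlations here should be close to zero, unlike those of real stocks.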

The complete source code that also downloads the price data is as follows:

import pandas
from matplotlib.pyplot import show, legend
from datetime import datetime
from matplotlib import finance
import numpy

# 2011 to 2012
start = datetime(2011, 1, 1)
end = datetime(2012, 1, 1)

symbols = ["AA", "AXP", "BA", "BAC", "CAT"]

quotes = [finance.quotes_historical_yahoo(symbol, start, end, asobject=True)
    for symbol in symbols]

close = numpy.array([q.close for q in quotes]).astype(float)
dates = numpy.array([q.date for q in quotes])

data = {}

for i in range(len(symbols)):
    data[symbols[i]] = numpy.diff(numpy.log(close[i]))

df = pandas.DataFrame(data, index=dates[0][:-1], columns=symbols)

print(df.corr())
df.plot()
legend(symbols)
show()

Output for the correlation matrix:

           AA       AXP        BA       BAC       CAT
AA   1.000000  0.768484  0.758264  0.737625  0.837643
AXP  0.768484  1.000000  0.746898  0.760043  0.736337
BA   0.758264  0.746898  1.000000  0.657075  0.770696
BAC  0.737625  0.760043  0.657075  1.000000  0.657113
CAT  0.837643  0.736337  0.770696  0.657113  1.000000
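One way to read this matrix is to look for the most and least correlated pairs. The sketch below re-enters the printed values by hand and scans the entries above the diagonal; the variable names are illustrative and not part of the recipe.

```python
import numpy
import pandas

symbols = ["AA", "AXP", "BA", "BAC", "CAT"]

# Values copied from the printed correlation matrix.
corr = pandas.DataFrame(
    [[1.000000, 0.768484, 0.758264, 0.737625, 0.837643],
     [0.768484, 1.000000, 0.746898, 0.760043, 0.736337],
     [0.758264, 0.746898, 1.000000, 0.657075, 0.770696],
     [0.737625, 0.760043, 0.657075, 1.000000, 0.657113],
     [0.837643, 0.736337, 0.770696, 0.657113, 1.000000]],
    index=symbols, columns=symbols)

# The matrix is symmetric, so the entries strictly above the
# diagonal cover every pair exactly once.
rows, cols = numpy.triu_indices(len(symbols), k=1)
pair_values = corr.values[rows, cols]

hi = pair_values.argmax()
lo = pair_values.argmin()
most_pair = (symbols[rows[hi]], symbols[cols[hi]])
least_pair = (symbols[rows[lo]], symbols[cols[lo]])

print(most_pair, pair_values[hi])
print(least_pair, pair_values[lo])
```

For the values shown, the highest correlation is between AA and CAT (0.837643) and the lowest between BA and BAC (0.657075).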

The following image shows the plot for the log returns of the five stocks:


How it works...

We used the following DataFrame methods:

Method                  Description
pandas.DataFrame        Constructs DataFrame with specified data, index (row), and column labels.
pandas.DataFrame.corr   Computes pair-wise correlation of columns, ignoring the missing values. By default, Pearson correlation is used.
pandas.DataFrame.plot   Plots the data frame with Matplotlib.
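The handling of missing values and the choice of correlation method can be checked on a small example. This is a sketch with hypothetical numbers: column "B" has one missing value, and the manual calculation with `numpy.corrcoef` is only there to illustrate what ignoring missing values means for a pair of columns.

```python
import numpy
import pandas

# Hypothetical returns; one value is missing in column "B".
df = pandas.DataFrame({"A": [0.01, 0.02, -0.01, 0.03, 0.00],
                       "B": [0.02, numpy.nan, -0.02, 0.05, 0.01]})

# Pearson is the default; other methods are selected by name.
pearson = df.corr()
spearman = df.corr(method="spearman")

# Dropping the incomplete row by hand gives the same
# Pearson coefficient for the pair (A, B).
complete = df.dropna()
manual = numpy.corrcoef(complete["A"], complete["B"])[0, 1]

print(pearson.loc["A", "B"], manual)
print(spearman.loc["A", "B"])
```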
