For this tutorial, we will use two sample data sets, containing the bare minimum of end-of-day price data. The first company is BHP Billiton (BHP), which is active in the mining of petroleum, metals, and diamonds. The second is Vale (VALE), which is also a metals and mining company. So there is some overlap, albeit not one hundred percent. For trading correlated pairs, follow these steps:
Compute the covariance matrix of the returns with the cov function (it's not strictly necessary to do this, but it will allow us to demonstrate a few matrix operations):
    covariance = np.cov(bhp_returns, vale_returns)
    print("Covariance", covariance)
The covariance matrix is as follows:
Covariance [[ 0.00028179 0.00019766]
 [ 0.00019766 0.00030123]]
View the values on the diagonal with the diagonal method:
    print("Covariance diagonal", covariance.diagonal())
The diagonal values of the covariance matrix are as follows:
Covariance diagonal [ 0.00028179 0.00030123]
Compute the trace, the sum of the diagonal values, with the trace method:
    print("Covariance trace", covariance.trace())
The trace of the covariance matrix is as follows:
Covariance trace 0.00058302354992
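To make the three calls above less opaque, the following sketch rebuilds a covariance matrix by hand and checks it against np.cov, diagonal, and trace. The arrays a and b are made-up return series for illustration, not the BHP and VALE data, and the sketch assumes the NumPy default for np.cov, which divides by N - 1.

```python
import numpy as np

# Made-up return series, for illustration only
a = np.array([0.01, -0.02, 0.03, 0.00])
b = np.array([0.02, -0.01, 0.02, -0.01])

n = len(a)
da = a - a.mean()  # deviations from the mean
db = b - b.mean()

# Sample (co)variance: sum of products of deviations, divided by N - 1
var_a = np.sum(da * da) / (n - 1)
var_b = np.sum(db * db) / (n - 1)
cov_ab = np.sum(da * db) / (n - 1)
manual = np.array([[var_a, cov_ab],
                   [cov_ab, var_b]])

covariance = np.cov(a, b)

# Each check prints True when the hand-built value matches NumPy's
print(np.allclose(manual, covariance))
print(np.allclose(covariance.diagonal(), [var_a, var_b]))  # diagonal holds the variances
print(np.isclose(covariance.trace(), var_a + var_b))       # trace is their sum
```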
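The correlation coefficient of two vectors a and b is defined as their covariance divided by the product of their standard deviations:
$$\mathrm{corr}(a, b) = \frac{\mathrm{cov}(a, b)}{\sigma_a \sigma_b}$$
Try it out:
    print(covariance / (bhp_returns.std() * vale_returns.std()))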
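The correlation matrix is as follows:
[[ 1.00173366 0.70264666]
 [ 0.70264666 1.0708476 ]]
Strictly speaking, only the off-diagonal value estimates the correlation of BHP with VALE. The diagonal entries are each variance divided by the product of both standard deviations, so they are not exactly 1. In addition, np.cov divides by N - 1 by default whereas std divides by N, which inflates every entry by the factor N / (N - 1).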
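A consistent version of this derivation, sketched here for two made-up return series a and b (illustrative names and values, not the BHP and VALE data), divides each covariance entry by the product of the matching sample standard deviations, so that the diagonal comes out as 1:

```python
import numpy as np

# Made-up return series, for illustration only
a = np.array([0.01, -0.02, 0.03, 0.00])
b = np.array([0.02, -0.01, 0.02, -0.01])

covariance = np.cov(a, b)              # normalized by N - 1 by default
stds = np.sqrt(covariance.diagonal())  # sample standard deviations (ddof=1)

# Entry (i, j) is divided by std_i * std_j
correlation = covariance / np.outer(stds, stds)

print(correlation)
# Cross-check against NumPy's own correlation matrix
print(np.allclose(correlation, np.corrcoef(a, b)))
```

Taking the standard deviations from the diagonal of the covariance matrix keeps the numerator and denominator on the same normalization, which a.std() with its default ddof=0 would not.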
The correlation coefficient takes values between -1 and 1. The correlation of a set of values with itself is 1 by definition. This would be the ideal value; however, we will also be happy with a slightly lower value. Calculate the correlation coefficient (or, more accurately, the correlation matrix) with the corrcoef function:
    print("Correlation coefficient", np.corrcoef(bhp_returns, vale_returns))
The coefficients are as follows:
[[ 1. 0.67841747]
 [ 0.67841747 1. ]]
The values on the diagonal are just the correlations of BHP and VALE with themselves and are, therefore, equal to 1. The other two values are equal to each other since correlation is symmetrical, meaning that the correlation of BHP with VALE is equal to the correlation of VALE with BHP. At about 0.68, the correlation is moderate rather than strong.
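Another important point is whether the two stocks are in sync. Here we consider them out of sync if the latest difference between their close prices lies more than two standard deviations away from the mean of the differences. If they are out of sync, we could initiate a trade, hoping that they will eventually get back in sync. Compute the difference between the close prices of the two securities to check the synchronization:
    difference = bhp - vale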
Check whether the last difference in price is out of sync; see the following code:
    avg = np.mean(difference)
    dev = np.std(difference)
    print("Out of sync", np.abs(difference[-1] - avg) > 2 * dev)
Unfortunately, we cannot trade yet:
Out of sync False
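The check above can be wrapped in a small helper so that the threshold is easy to vary. This is a sketch only: the function name is_out_of_sync, the threshold parameter, and the sample prices are assumptions made for illustration.

```python
import numpy as np

def is_out_of_sync(x, y, threshold=2.0):
    """Return True if the latest price difference deviates from the mean
    difference by more than `threshold` standard deviations."""
    difference = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    avg = np.mean(difference)
    dev = np.std(difference)
    return bool(np.abs(difference[-1] - avg) > threshold * dev)

# Made-up close prices: the spread is 10 on every day except the last
x = np.array([50.0, 51.0, 50.5, 51.5, 50.0, 51.0, 50.5, 51.5, 50.0, 60.0])
y = np.array([40.0, 41.0, 40.5, 41.5, 40.0, 41.0, 40.5, 41.5, 40.0, 40.0])

print(is_out_of_sync(x, y))            # last spread jumps to 20
print(is_out_of_sync(x[:-1], y[:-1]))  # constant spread
```

Note that the mean and standard deviation include the latest observation itself, as in the original check, so a single extreme value also widens the band it is tested against.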
Plotting requires Matplotlib; this will be discussed in Chapter 9, Plotting with Matplotlib. Plotting can be done as follows:
    t = np.arange(len(bhp_returns))
    plot(t, bhp_returns, lw=1)
    plot(t, vale_returns, lw=2)
    show()
The resulting plot:
We analyzed the relation of the closing stock prices of BHP and VALE. To be precise, we calculated the correlation of their stock returns. This was achieved with the corrcoef function. Further, we saw how the covariance matrix can be computed, from which the correlation can be derived. As a bonus, we demonstrated the diagonal and trace methods, which give us the diagonal values and the trace of a matrix, respectively (see correlation.py):
    import numpy as np
    from matplotlib.pyplot import plot, show

    # Load the close prices (column 6) and compute simple daily returns
    bhp = np.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=True)
    bhp_returns = np.diff(bhp) / bhp[:-1]

    vale = np.loadtxt('VALE.csv', delimiter=',', usecols=(6,), unpack=True)
    vale_returns = np.diff(vale) / vale[:-1]

    # Covariance matrix, its diagonal, and its trace
    covariance = np.cov(bhp_returns, vale_returns)
    print("Covariance", covariance)
    print("Covariance diagonal", covariance.diagonal())
    print("Covariance trace", covariance.trace())

    # Correlation derived from the covariance, then with corrcoef
    print(covariance / (bhp_returns.std() * vale_returns.std()))
    print("Correlation coefficient", np.corrcoef(bhp_returns, vale_returns))

    # Is the latest price difference more than two standard deviations from the mean?
    difference = bhp - vale
    avg = np.mean(difference)
    dev = np.std(difference)
    print("Out of sync", np.abs(difference[-1] - avg) > 2 * dev)

    # Plot both return series
    t = np.arange(len(bhp_returns))
    plot(t, bhp_returns, lw=1)
    plot(t, vale_returns, lw=2)
    show()
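The script derives the returns with np.diff(prices) / prices[:-1]. As a minimal sketch of what that expression computes, the following uses a short made-up price array in place of the CSV files; the values and the helper name simple_returns are illustrative assumptions.

```python
import numpy as np

def simple_returns(prices):
    """Return the simple period-over-period returns of a 1-D price series."""
    prices = np.asarray(prices, dtype=float)
    # (p[t] - p[t-1]) / p[t-1] for every t >= 1
    return np.diff(prices) / prices[:-1]

# Made-up close prices, for illustration only
prices = np.array([100.0, 110.0, 99.0, 99.0])
returns = simple_returns(prices)

print(returns)                      # one value fewer than the price series
print(len(prices), len(returns))
```

Because each return needs a previous price, the result is one element shorter than the input, which is why the plot uses np.arange(len(bhp_returns)) for its time axis.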