Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Clustering Dow Jones stocks with scikits-learn

Clustering is a type of machine learning algorithm, which aims to group items based on similarities. In this example, we will use the log returns of stocks in the Dow Jones Industrial Index to cluster. Most of the steps of this recipe have already passed the review in previous chapters.

How to do it...

First, we will download the EOD price data for those stocks from Yahoo Finance. Second, we will calculate a square affinity matrix. Finally, we will cluster the stocks with the AffinityPropagation class.

Downloading the price data.

We will download price data for 2011 using the stock symbols of the DJI Index. In this example, we are only interested in the close price:

# 2011 to 2012
start = datetime.datetime(2011, 01, 01)
end = datetime.datetime(2012, 01, 01)

#Dow Jones symbols
symbols = ["AA", "AXP", "BA", "BAC", "CAT", "CSCO", "CVX", "DD", "DIS", "GE", "HD", "HPQ", "IBM", "INTC", "JNJ", "JPM", "KFT", "KO", "MCD", "MMM", "MRK", "MSFT", "PFE", "PG", "T", "TRV", "UTX", "VZ", "WMT", "XOM"]

quotes = [finance.quotes_historical_yahoo(symbol, start, end, asobject=True)
    for symbol in symbols]

close = numpy.array([q.close for q in quotes]).astype(numpy.float)

Calculating the affinity matrix.

Calculate the similarities between different stocks using the log returns as metric. What we are trying to do is calculate the Euclidean distances for the data points:

logreturns = numpy.diff(numpy.log(close))
print logreturns.shape

logreturns_norms = numpy.sum(logreturns ** 2, axis=1)
S = - logreturns_norms[:, numpy.newaxis] - logreturns_norms[numpy.newaxis, :] + 2 * numpy.dot(logreturns, logreturns.T)

Clustering the stocks.
Give the AffinityPropagation class the result from the previous step. This class labels the data points, or in our case, stocks with the appropriate cluster number:
```
aff_pro = sklearn.cluster.AffinityPropagation().fit(S)
labels = aff_pro.labels_

for i in xrange(len(labels)):
    print '%s in Cluster %d' % (symbols[i], labels[i])
```

The complete clustering program is as follows:

import datetime
import numpy
import sklearn.cluster
from matplotlib import finance

#1. Download price data

# 2011 to 2012
start = datetime.datetime(2011, 01, 01)
end = datetime.datetime(2012, 01, 01)

#Dow Jones symbols
symbols = ["AA", "AXP", "BA", "BAC", "CAT",
  "CSCO", "CVX", "DD", "DIS", "GE", "HD",
  "HPQ", "IBM", "INTC", "JNJ", "JPM", "KFT",
  "KO", "MCD", "MMM", "MRK", "MSFT", "PFE",
  "PG", "T", "TRV", "UTX", "VZ", "WMT", "XOM"]

quotes = [finance.quotes_historical_yahoo(symbol, start, end, asobject=True)
    for symbol in symbols]

close = numpy.array([q.close for q in quotes]).astype(numpy.float)
print close.shape

#2. Calculate affinity matrix
logreturns = numpy.diff(numpy.log(close))
print logreturns.shape

logreturns_norms = numpy.sum(logreturns ** 2, axis=1)
S = - logreturns_norms[:, numpy.newaxis] - logreturns_norms[numpy.newaxis, :] + 2 * numpy.dot(logreturns, logreturns.T)

#3. Cluster using affinity propagation
aff_pro = sklearn.cluster.AffinityPropagation().fit(S)
labels = aff_pro.labels_

for i in xrange(len(labels)):
    print '%s in Cluster %d' % (symbols[i], labels[i])

The output with the cluster numbers for each stock is as follows:

(30, 252)
(30, 251)
AA in Cluster 0
AXP in Cluster 6
BA in Cluster 6
BAC in Cluster 1
CAT in Cluster 6
CSCO in Cluster 2
CVX in Cluster 7
DD in Cluster 6
DIS in Cluster 6
GE in Cluster 6
HD in Cluster 5
HPQ in Cluster 3
IBM in Cluster 5
INTC in Cluster 6
JNJ in Cluster 5
JPM in Cluster 4
KFT in Cluster 5
KO in Cluster 5
MCD in Cluster 5
MMM in Cluster 6
MRK in Cluster 5
MSFT in Cluster 5
PFE in Cluster 7
PG in Cluster 5
T in Cluster 5
TRV in Cluster 5
UTX in Cluster 6
VZ in Cluster 5
WMT in Cluster 5
XOM in Cluster 7

How it works...

The following table is an overview of the functions we used in this recipe:

Function	Description
`sklearn.cluster.AffinityPropagation()`	Creates an `AffinityPropagation` object.
`sklearn.cluster.AffinityPropagation.fit`	Computes an affinity matrix from Euclidian distances and applies affinity propagation clustering.
`diff`	Calculates differences of numbers within a NumPy array. If not specified, first-order differences are computed.
`log`	Calculates the natural log of elements in a NumPy array.
`sum`	Sums the elements of a NumPy array.
`dot`	Does matrix multiplication for 2D arrays. Calculates the inner product for 1D arrays.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Clustering Dow Jones stocks with scikits-learn

Create new playlist

Sign In

Sign Up

Clustering Dow Jones stocks with scikits-learn

How to do it...

How it works...

Table of Contents for
Clustering Dow Jones stocks with scikits-learn