Working with QDA – a nonlinear LDA

QDA generalizes LDA in much the same way that quadratic regression generalizes linear regression: it relaxes the model's assumptions so that more complex decision boundaries can be fit. As always, though, letting complexity creep in makes our life more difficult.

Getting ready

We will expand on the last recipe and look at Quadratic Discriminant Analysis (QDA) via the QDA object.

We said we made an assumption about the covariance of the model. Here, we will relax the assumption.
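To see what relaxing that assumption means in practice, here is a minimal sketch on synthetic data (the class means and covariances below are made up purely for illustration): LDA estimates a single pooled covariance matrix for all classes, while QDA estimates one covariance matrix per class.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.RandomState(0)
# Two classes with deliberately different covariance structures
X0 = rng.multivariate_normal([0, 0], [[1, 0], [0, 1]], size=200)
X1 = rng.multivariate_normal([2, 2], [[3, 0], [0, 0.5]], size=200)
X = np.vstack([X0, X1])
y = np.r_[np.zeros(200), np.ones(200)]

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)

# LDA stores one pooled covariance matrix...
print(lda.covariance_.shape)   # (2, 2)
# ...while QDA stores one covariance matrix per class
print(len(qda.covariance_))    # 2
```

The per-class covariances are what bend QDA's decision boundary from a hyperplane into a quadratic surface.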

How to do it…

QDA lives in scikit-learn's discriminant_analysis module (older versions exposed it as sklearn.qda). Use the following commands to use QDA:

>>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
>>> qda = QuadraticDiscriminantAnalysis()

>>> qda.fit(X.iloc[:, :-1], X.iloc[:, -1])
>>> predictions = qda.predict(X.iloc[:, :-1])
>>> predictions.sum()
2812.0

>>> from sklearn.metrics import classification_report
>>> print(classification_report(X.iloc[:, -1].values, predictions))
              precision    recall  f1-score   support
        0.0        0.75      0.36      0.49      1895
        1.0        0.57      0.88      0.69      1833
avg / total        0.66      0.62      0.59      3728

As you can see, the two methods are about equal on the whole. If we look back at the LDA recipe, though, the metrics differ substantially for class 0 and only slightly for class 1.

How it works…

As we discussed in the last recipe, we essentially compare likelihoods here. So, how do we compare likelihoods? Let's use just the closing price at hand to attempt to classify is_higher.
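The comparison itself amounts to asking "under which class's density is this point more likely?". A toy illustration, with two hypothetical normal classes whose parameters are invented for the example:

```python
import scipy.stats as sp

# Two hypothetical classes: a ~ N(0, 1), b ~ N(3, 1)
x = 1.0
ll_a = sp.norm.pdf(x, loc=0.0, scale=1.0)  # likelihood of x under class a
ll_b = sp.norm.pdf(x, loc=3.0, scale=1.0)  # likelihood of x under class b

# Assign x to whichever class gives the higher likelihood
print(ll_a > ll_b)  # True -> assign class a
```

With equal priors and equal variances, this likelihood comparison reduces to picking the class whose mean is closest to x, which is exactly the one-dimensional picture behind discriminant analysis.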

We'll assume that the closing price is normally distributed within each class. In order to compute the likelihood for each class, we need to create the subsets of closes, as well as a training and test set for each class. As a sneak peek at the next chapter, we'll use the built-in cross-validation methods:

>>> from sklearn.model_selection import ShuffleSplit

>>> import scipy.stats as sp

>>> for train, test in ShuffleSplit(n_splits=1).split(X):
       train_set = X.iloc[train]
       train_close = train_set.Close

       train_0 = train_close[~train_set.is_higher.astype(bool)]
       train_1 = train_close[train_set.is_higher.astype(bool)]

       test_set = X.iloc[test]
       test_close = test_set.Close.values

       ll_0 = sp.norm.pdf(test_close, train_0.mean(), train_0.std())
       ll_1 = sp.norm.pdf(test_close, train_1.mean(), train_1.std())

Now that we have likelihoods for both classes, we can compare and assign classes:

>>> (ll_0 > ll_1).mean()
0.15588673621460505
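Putting the whole loop together end to end, here is a self-contained sketch on synthetic data. The column names Close and is_higher mirror the chapter's DataFrame, but the prices below are made up, so the exact numbers will differ from the recipe's output:

```python
import numpy as np
import pandas as pd
import scipy.stats as sp
from sklearn.model_selection import ShuffleSplit

# Synthetic stand-in for the chapter's data: two classes of closing prices
rng = np.random.RandomState(0)
close = np.r_[rng.normal(100, 5, 500), rng.normal(110, 5, 500)]
is_higher = np.r_[np.zeros(500), np.ones(500)]
X = pd.DataFrame({"Close": close, "is_higher": is_higher})

for train, test in ShuffleSplit(n_splits=1, random_state=0).split(X):
    train_set = X.iloc[train]
    test_set = X.iloc[test]

    # Per-class training subsets of the closing price
    train_0 = train_set.Close[~train_set.is_higher.astype(bool)]
    train_1 = train_set.Close[train_set.is_higher.astype(bool)]
    test_close = test_set.Close.values

    # Likelihood of each test price under each class's fitted normal
    ll_0 = sp.norm.pdf(test_close, train_0.mean(), train_0.std())
    ll_1 = sp.norm.pdf(test_close, train_1.mean(), train_1.std())

    # Assign the class with the higher likelihood and score it
    pred = (ll_1 > ll_0).astype(float)
    accuracy = (pred == test_set.is_higher.values).mean()

print(accuracy)
```

Because the two synthetic classes overlap, the accuracy lands well above chance but below 1.0; with equal priors this likelihood rule is the one-feature analogue of what the discriminant-analysis objects do for us.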