QDA generalizes its linear counterpart in much the same way that quadratic regression generalizes linear regression: it allows more complex models to fit the data, though, as always, when we let complexity creep in, we make our life more difficult.
We will expand on the last recipe and look at Quadratic Discriminant Analysis (QDA) via its scikit-learn estimator.
In the last recipe, we made an assumption about the covariance of the model, namely that it is shared across classes. Here, we will relax that assumption: QDA estimates a separate covariance matrix for each class.
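To see the relaxed assumption concretely, here is a minimal sketch on synthetic data (the dataset and variable names are illustrative, not the recipe's stock data): LDA pools a single covariance matrix across classes, while QDA stores one per class.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

# Synthetic two-class problem with four features.
X_demo, y_demo = make_classification(n_samples=500, n_features=4,
                                     random_state=0)

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X_demo, y_demo)
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_demo, y_demo)

print(lda.covariance_.shape)   # a single pooled 4 x 4 matrix
print(len(qda.covariance_))    # a list with one 4 x 4 matrix per class
```

The extra per-class covariance is exactly what lets QDA bend its decision boundary into a quadratic surface.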
QDA is available as the QuadraticDiscriminantAnalysis class in the discriminant_analysis module. Use the following commands to use QDA:
>>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
>>> qda = QuadraticDiscriminantAnalysis()
>>> qda.fit(X.iloc[:, :-1], X.iloc[:, -1])
>>> predictions = qda.predict(X.iloc[:, :-1])
>>> predictions.sum()
2812.0
>>> from sklearn.metrics import classification_report
>>> print(classification_report(X.iloc[:, -1].values, predictions))
             precision    recall  f1-score   support

        0.0       0.75      0.36      0.49      1895
        1.0       0.57      0.88      0.69      1833

avg / total       0.66      0.62      0.59      3728
As you can see, performance is about equal on the whole. If we look back at the LDA recipe, we can see large differences from the QDA results for class 0 and only minor differences for class 1.
As we discussed in the last recipe, we are essentially comparing likelihoods here. So, how do we compare likelihoods? Let's just use the price at hand to attempt to classify is_higher.
We'll assume that the closing price is log-normally distributed. In order to compute the likelihood for each class, we need to create the subsets of closes as well as a training and test set for each class. As a sneak peek at the next chapter, we'll use the built-in cross-validation methods:
>>> from sklearn.model_selection import ShuffleSplit
>>> import scipy.stats as sp
>>> for train, test in ShuffleSplit(n_splits=1).split(X):
        train_set = X.iloc[train]
        train_close = train_set.Close

        train_0 = train_close[~train_set.is_higher.astype(bool)]
        train_1 = train_close[train_set.is_higher.astype(bool)]

        test_set = X.iloc[test]
        test_close = test_set.Close.values

        ll_0 = sp.norm.pdf(test_close, train_0.mean())
        ll_1 = sp.norm.pdf(test_close, train_1.mean())
Now that we have likelihoods for both classes, we can compare and assign classes:
>>> (ll_0 > ll_1).mean() 0.15588673621460505
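The same comparison can be turned into explicit class predictions and scored against the true labels. Here is a self-contained sketch on synthetic prices (the data and names such as closes_0 and truth are hypothetical stand-ins for the recipe's train/test subsets):

```python
import numpy as np
import scipy.stats as sp

rng = np.random.RandomState(0)
# Two classes of "closing prices" with different means.
closes_0 = rng.normal(100, 1, size=500)
closes_1 = rng.normal(102, 1, size=500)
test_close = np.concatenate([closes_0[:100], closes_1[:100]])
truth = np.concatenate([np.zeros(100), np.ones(100)])

# Likelihood of each test price under each class's fitted normal.
ll_0 = sp.norm.pdf(test_close, closes_0.mean())
ll_1 = sp.norm.pdf(test_close, closes_1.mean())

# Assign each observation to the class with the larger likelihood.
preds = (ll_1 > ll_0).astype(float)
accuracy = (preds == truth).mean()
print(accuracy)
```

Because each observation simply goes to whichever class makes it more likely, this is a maximum-likelihood classifier with equal priors; adding class priors would turn it into the full Bayes-rule comparison that discriminant analysis performs.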