Correlation

Using correlation, we can easily see linear relationships between pairs of features. In the following graphs, we can see different degrees of correlation, together with a potential linear dependency plotted as a dashed line (a fitted one-dimensional polynomial). The correlation coefficient Cor(X1, X2) at the top of each graph is calculated using the common Pearson correlation coefficient (Pearson's r value) by means of the pearsonr() function of scipy.stats.

Given two equal-sized data series, it returns a tuple of the correlation coefficient value and the p-value. The p-value describes how likely it is that the two data series were generated by an uncorrelated system. In other words, the higher the p-value, the less we should trust the correlation coefficient:

>>> from scipy.stats import pearsonr
>>> pearsonr([1,2,3], [1,2,3.1])
(0.99962228516121843, 0.017498096813278487)
>>> pearsonr([1,2,3], [1,20,6])
(0.25383654128340477, 0.83661493668227427)

In the first case, we have a clear indication that both series are correlated. In the second case, we still have a clearly nonzero value. However, the p-value of 0.84 tells us that the correlation coefficient is not significant, and we should not pay too much attention to it. In the first three cases that have high correlation coefficients in the following graph, we would probably want to throw out either X1 or X2 because they seem to convey similar, if not the same, information:

In the last case, however, we should keep both features. In our application, this decision would, of course, be driven by the p-value.
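
If we wanted to automate that decision instead of reading it off the graphs, a minimal sketch could look as follows; the helper name drop_correlated_features() and both threshold values are illustrative assumptions on our part, not anything provided by SciPy:

import numpy as np
from scipy.stats import pearsonr

def drop_correlated_features(X, corr_threshold=0.9, p_threshold=0.05):
    # Hypothetical helper: keeps one feature of every pair whose Pearson
    # coefficient is both high and statistically significant. The thresholds
    # are arbitrary choices for illustration.
    n_features = X.shape[1]
    to_drop = set()
    for i in range(n_features):
        if i in to_drop:
            continue
        for j in range(i + 1, n_features):
            if j in to_drop:
                continue
            corr, p = pearsonr(X[:, i], X[:, j])
            # only act on the coefficient when the p-value says it is trustworthy
            if abs(corr) > corr_threshold and p < p_threshold:
                to_drop.add(j)
    return [idx for idx in range(n_features) if idx not in to_drop]

# toy data: the third column is almost a copy of the first one
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
X[:, 2] = X[:, 0] + 0.01 * rng.rand(100)
print(drop_correlated_features(X))   # expected: [0, 1] -- the redundant column is dropped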

Although it worked nicely in the preceding example, reality is seldom nice. One big disadvantage of correlation-based feature selection is that it only detects linear relationships (a relationship that can be modeled by a straight line). We can see the problem if we use correlation on nonlinear data. In the following example, we have a quadratic relationship:

Although the human eye immediately sees the relationship between X1 and X2 in all but the bottom-right graph, the correlation coefficient does not. It's obvious that correlation is useful for detecting linear relationships, but fails for everything else. Sometimes, it really helps to apply simple transformations to get a linear relationship. For instance, in the preceding plot, we would have got a high correlation coefficient if we had drawn X2 over X1 squared. Normal data, however, seldom offers this opportunity.
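
To see both effects in numbers rather than graphs, we can generate a small synthetic data set with a quadratic relationship; the sample size and noise level below are arbitrary choices for illustration:

import numpy as np
from scipy.stats import pearsonr

rng = np.random.RandomState(0)
X1 = np.linspace(-1, 1, 200)
# quadratic relationship plus a little noise (noise level chosen arbitrarily)
X2 = X1 ** 2 + 0.05 * rng.randn(200)

# the linear (Pearson) view misses the obvious relationship:
# the coefficient comes out close to 0 with a large p-value
print(pearsonr(X1, X2))

# squaring X1 turns it into a linear relationship again:
# now the coefficient is close to 1 with a tiny p-value
print(pearsonr(X1 ** 2, X2))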

Luckily, for nonlinear relationships, mutual information comes to the rescue.
