The covariance matrix

The covariance matrix gives you an idea of how all the different pairs of features vary together. It's usually the first step of dimensionality reduction because it tells you which features are strongly related (and therefore, how many of them you can discard) and which ones are independent. Using the Iris dataset, where each observation has four features, the correlation matrix (the normalized form of the covariance matrix) can be computed easily, and you can understand its results with the help of a simple graphical representation, which can be obtained with the following code:

In: from sklearn import datasets
    import numpy as np
    iris = datasets.load_iris()
    # np.corrcoef computes the Pearson correlation matrix, that is,
    # the covariance matrix normalized to the [-1, 1] range
    cov_data = np.corrcoef(iris.data.T)
    print(iris.feature_names)
    print(cov_data)

Out: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
      'petal width (cm)']
     [[ 1.         -0.10936925  0.87175416  0.81795363]
      [-0.10936925  1.         -0.4205161  -0.35654409]
      [ 0.87175416 -0.4205161   1.          0.9627571 ]
      [ 0.81795363 -0.35654409  0.9627571   1.        ]]
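
Incidentally, np.corrcoef returns the Pearson correlation matrix, which is simply the covariance matrix with each entry divided by the product of the corresponding standard deviations. Here is a minimal check (an addition to the original example, not part of the book's code) that verifies this equivalence:

In: # verify that corrcoef equals the normalized covariance matrix
    cov = np.cov(iris.data.T)
    std = np.sqrt(np.diag(cov))
    print(np.allclose(cov / np.outer(std, std), cov_data))
    # should print True: the ddof factor cancels in the ratio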

Now, let's visualize this correlation matrix in graphical form, using a heat map:

In: import matplotlib.pyplot as plt
    img = plt.matshow(cov_data, cmap=plt.cm.rainbow)
    plt.colorbar(img, ticks=[-1, 0, 1], fraction=0.045)
    # annotate each cell with its correlation value (the matrix is
    # symmetric, so the orientation of the indices does not matter)
    for x in range(cov_data.shape[0]):
        for y in range(cov_data.shape[1]):
            plt.text(x, y, "%0.2f" % cov_data[x, y],
                     size=12, color='black', ha="center", va="center")
    plt.show()

Here is the resulting heat map:

From the previous diagram, you can see that the values on the main diagonal are 1. This is because we're using the normalized version of the covariance matrix (each covariance is divided by the product of the two features' standard deviations, so each feature has a correlation of exactly 1.0 with itself). We can also notice a high correlation between the first and the third, the first and the fourth, and the third and the fourth features. Only the second feature is almost independent of the others; all the remaining features are somehow correlated with each other.
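
If you want to extract the same observation programmatically, here is a small sketch (an addition to the original example; the 0.8 threshold is an arbitrary choice) that lists every strongly correlated pair of features:

In: import itertools
    # print every pair of features whose absolute Pearson
    # correlation exceeds the chosen threshold
    for i, j in itertools.combinations(range(cov_data.shape[0]), 2):
        if abs(cov_data[i, j]) > 0.8:
            print(iris.feature_names[i], '<->', iris.feature_names[j],
                  ': %0.2f' % cov_data[i, j])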

We now have an idea of the potential number of features in the reduced set: by compressing the duplicated information that the correlation matrix points out, we can reduce everything to just two features.
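
To back this intuition with numbers, here is a minimal sketch (not part of the original example) that fits scikit-learn's PCA on the same data; on Iris, the first two principal components typically retain about 97-98 percent of the total variance:

In: from sklearn.decomposition import PCA
    # project the four original features onto two principal components
    pca = PCA(n_components=2)
    pca.fit(iris.data)
    # the summed ratio tells us how much of the total variance the
    # two components preserve (roughly 0.98 on this dataset)
    print(pca.explained_variance_ratio_.sum())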
