Sketching PCA

PCA involves a lot of linear algebra, which we do not want to go into here. Nevertheless, the basic algorithm can be described in three steps:

  1. Center the data by subtracting each feature's mean from it
  2. Calculate the covariance matrix of the centered data
  3. Calculate the eigenvectors and eigenvalues of the covariance matrix
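
The three steps can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the helper name pca_eig and the random toy data are assumptions made for the example, and np.linalg.eigh is chosen here because a covariance matrix is symmetric.

```python
import numpy as np

def pca_eig(X):
    """Hypothetical helper: eigenvalues and eigenvectors of the covariance
    matrix of X (rows are samples, columns are features), sorted by
    decreasing eigenvalue."""
    # Step 1: center the data by subtracting each feature's mean
    X_centered = X - X.mean(axis=0)
    # Step 2: covariance matrix of the features (columns)
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigendecomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Toy data, assumed for illustration: 200 samples with 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
eigvals, eigvecs = pca_eig(X)
```

Each column of eigvecs is one eigenvector, and eigvals[i] belongs to eigvecs[:, i].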

If we start with N features, then the algorithm will return a transformed feature space that again has N dimensions, so we have gained nothing so far. The nice thing about this algorithm, however, is that each eigenvalue indicates how much of the variance is described by its corresponding eigenvector.
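
To make the role of the eigenvalues concrete, each one can be divided by the sum of all eigenvalues, which gives the fraction of the total variance that its eigenvector accounts for. The following is a small sketch on made-up data; the three feature scales and the name explained_ratio are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data, assumed for illustration: 3 features with very different spreads
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])

cov = np.cov(X - X.mean(axis=0), rowvar=False)
# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order;
# reverse them so that the largest comes first
eigvals = np.linalg.eigvalsh(cov)[::-1]

# Fraction of the total variance described by each eigenvector
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)
```

In this toy setup the first entry dominates, because almost all of the spread in the data lies along a single direction.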

Let's assume that we start with N = 1000 features and that we know that our model does not work well with more than 20 features. Then, we simply pick the 20 eigenvectors with the highest eigenvalues and project the centered data onto them, which leaves us with 20 features.
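
A sketch of this selection step, again in NumPy, might look as follows. The function name pca_reduce, the number of samples (300), and the random data are assumptions for illustration only; the projection is the matrix product of the centered data with the chosen eigenvectors.

```python
import numpy as np

def pca_reduce(X, k):
    """Hypothetical helper: project X (samples x features) onto the k
    eigenvectors of its covariance matrix with the highest eigenvalues."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Indices of the k largest eigenvalues, largest first
    top = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, top]      # shape (N, k)
    return X_centered @ components    # shape (n_samples, k)

# Toy data, assumed for illustration: 300 samples with N = 1000 features
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 1000))
X_reduced = pca_reduce(X, k=20)
print(X_reduced.shape)  # (300, 20)
```

The columns of X_reduced are ordered by decreasing variance, so the first column is the direction along which the data varies most.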
