Kernel PCA for nonlinear dimensionality reduction

Most techniques in statistics are linear by nature, so in order to capture nonlinearity, we might need to apply some transformation. PCA is, of course, a linear transformation. In this recipe, we'll look at applying a nonlinear transformation first and then applying PCA for dimensionality reduction.

Getting ready

Life would be so easy if data were always linearly separable, but unfortunately it's not. Kernel PCA can help to circumvent this issue. The data is first run through a kernel function that projects it onto a different space; then PCA is performed in that space.

To familiarize yourself with kernel functions, it is a good exercise to think about how to generate data that is separable by the kernels available in kernel PCA. Here, we'll do that with the cosine kernel. This recipe will have a bit more theory than the previous ones.

How to do it...

The cosine kernel works by comparing the angle between two samples represented in the feature space. It is useful when the magnitudes of the vectors would distort the typical distance measures used to compare samples.

As a reminder, the cosine between two vectors is given by the following:

cos(θ) = (A · B) / (‖A‖ ‖B‖)

This means that the cosine between A and B is the dot product of the two vectors normalized by the product of their individual norms. The magnitudes of vectors A and B have no influence on this calculation.
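
To see this scale invariance concretely, here is a minimal NumPy sketch (the cosine helper is our own, not part of any library) that computes the cosine between two vectors and shows that rescaling either vector leaves the result unchanged:

>>> import numpy as np

>>> def cosine(a, b):
...     # dot product normalized by the product of the individual norms
...     return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

>>> a = np.array([1.0, 1.0])
>>> b = np.array([5.0, 0.0])
>>> round(cosine(a, b), 4), round(cosine(10 * a, 0.5 * b), 4)
(0.7071, 0.7071)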

So, let's generate some data and see how useful it is. First, we'll imagine there are two different underlying processes; we'll call them A and B:

>>> import numpy as np

>>> # process A: two positively correlated blobs along the diagonal
>>> # (covariance matrices must be symmetric and positive semi-definite)
>>> A1_mean = [1, 1]
>>> A1_cov = [[1, .99], [.99, 1]]
>>> A1 = np.random.multivariate_normal(A1_mean, A1_cov, 50)

>>> A2_mean = [5, 5]
>>> A2_cov = [[1, .99], [.99, 1]]
>>> A2 = np.random.multivariate_normal(A2_mean, A2_cov, 50)

>>> A = np.vstack((A1, A2))

>>> # process B: a single negatively correlated blob off to the side
>>> B_mean = [5, 0]
>>> B_cov = [[.5, -.45], [-.45, .5]]
>>> B = np.random.multivariate_normal(B_mean, B_cov, 100)

Once plotted, it will look like the following:

[Figure: scatter plot of the samples drawn from processes A and B]
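
If you'd like to reproduce the plot yourself, a minimal matplotlib sketch (assuming the A and B arrays generated above) looks like this:

>>> import matplotlib.pyplot as plt

>>> fig, ax = plt.subplots(figsize=(7, 5))
>>> ax.scatter(A[:, 0], A[:, 1], color='r', label='A')  # process A in red
>>> ax.scatter(B[:, 0], B[:, 1], color='b', label='B')  # process B in blue
>>> ax.legend()
>>> plt.show()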

By visual inspection, it seems that the two classes come from different processes, but separating them with a single linear cut might be difficult. So, we'll use the kernel PCA with the cosine kernel discussed earlier:

>>> from sklearn import decomposition

>>> kpca = decomposition.KernelPCA(kernel='cosine', n_components=1)
>>> AB = np.vstack((A, B))
>>> AB_transformed = kpca.fit_transform(AB)

Visualized in one dimension after the kernel PCA, the dataset looks like the following:

[Figure: the dataset projected onto a single component by kernel PCA with the cosine kernel]

Contrast this with PCA without a kernel:

[Figure: the dataset projected onto a single component by PCA without a kernel]
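
To produce this comparison yourself, the following sketch fits a plain PCA alongside the kernel PCA result from above (it assumes AB, AB_transformed, A, and B from the previous steps, and matplotlib imported as plt; the y axis is just the sample index so the points don't all overlap):

>>> pca = decomposition.PCA(n_components=1)
>>> AB_transformed_pca = pca.fit_transform(AB)

>>> colors = ['r'] * len(A) + ['b'] * len(B)
>>> fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 5))
>>> ax1.scatter(AB_transformed[:, 0], np.arange(len(AB)), c=colors)
>>> ax1.set_title("Kernel PCA (cosine kernel)")
>>> ax2.scatter(AB_transformed_pca[:, 0], np.arange(len(AB)), c=colors)
>>> ax2.set_title("PCA without a kernel")
>>> plt.show()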

Clearly, the kernel PCA with the cosine kernel does a much better job of separating the two processes.

How it works...

There are several other kernels available besides the cosine kernel, and you can even write your own kernel function. The available kernels are:

  • linear (the default)
  • poly (polynomial)
  • rbf (radial basis function)
  • sigmoid
  • cosine
  • precomputed

There are also options contingent on the kernel choice. For example, the degree argument specifies the degree of the poly kernel, while gamma affects the rbf, poly, and sigmoid kernels.
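
As a quick sketch of how these options are passed (the gamma and degree values here are arbitrary, not tuned for this dataset):

>>> kpca_rbf = decomposition.KernelPCA(kernel='rbf', gamma=1.0, n_components=1)
>>> kpca_poly = decomposition.KernelPCA(kernel='poly', degree=3, gamma=1.0, n_components=1)
>>> kpca_rbf.fit_transform(AB).shape
(200, 1)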

The recipe on SVM will cover the rbf kernel function in more detail.

A word of caution: kernel methods are great for creating separability, but they can also cause overfitting if used without care.
