PCA assumes that all the principal directions of variation in the data are straight lines, that is, it can only capture linear structure. This assumption does not hold for many real-world datasets.
In this section, we will look at kernel PCA, which helps us reduce the dimensionality of datasets whose variations are not straight lines. We will explicitly create such a dataset and apply kernel PCA to it.
In kernel PCA, a kernel function is applied to all the data points, transforming the input data into kernel space. A normal PCA is then performed in the kernel space.
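Under the hood, this amounts to building the pairwise kernel matrix, centering it, and eigendecomposing it. The following is a minimal from-scratch sketch of the idea, not scikit-learn's implementation; the function name and its defaults are ours, and an RBF kernel is assumed:

```python
import numpy as np

def kernel_pca_sketch(x, gamma=10, n_components=2):
    # Pairwise squared Euclidean distances between all points
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # RBF kernel matrix: similarity of every pair of points
    k = np.exp(-gamma * sq_dists)
    # Center the kernel matrix: PCA requires centered data, but the
    # centering must happen in the implicit feature space
    n = k.shape[0]
    one_n = np.ones((n, n)) / n
    k_centered = k - one_n @ k - k @ one_n + one_n @ k @ one_n
    # Eigendecompose; eigenvectors scaled by the square root of their
    # eigenvalues give the projections of the points onto the components
    eigvals, eigvecs = np.linalg.eigh(k_centered)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```

Note that, unlike normal PCA, the projection is expressed in terms of the training points themselves, which is why the eigendecomposition is of an n-by-n kernel matrix rather than of a covariance matrix.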
We will not use the Iris dataset here; instead, we will generate a dataset whose variations are not straight lines, so that a simple PCA cannot be applied to it. Let's proceed to look at our recipe.
Let's load the necessary libraries. We will then make a dataset using the make_circles
function from the scikit-learn library, plot this data, and do a normal PCA on it:
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.decomposition import KernelPCA

# Generate a dataset where the variations cannot be captured by a straight line.
np.random.seed(0)
x, y = make_circles(n_samples=400, factor=.2, noise=0.02)

# Plot the generated dataset
plt.close('all')
plt.figure(1)
plt.title("Original Space")
plt.scatter(x[:, 0], x[:, 1], c=y)
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")

# Try to fit the data using normal PCA
pca = PCA(n_components=2)
pca.fit(x)
x_pca = pca.transform(x)
We will then plot the first two principal components of this dataset, and also plot the dataset using only the first principal component:
plt.figure(2)
plt.title("PCA")
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=y)
plt.xlabel("$Component_1$")
plt.ylabel("$Component_2$")

# Plot using the first component from normal PCA
class_1_indx = np.where(y == 0)[0]
class_2_indx = np.where(y == 1)[0]

plt.figure(3)
plt.title("PCA - One component")
plt.scatter(x_pca[class_1_indx, 0], np.zeros(len(class_1_indx)), color='red')
plt.scatter(x_pca[class_2_indx, 0], np.zeros(len(class_2_indx)), color='blue')
Let's finish up by performing a kernel PCA and plotting the components:
# Create a KernelPCA object in scikit-learn, specifying the type of kernel as a parameter.
kpca = KernelPCA(kernel="rbf", gamma=10)

# Perform KernelPCA
kpca.fit(x)
x_kpca = kpca.transform(x)

# Plot the first two components.
plt.figure(4)
plt.title("Kernel PCA")
plt.scatter(x_kpca[:, 0], x_kpca[:, 1], c=y)
plt.xlabel("$Component_1$")
plt.ylabel("$Component_2$")
plt.show()
In step 1, we generated a dataset using scikit-learn's data generation function, in this case, the make_circles
function. This function creates two concentric circles, a larger one containing the smaller one, with each circle belonging to a certain class. Thus, we created a two-class problem out of two concentric circles.
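As a quick sanity check, we can inspect what make_circles actually returns. The parameter values mirror the recipe; random_state is our addition for reproducibility:

```python
from sklearn.datasets import make_circles
import numpy as np

# factor is the ratio of the inner circle's radius to the outer one's;
# noise is the standard deviation of Gaussian noise added to each point
x, y = make_circles(n_samples=400, factor=0.2, noise=0.02, random_state=0)

print(x.shape)         # (400, 2): 400 points in two dimensions
print(np.bincount(y))  # [200 200]: the two classes are balanced

# The average distance from the origin separates the two rings:
# class 0 is the outer circle, class 1 is the inner circle
print(np.linalg.norm(x[y == 0], axis=1).mean())  # close to 1.0
print(np.linalg.norm(x[y == 1], axis=1).mean())  # close to 0.2
```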
First, let's look at the data that we generated. The make_circles
function generated a dataset of 400 points in two dimensions. A plot of the original data is as follows:
This chart shows how our data is distributed: the outer circle belongs to class one and the inner circle belongs to class two. Can we take this data and use it with a linear classifier? No: the variations in the data are not straight lines, so we cannot use normal PCA. Hence, we will resort to kernel PCA in order to transform the data.
Before we venture into kernel PCA, let's see what happens if we apply a normal PCA on this dataset.
Let's look at the output plot of the first two components:
As you can see, the two classes cannot be separated linearly in the space of the first two PCA components.
Let's plot only the first component and see its class-distinguishing ability. The following graph, where we have plotted only the first component, shows how PCA is unable to differentiate the two classes:
The normal PCA approach is a linear projection technique that works well if the data is linearly separable. In cases where the data is not linearly separable, a nonlinear technique is required for the dimensionality reduction of the dataset.
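One way to make this limitation concrete is to train a linear classifier on the first component from each technique and compare training accuracy. This comparison is our own illustration, not part of the recipe:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

x, y = make_circles(n_samples=400, factor=0.2, noise=0.02, random_state=0)

# A linear classifier on the first PCA component cannot separate the rings:
# both classes are symmetric around the origin along any straight direction
pca_1 = PCA(n_components=1).fit_transform(x)
acc_pca = LogisticRegression().fit(pca_1, y).score(pca_1, y)

# The same classifier on the first kernel PCA component should do far better,
# because the RBF kernel untangles the two rings
kpca_1 = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(x)
acc_kpca = LogisticRegression().fit(kpca_1, y).score(kpca_1, y)

print(acc_pca, acc_kpca)
```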
Let's proceed to create a kernel PCA object using the scikit-learn library. Here is our object creation code:
KernelPCA(kernel="rbf", gamma=10)
We selected the Radial Basis Function (RBF) kernel with a gamma value of ten. Gamma is the kernel coefficient, the parameter of the kernel that controls how it handles nonlinearity.
Before we go further, let's look at a little bit of theory about what kernels really are. As a simple definition, a kernel is a function that computes the dot product, that is, the similarity between two vectors, which are passed to it as input.
The RBF (Gaussian) kernel is defined as follows for two points, $x$ and $x'$, in some input space:

$K(x, x') = \exp(-\gamma \|x - x'\|^2)$

Where,

$\gamma = \frac{1}{2\sigma^2}$

The RBF kernel value decreases with distance and takes values between 0 and 1. Hence it can be interpreted as a similarity measure. The feature space of the RBF kernel has infinite dimensions –Wikipedia.
http://en.wikipedia.org/wiki/Radial_basis_function_kernel.
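The definition can be checked numerically against scikit-learn's own implementation; the helper name rbf below is ours:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x1, x2, gamma):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 0.0]])  # at distance 1 from a

# Identical points have similarity 1; similarity decays with distance,
# and a larger gamma makes the decay faster
print(rbf(a, a, gamma=10))                        # 1.0
print(rbf(a, b, gamma=0.1), rbf(a, b, gamma=10))  # ~0.905 vs ~4.5e-05
print(np.allclose(rbf(a, b, 10), rbf_kernel(a, b, gamma=10)))  # True
```

This also shows why gamma matters so much in the recipe: it sets the distance scale beyond which two points are treated as dissimilar.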
Let's now transform the input from the feature space into the kernel space and perform a PCA there.
Finally, we will plot the first two principal components as a scatter plot. The points are colored based on their class value:
You can see in this graph that the points are linearly separated in the kernel space.
Scikit-learn's kernel PCA object also allows other types of kernels through its kernel parameter: linear, poly, rbf, sigmoid, cosine, and precomputed.
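As a quick sketch, we can try the alternatives on the same circles data. The gamma, degree, and coef0 parameters are only used by the kernels that need them; scikit-learn filters out the ones a given kernel does not accept:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

x, y = make_circles(n_samples=400, factor=0.2, noise=0.02, random_state=0)

# Project onto the first two components with each kernel; for this
# dataset, only the rbf kernel is expected to untangle the circles
for kernel in ("linear", "poly", "rbf", "sigmoid", "cosine"):
    x_k = KernelPCA(n_components=2, kernel=kernel, gamma=10).fit_transform(x)
    print(kernel, x_k.shape)
```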
Scikit-learn also provides other functions for generating nonlinear datasets. The following is another example, using make_moons:
from sklearn.datasets import make_moons

x, y = make_moons(100)

plt.figure(5)
plt.title("Non-Linear Data")
plt.scatter(x[:, 0], x[:, 1], c=y)
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.savefig('fig-7.png')
plt.show()
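The same recipe carries over to this dataset. As a sketch (gamma=15 is our choice here and, as with any kernel parameter, may need tuning):

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
import matplotlib.pyplot as plt

x, y = make_moons(n_samples=100, random_state=0)

# An RBF kernel PCA can also untangle the two interleaving moons
kpca = KernelPCA(kernel="rbf", gamma=15)
x_kpca = kpca.fit_transform(x)

plt.figure(6)
plt.title("Kernel PCA on moons data")
plt.scatter(x_kpca[:, 0], x_kpca[:, 1], c=y)
plt.xlabel("$Component_1$")
plt.ylabel("$Component_2$")
plt.show()
```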