PCA in itself uses projection, which is a linear method. Because of it, the performance on non-linear data is not optimal. That’s where Kernel PCA comes in.
What is it?
Kernel PCA, or kPCA, is an extension of the traditional PCA method for Dimensionality Reduction most commonly used in Machine Learning.
As the name suggests, it uses the Kernel Trick, which identifies linear high-dimensional relations that become complex non-linear projections when the dimension is reduced.
This works much better than the traditional PCA for non-linear data. Kernel PCA also can cluster instances better than PCA for non-linear data.
Kernel PCA can cluster and linearly separate non-linear data, even if it doesn’t reduce dimensionality.
How does it work?
The main difference between Kernel PCA and PCA is that kPCA doesn’t compute the principal components itself, but rather the projection of data into these components from a higher dimensionality.
Using a given kernel function it implicitly maps data into a higher-dimension feature space, where the data is linearly separable, and then it projects the data back to the desirable dimensionality.
Some common kernel functions are:
-
Linear
Which would be the same as traditional PCA.
-
Polynomial
More flexible than linear, but data needs to follow the shape of the function.
-
Sigmoid
Follows the Sigmoid Function, used in Logistic Regression.
-
Radial Basis Function (RBF)
Also known as the Gaussian kernel, maps normal curves around all data.
-
Cosine
Measures the cosine of the angle between vectors. Better used when similarity depends on direction rather than magnitude.
Using scikit-learn
In practice, kPCA uses the same trick as using Nystroem approximation in linear regression to adapt to non-liner data. We can fit linear algorithms to non-linear data if applying kPCA before the estimator.
For example, given the make moons dataset, would be impossible to fit a linear classification algorithm, but using kPCA the algorithm can draw a linear decision boundary in higher-dimensionality as seen from the image above. That translates to a non-linear classification, as seen below.