- Machine Learning Algorithms
- Giuseppe Bonaccorso
Kernel PCA
We're going to discuss kernel methods in Chapter 7, Support Vector Machines; however, it's useful to mention here the class KernelPCA, which performs a PCA on non-linearly separable datasets. Just to understand the logic of this approach (the mathematical formulation isn't very simple), it's useful to consider a projection of each sample into a particular space where the dataset becomes linearly separable. The components of this space correspond to the first, second, ... principal components, and a kernel PCA algorithm therefore computes the projection of our samples onto each of them.
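For reference, a compact sketch of the standard formulation (due to Schölkopf et al.; the derivation and the centering details are omitted here): if $K_{ij} = \kappa(x_i, x_j)$ is the centered kernel matrix and $\alpha^{(k)}$ is its k-th eigenvector (with eigenvalue $\lambda_k$, scaled so that $\lambda_k\,\alpha^{(k)\top}\alpha^{(k)} = 1$), the projection of a sample $x$ onto the k-th principal component is:

$$y_k(x) = \sum_{i=1}^{n} \alpha_i^{(k)}\,\kappa(x_i, x)$$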
Let's consider a dataset made up of a circle with a blob inside:
from sklearn.datasets import make_circles
>>> Xb, Yb = make_circles(n_samples=500, factor=0.1, noise=0.05)
The graphical representation is shown in the following picture. In this case, a classic PCA approach isn't able to capture the non-linear dependency between the existing components (the reader can verify that the projection is equivalent to the original dataset). However, looking at the samples in polar coordinates (that is, a space onto which it's possible to project all the points), it's easy to separate the two sets by considering only the radius:

[Figure: the dataset, made up of an external circle and a dense blob at the center]
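The following is a minimal sketch (not part of the original example; it assumes NumPy and scikit-learn's standard PCA) to verify both claims: a linear PCA with two components is only a rotation of the plane, while the radius alone separates the two sets:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA

# Recreate the dataset (or reuse Xb, Yb from above)
>>> Xb, Yb = make_circles(n_samples=500, factor=0.1, noise=0.05)

# A linear PCA keeping both components is just a rotation of the plane,
# so the projected dataset is still two concentric, inseparable sets
>>> pca = PCA(n_components=2)
>>> X_pca = pca.fit_transform(Xb)

# In polar coordinates the radius alone separates the sets
# (make_circles labels the external circle 0 and the central blob 1)
>>> r = np.sqrt(Xb[:, 0] ** 2 + Xb[:, 1] ** 2)
>>> print(r[Yb == 0].min(), r[Yb == 1].max())

The last line should print a clear gap between the smallest radius of the external circle and the largest radius of the blob.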
Considering the structure of the dataset, it's possible to investigate the behavior of a PCA with a radial basis function kernel. As the default value for gamma is 1.0 / number of features (0.5 in this case; for now, consider this parameter as inversely proportional to the variance of the Gaussian), we need to increase it to capture the external circle. A value of 1.0 is enough:
from sklearn.decomposition import KernelPCA
>>> kpca = KernelPCA(n_components=2, kernel='rbf', fit_inverse_transform=True, gamma=1.0)
>>> X_kpca = kpca.fit_transform(Xb)
The instance variable X_transformed_fit_ will contain the projection of our dataset onto the new space (the same array returned by fit_transform()). Plotting it, we get:
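The original figure isn't reproduced here; a minimal plotting sketch (assuming matplotlib, which is not imported in the original snippet) that generates an equivalent plot is:

import matplotlib.pyplot as plt

>>> plt.scatter(X_kpca[Yb == 0, 0], X_kpca[Yb == 0, 1], label='External circle')
>>> plt.scatter(X_kpca[Yb == 1, 0], X_kpca[Yb == 1, 1], label='Central blob')
>>> plt.xlabel('First principal component')
>>> plt.ylabel('Second principal component')
>>> plt.legend()
>>> plt.show()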
The plot shows the separation just as expected, and it's also possible to see that the points belonging to the central blob have a curved distribution because they are more sensitive to the distance from the center.
Kernel PCA is a powerful instrument when we think of our dataset as made up of elements that can be a function of the components (in particular, radial basis functions or polynomials), but we aren't able to determine a linear relationship among them.
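As a quick sanity check of this claim (a sketch, not part of the original example, using scikit-learn's LogisticRegression as a generic linear classifier), we can compare a linear model on the raw coordinates and on the projection:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Accuracy on the raw coordinates should stay close to chance...
>>> print(cross_val_score(LogisticRegression(), Xb, Yb, cv=5).mean())

# ...while on the kernel PCA projection it should be close to 1.0,
# because the two sets become linearly separable in the new space
>>> print(cross_val_score(LogisticRegression(), X_kpca, Yb, cv=5).mean())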