
Kernel PCA

We're going to discuss kernel methods in Chapter 7, Support Vector Machines; however, it's useful to mention the class KernelPCA, which performs a PCA on non-linearly separable datasets. Just to understand the logic of this approach (the mathematical formulation isn't very simple), it's useful to think of projecting each sample into a particular space where the dataset becomes linearly separable. The components of this space correspond to the first, second, ... principal components, and a kernel PCA algorithm therefore computes the projection of our samples onto each of them.

Let's consider a dataset made up of a circle with a blob inside:

from sklearn.datasets import make_circles

>>> Xb, Yb = make_circles(n_samples=500, factor=0.1, noise=0.05)

The graphical representation is shown in the following picture. In this case, a classic PCA approach isn't able to capture the non-linear dependency between the existing components (the reader can verify that the projection is equivalent to the original dataset). However, looking at the samples and using polar coordinates (that is, a space where it's possible to project all the points), it's easy to separate the two sets by considering only the radius:
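As a quick check, here is a minimal sketch (assuming matplotlib and NumPy are available; neither is imported in the original snippet) that computes the radius of each sample and shows that it alone separates the two classes:

import numpy as np
import matplotlib.pyplot as plt

# Radius of each point; the angle carries no class information
>>> radius = np.sqrt(Xb[:, 0] ** 2 + Xb[:, 1] ** 2)

# With factor=0.1, the inner blob (label 1) stays close to the origin,
# while the outer circle (label 0) has a radius close to 1.0
>>> plt.hist(radius[Yb == 0], bins=20, alpha=0.5, label='Outer circle')
>>> plt.hist(radius[Yb == 1], bins=20, alpha=0.5, label='Inner blob')
>>> plt.xlabel('Radius')
>>> plt.legend()
>>> plt.show()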

Considering the structure of the dataset, it's possible to investigate the behavior of a PCA with a radial basis function (RBF) kernel. As the default value for gamma is 1.0 divided by the number of features (for now, consider this parameter as inversely proportional to the variance of the Gaussian), we need to increase it to capture the external circle. A value of 1.0 is enough:

from sklearn.decomposition import KernelPCA

>>> kpca = KernelPCA(n_components=2, kernel='rbf', fit_inverse_transform=True, gamma=1.0)
>>> X_kpca = kpca.fit_transform(Xb)

The instance variable X_transformed_fit_ will contain the projection of our dataset onto the new space (the same projection is returned by fit_transform() and stored in X_kpca above). Plotting it, we get:
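For reference, a minimal plotting sketch (assuming matplotlib is imported as above) that reproduces the figure from the projected samples returned by fit_transform():

>>> plt.scatter(X_kpca[Yb == 0, 0], X_kpca[Yb == 0, 1], label='Outer circle')
>>> plt.scatter(X_kpca[Yb == 1, 0], X_kpca[Yb == 1, 1], label='Inner blob')
>>> plt.xlabel('First principal component')
>>> plt.ylabel('Second principal component')
>>> plt.legend()
>>> plt.show()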

The plot shows the expected separation, and it's also possible to see that the points belonging to the central blob have a curved distribution because they are more sensitive to the distance from the center.

Kernel PCA is a powerful instrument when we think of our dataset as made up of elements that can be a function of the components (in particular, radial basis functions or polynomials), but we aren't able to determine a linear relationship among them.
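As an illustration of the last point, a polynomial kernel can be used with the same class and the same pattern as the RBF example (a minimal sketch; the degree=2 value here is an arbitrary choice, not taken from the original text):

>>> kpca_poly = KernelPCA(n_components=2, kernel='poly', degree=2, gamma=1.0)
>>> X_kpca_poly = kpca_poly.fit_transform(Xb)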

For more information about the different kernels supported by scikit-learn, visit http://scikit-learn.org/stable/modules/metrics.html#linear-kernel.