
Sparse PCA

scikit-learn provides several PCA variants that address particular problems, and I suggest reading the official documentation. Here, I'd like to focus on SparsePCA, which exploits the natural sparsity of data while extracting principal components. Think about handwritten digits or other images that must be classified: their initial dimensionality can be quite high (a 10x10 image has 100 features). A standard PCA, however, selects only the features that are most important on average, assuming that every sample can be rebuilt using the same components. Simplifying, this is equivalent to:
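As a sketch in notation assumed here for illustration (the symbols are not taken from the text), let $\bar{x}_i \in \mathbb{R}^n$ be a centered sample, $\bar{w}_1, \ldots, \bar{w}_m$ the $m < n$ components kept by a dense PCA, and $c_{ik}$ the coefficient of sample $i$ along $\bar{w}_k$. The shared-components assumption then reads roughly:

$$
\bar{x}_i \approx \sum_{k=1}^{m} c_{ik}\,\bar{w}_k \qquad \text{with the same } \bar{w}_1, \ldots, \bar{w}_m \text{ for every sample } i
$$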

On the other hand, we can still use a limited number of components, but without the constraint imposed by a dense projection matrix. This can be achieved by using sparse matrices (or vectors), in which only a few elements are non-zero. In this way, each sample can be rebuilt using its own specific components (in most cases, these will still include the most important ones), which may include elements that a dense PCA would normally discard. The previous expression now becomes:
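In notation assumed here for illustration ($\bar{x}_i \in \mathbb{R}^n$ a centered sample, $\bar{w}_1, \ldots, \bar{w}_n$ the full set of components, $c_{ik}$ the coefficients), and with $\mathcal{S}_i$ a hypothetical index set holding the few non-zero coefficients of sample $i$, the sparse reconstruction can be sketched as:

$$
\bar{x}_i \approx \sum_{k \in \mathcal{S}_i} c_{ik}\,\bar{w}_k + \sum_{k \notin \mathcal{S}_i} 0 \cdot \bar{w}_k, \qquad |\mathcal{S}_i| \ll n
$$

Each sample may select a different set $\mathcal{S}_i$, which is what allows components ignored by the dense model to take part in the reconstruction.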

Here the non-null components have been grouped into the first block (not necessarily in the same order as in the previous expression), while all the remaining zero terms have been separated. In terms of linear algebra, the vector space still has the original dimensionality. However, because only the non-zero entries need to be stored and processed, sparse representations (such as the matrices provided by scipy.sparse) make this problem much cheaper to handle than its dense equivalent.
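To make the storage argument concrete, here is a minimal sketch using scipy.sparse.csr_matrix. The 64x64 matrix of "atoms" is random, hypothetical data with roughly 95% zeros; it is not produced by SparsePCA (whose components_ attribute is a regular NumPy array), and it only shows what a sparse format saves and that a reconstruction computed from it matches the dense one.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical components matrix: 64 atoms x 64 features, roughly 95% zeros
n_atoms, n_features = 64, 64
dense = rng.random((n_atoms, n_features))
dense[dense < 0.95] = 0.0

# CSR format stores only the non-zero values plus their column indices
components = sparse.csr_matrix(dense)
dense_bytes = dense.nbytes
sparse_bytes = (components.data.nbytes
                + components.indices.nbytes
                + components.indptr.nbytes)
print(f"non-zero entries: {components.nnz} of {n_atoms * n_features}")
print(f"dense storage: {dense_bytes} bytes, CSR storage: {sparse_bytes} bytes")

# Rebuild one sample as a linear combination of the atoms (W^T c);
# with CSR, only the stored non-zero entries take part in the product
coeffs = rng.random(n_atoms)
x_sparse = components.T @ coeffs
x_dense = dense.T @ coeffs
print(np.allclose(x_sparse, x_dense))  # True
```

The exact byte counts depend on the random data and on the index dtype SciPy chooses, but the CSR footprint grows with the number of non-zero entries rather than with the full matrix size.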

The following snippet shows a sparse PCA with 60 components. In this context, the components are usually called atoms, and the amount of sparsity can be controlled via L1-norm regularization (higher values of the alpha parameter lead to sparser results). This approach is very common in classification algorithms and will be discussed in the next chapters:

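>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import SparsePCA

>>> digits = load_digits()

>>> spca = SparsePCA(n_components=60, alpha=0.1)
>>> X_spca = spca.fit_transform(digits.data / 16.0)  # pixel values range from 0 to 16

>>> spca.components_.shape
(60, 64)
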
For further information about SciPy sparse matrices, visit https://docs.scipy.org/doc/scipy-0.18.1/reference/sparse.html.
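One rough way to see how alpha controls sparsity is to count the zero entries of components_ for a few values, with a dense PCA fitted on the same data as a baseline. This is a sketch assuming a recent scikit-learn; the 500-sample subset, the 20 components, the alpha grid, max_iter=200, and the helper zero_fraction are illustrative choices, not taken from the text.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, SparsePCA


def zero_fraction(components, tol=1e-10):
    """Hypothetical helper: share of entries whose magnitude is below tol."""
    return float(np.mean(np.abs(components) < tol))


# Digits pixels take values in [0, 16]; scale them to [0, 1]
# and use a subset to keep the example fast
X = load_digits().data[:500] / 16.0

# Dense PCA baseline: loadings are essentially all non-zero,
# apart from pixels that never vary in the subset
pca = PCA(n_components=20).fit(X)
print(f"PCA: {zero_fraction(pca.components_):.2f} zero loadings")

# Larger alpha -> stronger L1 penalty -> more loadings driven exactly to zero
sparsity = {}
for alpha in (0.1, 1.0, 5.0):
    spca = SparsePCA(n_components=20, alpha=alpha, max_iter=200, random_state=0)
    spca.fit(X)
    sparsity[alpha] = zero_fraction(spca.components_)
    print(f"SparsePCA alpha={alpha}: {sparsity[alpha]:.2f} zero loadings")
```

The exact fractions depend on the scikit-learn version and on the data subset, so none are quoted here; the trend to look for is that the share of zero loadings rises as alpha grows.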