- Machine Learning Algorithms
- Giuseppe Bonaccorso
- 2021-07-02 18:53:31
Sparse PCA
scikit-learn provides several PCA variants that address particular problems, and I suggest reading the official documentation for the full list. Here, I'd like to mention SparsePCA, which exploits the natural sparsity of data while extracting principal components. If you think about handwritten digits or other images that must be classified, their initial dimensionality can be quite high (even a 10x10 image has 100 features). Applying a standard PCA selects only the features that are most important on average, assuming that every sample can be rebuilt using the same components. Simplifying, this is equivalent to:
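In symbols, a minimal sketch of this idea can be written as follows. The notation is an assumption introduced only for illustration: $\bar{x}_i$ is a (centered) sample, $w_1, \dots, w_k$ are the dense principal components shared by all samples, and $c_{i1}, \dots, c_{ik}$ are the coefficients of that sample.

```latex
\bar{x}_i \approx c_{i1} w_1 + c_{i2} w_2 + \dots + c_{ik} w_k
```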

On the other hand, we can still use a limited number of components without the constraint imposed by a dense projection matrix. This is achieved with sparse matrices (or vectors), in which the number of non-zero elements is quite low. In this way, each sample can be rebuilt using its own specific components (in most cases, these will still be the most important ones), which can include elements that a dense PCA would normally discard. The previous expression now becomes:
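A hedged sketch of the sparse case, again with assumed notation: $\bar{x}_i$ is a sample, $d_1, \dots, d_m$ are the $m$ available components, and only $g$ of them (reordered so that they come first) have a non-zero coefficient $c_{ij}$ for this sample.

```latex
\bar{x}_i \approx \left( c_{i1} d_1 + c_{i2} d_2 + \dots + c_{ig} d_g \right)
          + \left( 0 \cdot d_{g+1} + 0 \cdot d_{g+2} + \dots + 0 \cdot d_m \right)
```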

Here the non-null components have been put into the first block (they don't have the same order as in the previous expression), while all the other zero terms have been separated. In terms of linear algebra, the vector space now has the original dimensionality. However, because most coefficients are zero, the result can be stored and handled compactly with sparse matrices (such as those provided by scipy.sparse). Keep in mind that fitting a sparse PCA requires an iterative, L1-regularized optimization, so it is normally slower than fitting a classical PCA.
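To make the role of the zero block more concrete, the following sketch stores a coefficient vector in a scipy.sparse matrix. The vector length (100) and the positions and values of the four non-zero entries are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical coefficient vector: 100 possible components, only 4 used.
coefficients = np.zeros(100)
coefficients[[3, 17, 42, 88]] = [0.9, -0.4, 0.25, 0.1]

# CSR storage keeps only the non-zero values and their positions.
sparse_coefficients = sparse.csr_matrix(coefficients)

print(sparse_coefficients.shape)  # the dimensionality is unchanged
print(sparse_coefficients.nnz)    # number of stored non-zero entries
```

The dimensionality of the vector stays the same, while only the non-zero entries are actually stored.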
The following snippet shows a sparse PCA with 60 components. In this context, they're usually called atoms, and the amount of sparsity can be controlled via L1-norm regularization (higher values of the alpha parameter lead to sparser results). This approach is very common in classification algorithms and will be discussed in the next chapters:
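>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import SparsePCA
>>> digits = load_digits()
>>> # Rescale the pixel intensities (load_digits stores values in 0..16)
>>> spca = SparsePCA(n_components=60, alpha=0.1)
>>> X_spca = spca.fit_transform(digits.data / 255)
>>> spca.components_.shape
(60, 64)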
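The effect of alpha on sparsity can be checked with a small experiment. This is only a sketch under stated assumptions: the subset of 200 samples, the 5 atoms, the two alpha values, the division by 16 and the helper name zero_fraction are illustrative choices made to keep the fit fast, not values taken from the text.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import SparsePCA

# A small subset and few atoms keep the fit fast (illustrative sizes).
X = load_digits().data[:200] / 16.0  # pixel values lie in 0..16


def zero_fraction(alpha, n_components=5):
    """Fit SparsePCA and return the share of exactly-zero entries in components_."""
    spca = SparsePCA(n_components=n_components, alpha=alpha, random_state=0)
    spca.fit(X)
    return float(np.mean(spca.components_ == 0.0))


# Compare a weak and a strong L1 penalty.
fractions = {alpha: zero_fraction(alpha) for alpha in (0.01, 1.0)}
for alpha, frac in fractions.items():
    print(f"alpha={alpha}: {frac:.0%} of the component entries are zero")
```

With a stronger penalty, a larger share of the entries in components_ is expected to be exactly zero; the exact percentages depend on the data subset and on the scikit-learn version.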