
Atom extraction and dictionary learning

Dictionary learning is a technique that allows rebuilding a sample starting from a sparse dictionary of atoms (similar to principal components). In Mairal J., Bach F., Ponce J., Sapiro G., Online Dictionary Learning for Sparse Coding, Proceedings of the 26th International Conference on Machine Learning, 2009, there's a description of the same online strategy adopted by scikit-learn, which can be summarized as a double optimization problem where:

$X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\}$ (with $\bar{x}_i \in \mathbb{R}^m$) is an input dataset and the target is to find both a dictionary $D \in \mathbb{R}^{m \times k}$ (whose $k$ columns are the atoms) and a set of weights for each sample:

$A = \{\bar{\alpha}_1, \bar{\alpha}_2, \ldots, \bar{\alpha}_n\}$ where $\bar{\alpha}_i \in \mathbb{R}^k$

After the training process, an input vector can be computed as:

$\bar{x}_i = D\bar{\alpha}_i$

The optimization problem (which involves both $D$ and the $\bar{\alpha}_i$ vectors) can be expressed as the minimization of the following loss function:

$L(D, A) = \sum_{i=1}^{n} \left( \frac{1}{2} \left\| \bar{x}_i - D\bar{\alpha}_i \right\|_2^2 + c \left\| \bar{\alpha}_i \right\|_1 \right)$

Here, the parameter c controls the level of sparsity (which is proportional to the strength of the L1 regularization). The problem can be solved by alternating between the two variables (a sparse-coding step on the weights with D fixed, followed by a least-squares step on D with the weights fixed) until a stable point is reached.
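As a concrete illustration of this alternation, the following is a minimal NumPy/scikit-learn sketch on synthetic data (it is not the online algorithm scikit-learn actually uses; the toy dataset, the value c=0.1, and all variable names are illustrative, and Lasso applies its own internal scaling to the penalty):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(1000)
X = rng.uniform(size=(100, 64))       # toy dataset: 100 samples, 64 features
n_atoms, c = 36, 0.1                  # number of atoms and sparsity level

D = rng.normal(size=(64, n_atoms))    # random initial dictionary
D /= np.linalg.norm(D, axis=0)

for _ in range(10):
    # Sparse-coding step (D fixed): L1-penalized regression for all
    # samples at once; coef_ has shape (n_samples, n_atoms)
    A = Lasso(alpha=c, fit_intercept=False, max_iter=5000).fit(D, X.T).coef_
    # Dictionary-update step (codes fixed): least squares for D,
    # then atom normalization to avoid degenerate scalings
    D = np.linalg.lstsq(A, X, rcond=None)[0].T
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)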

In scikit-learn, we can implement such an algorithm with the class DictionaryLearning (using the usual handwritten digits dataset), where n_components, as usual, determines the number of atoms:

>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import DictionaryLearning

>>> digits = load_digits()
>>> dl = DictionaryLearning(n_components=36, fit_algorithm='lars', transform_algorithm='lasso_lars')
>>> X_dict = dl.fit_transform(digits.data)
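The resulting matrix X_dict has shape (n_samples, n_components), while the atoms are exposed through the components_ attribute with shape (n_components, n_features); multiplying a code by the dictionary therefore yields an approximate reconstruction of the corresponding sample (the variable X_rec is just an illustrative name):

>>> X_dict.shape
(1797, 36)
>>> dl.components_.shape
(36, 64)
>>> X_rec = X_dict.dot(dl.components_)   # approximate rebuild of digits.data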

A plot of each atom (component) is shown in the following figure:
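Such a grid can be produced, for instance, by reshaping each row of components_ into an 8 x 8 image (a matplotlib sketch; the 6 x 6 grid layout is simply an arbitrary choice for 36 atoms):

import matplotlib.pyplot as plt

# Show the 36 atoms as a 6 x 6 grid of 8 x 8 grayscale images
fig, axes = plt.subplots(6, 6, figsize=(8, 8))
for atom, ax in zip(dl.components_, axes.ravel()):
    ax.imshow(atom.reshape(8, 8), cmap='gray')
    ax.axis('off')
plt.show()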

This process can be very long on low-end machines. In such a case, I suggest limiting the number of samples to 20 or 30.