
Dimensionality reduction

Dimensionality reduction is the process of converting a set of data with many variables into data with fewer dimensions while preserving as much of the original information as possible. It can help improve model accuracy and performance, improve interpretability, and prevent overfitting. The Statistics and Machine Learning Toolbox includes many algorithms and functions for reducing the dimensionality of our datasets. Dimensionality reduction techniques can be divided into feature selection and feature extraction. Feature selection approaches try to find a subset of the original variables, while feature extraction reduces the dimensionality of the data by transforming it into new features.

As already mentioned, feature selection finds only the subset of measured features (predictor variables) that give the best predictive performance in modeling the data. The Statistics and Machine Learning Toolbox includes many feature selection methods, as follows:

  • Stepwise regression: Adds or removes features until there is no further improvement in prediction accuracy. Especially suited to linear regression and generalized linear regression algorithms.
  • Sequential feature selection: Analogous to stepwise regression, but can be applied with any supervised learning algorithm.
  • Selecting features for classifying high-dimensional data.
  • Boosted and bagged decision trees: Calculate variable importance from out-of-bag estimates of prediction error.
  • Regularization: Removes redundant features by shrinking their weights (coefficients) to zero.
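The toolbox provides these methods as built-in functions; as a language-agnostic illustration of the idea behind sequential (forward) feature selection, here is a minimal sketch in Python with NumPy. The data, function name, and feature count are hypothetical, chosen only to show the greedy add-one-feature-at-a-time loop:

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward selection: repeatedly add the feature that most
    reduces the least-squares residual, until k features are chosen."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def rss(cols):
            # Residual sum of squares of a least-squares fit on `cols`
            A = X[:, cols]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = y - A @ coef
            return r @ r
        best = min(remaining, key=lambda j: rss(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: y depends only on columns 0 and 2; column 1 is pure noise,
# so forward selection should recover columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2]
print(forward_select(X, y, 2))
```

A real implementation would score candidate features with cross-validated prediction error of the chosen learner rather than the in-sample residual, which is what makes the approach applicable to any supervised learning algorithm.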

Feature extraction, in contrast, transforms existing features into new features (predictor variables), allowing less-descriptive features to be dropped.

The Statistics and Machine Learning Toolbox includes many feature extraction methods, as follows:

  • PCA: This can be applied to summarize data in fewer dimensions by projecting it onto an orthogonal basis.
  • Non-negative matrix factorization: This can be applied when model terms must represent non-negative quantities.
  • Factor analysis: This can be applied to build explanatory models of data correlations.
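To make the PCA idea concrete, here is a minimal sketch in Python with NumPy (not the toolbox's `pca` function): center the data, take the singular value decomposition, and keep only the top-k principal directions. The synthetic data and function name are assumptions for illustration:

```python
import numpy as np

def pca_project(X, k):
    """Project X onto its top-k principal components, i.e. the k
    orthogonal directions of maximal variance."""
    Xc = X - X.mean(axis=0)                          # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                             # scores in k dimensions

# Toy data: 2 latent factors embedded in 5 observed variables,
# so 2 principal components should capture the structure.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5))
Z = pca_project(X, 2)
print(Z.shape)
```

Because the toy data has rank 2, the two retained components preserve essentially all of its variance, which is exactly the sense in which PCA "summarizes data in fewer dimensions".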

The following figure shows a stepwise regression example:

Figure 1.20: Stepwise regression example