
Dimensionality reduction

Dimensionality reduction is the process of converting a dataset with many variables into one with fewer dimensions while preserving similar information. It can help improve model accuracy and performance, improve interpretability, and prevent overfitting. The Statistics and Machine Learning Toolbox includes many algorithms and functions for reducing the dimensionality of our datasets. Dimensionality reduction can be divided into feature selection and feature extraction. Feature selection approaches try to find a subset of the original variables. Feature extraction reduces the dimensionality of the data by transforming it into new features.

As already mentioned, feature selection identifies the subset of measured features (predictor variables) that gives the best predictive performance in modeling the data. The Statistics and Machine Learning Toolbox includes many feature selection methods, as follows:

  • Stepwise regression: Adds or removes features until there is no improvement in prediction accuracy. It is especially suited to linear regression or generalized linear regression algorithms.
  • Sequential feature selection: Similar to stepwise regression, but it can be applied with any supervised learning algorithm.
  • Selecting features for classifying high-dimensional data.
  • Boosted and bagged decision trees: Calculate each variable's importance from out-of-bag estimation errors.
  • Regularization: Removes redundant features by shrinking their weights to zero, as illustrated, along with sequential feature selection, in the sketch after this list.
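
As a rough illustration, the following MATLAB sketch applies sequential feature selection and lasso regularization to synthetic data; the variable names and the simple least-squares criterion function are assumptions made for this example, not code taken from the toolbox documentation:

rng(1);                                          % reproducible synthetic data
X = randn(200, 8);                               % eight candidate predictors
y = 3*X(:,2) - 2*X(:,5) + 0.5*randn(200, 1);     % only two predictors actually matter

% Sequential feature selection: any model can be wrapped in a criterion function
critfun = @(Xtr, ytr, Xte, yte) sum((yte - Xte*(Xtr\ytr)).^2);
selected = sequentialfs(critfun, X, y);          % logical mask of kept predictors
disp(find(selected));

% Regularization with lasso: redundant features get their coefficients shrunk to zero
[B, FitInfo] = lasso(X, y, 'CV', 5);             % 5-fold cross-validated lasso
disp(find(B(:, FitInfo.IndexMinMSE) ~= 0));      % predictors kept at the best lambda

Here sequentialfs wraps a plain least-squares fit, but any supervised learner can supply the criterion, which is why the method pairs with any algorithm.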

Feature extraction, by contrast, transforms existing features into new features (predictor variables), so that less descriptive features can be dropped.

The Statistics and Machine Learning Toolbox includes many feature extraction methods, as follows:

  • PCA: This can be applied to summarize data in fewer dimensions by projection onto a unique orthogonal basis (see the sketch after this list)
  • Non-negative matrix factorization: This can be applied when model terms must represent non-negative quantities
  • Factor analysis: This can be applied to build explanatory models of data correlations
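
A minimal MATLAB sketch of these three techniques on synthetic data might look as follows; the data, the 95% variance threshold, and the number of components and factors are illustrative assumptions:

rng(1);
X = randn(200, 6) * randn(6, 10) + 0.3*randn(200, 10);   % ten correlated variables

% PCA: project onto an orthogonal basis and keep the leading components
[coeff, score, ~, ~, explained] = pca(X);
k = find(cumsum(explained) >= 95, 1);            % components covering 95% of variance
Xreduced = score(:, 1:k);                        % new, lower-dimensional features

% Non-negative matrix factorization: model terms must stay non-negative
[W, H] = nnmf(abs(X), 3);                        % approximate the data as W*H

% Factor analysis: explanatory model of the correlations between variables
[loadings, specificVar] = factoran(X, 2);        % two common factors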

The following figure shows example charts from a stepwise regression:

Figure 1.20: Stepwise regression example
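
A minimal stepwise regression sketch that produces this kind of fit is shown below; the data are synthetic and purely illustrative:

rng(1);
X = randn(150, 5);
y = 4 + 2*X(:,1) - 3*X(:,3) + 0.5*randn(150, 1); % only x1 and x3 are relevant

% stepwiselm adds or removes terms until the criterion stops improving
mdl = stepwiselm(X, y, 'constant', 'Upper', 'linear', 'Verbose', 2);
disp(mdl.Formula);                               % the final model keeps x1 and x3
plotResiduals(mdl);                              % one of the model's diagnostic plots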