
Dimensionality reduction

Dimensionality reduction, also called feature reduction, is the process of reducing the input set of independent variables to the smaller set that the model actually needs to predict the target. Feature selection, which keeps a subset of the original variables, is one form of it.

In certain cases, several independent variables can be combined into one without losing much information. For example, instead of keeping two independent variables, the length and the breadth of a rectangle, the dimensions can be represented by a single variable, the area, which combines both the length and the breadth.
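As a minimal sketch of the rectangle example, the two features can be collapsed into one derived feature. The sample values and the names `rectangles` and `areas` are illustrative assumptions, not taken from any dataset.

```python
# Hypothetical rectangle records: (length, breadth) pairs.
rectangles = [(2.0, 3.0), (4.0, 1.5), (5.0, 5.0)]

# Replace the two features with a single derived feature, the area.
areas = [length * breadth for length, breadth in rectangles]

print(areas)
```

Note that the first two rectangles have different shapes but the same area, so the combined feature is only a good substitute when the target depends on the area rather than on the individual sides.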

Dimensionality reduction is performed on an input dataset for several reasons:

  • It compresses the data, so the dataset occupies less disk space.
  • It reduces processing time, because fewer dimensions are used to represent the data.
  • It removes redundant features from the dataset. Redundancy among features, where one feature is strongly correlated with others, is known as multicollinearity.
  • It makes the data easier to visualize in graphs and charts.
  • It removes noisy features from the dataset, which can, in turn, improve model performance.
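The redundancy point can be illustrated with a simple correlation check. The following is a rough sketch, assuming NumPy is available; the helper `drop_correlated_features`, the 0.95 threshold, and the synthetic data are all assumptions made for illustration, and a correlation filter is only one of several ways to detect multicollinearity.

```python
import numpy as np

def drop_correlated_features(X, threshold=0.95):
    """Return indices of columns to keep, skipping any column whose absolute
    Pearson correlation with an already-kept column exceeds the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

# Synthetic data (assumed for illustration): two independent features
# plus a third that is almost an exact multiple of the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
redundant = 2.0 * a + 0.01 * rng.normal(size=200)
X = np.column_stack([a, b, redundant])

kept = drop_correlated_features(X)
X_reduced = X[:, kept]
print(kept, X_reduced.shape)
```

The greedy scan keeps the first column of each highly correlated group, so the result depends on column order; that is a deliberate simplification here.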

Dimensionality reduction can be achieved in many ways. One is the use of filters, such as information gain filters and symmetric attribute evaluation filters. Genetic-algorithm-based selection and principal component analysis (PCA) are other popular techniques. Hybrid methods for feature selection also exist.
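To make PCA concrete, here is a minimal sketch built on NumPy's singular value decomposition. The function name `pca_reduce` and the synthetic data are assumptions for illustration; in practice a library implementation would normally be used instead.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns the projected data and the fraction of total variance
    explained by each retained component.
    """
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_variance_ratio[:n_components]

# Synthetic data (assumed for illustration): three observed features that
# are all driven by a single latent factor plus a little noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(100, 1))
X = latent @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(100, 3))

X_reduced, ratio = pca_reduce(X, n_components=1)
print(X_reduced.shape, ratio)
```

Because the three features share one underlying factor, a single component should capture nearly all of the variance, which is the situation in which PCA loses little information.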
