
Chapter 5. Dimension Reduction

As described in the Assessing a model/overfitting section of Chapter 2, Data Pipelines, indiscriminate reliance on a large number of features may cause overfitting: the model may become so tightly coupled to the training set that different validation sets produce vastly different outcomes and quality metrics, such as AuROC.

Dimension reduction techniques alleviate these problems by detecting features that have little influence on the overall model behavior.

This chapter introduces three categories of dimension reduction techniques with two implementations in Scala:

  • Divergence with an implementation of the Kullback-Leibler distance
  • Principal components analysis
  • Estimation of low-dimension feature space for nonlinear models

Other methods for reducing the number of features, such as regularization and singular value decomposition, are discussed in later chapters.

But first, let's start our investigation by defining the problem.
