
Chapter 5. Dimension Reduction

As described in the Assessing a model/overfitting section of Chapter 2, Data Pipelines, indiscriminate reliance on a large number of features may cause overfitting. The model may become so tightly coupled with the training set that different validation sets generate vastly different outcomes and quality metrics, such as the area under the ROC curve (AuROC).

Dimension reduction techniques alleviate this problem by detecting features that have little influence on the overall model behavior.

This chapter introduces three categories of dimension reduction techniques with two implementations in Scala:

  • Divergence with an implementation of the Kullback-Leibler distance
  • Principal components analysis
  • Estimation of low dimension feature space for nonlinear models
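To preview the first category, here is a minimal sketch of the discrete Kullback-Leibler measure, D(P||Q) = sum over i of p_i * ln(p_i / q_i). It assumes both distributions are given as Seq[Double] values over the same support, each summing to 1. The object name KullbackLeibler and the method divergence are hypothetical and are not the implementation presented later in this chapter. Terms with p_i = 0 contribute nothing by convention, and q_i must be positive wherever p_i is positive.

```scala
// Hypothetical helper for illustration only; not the chapter's own implementation.
object KullbackLeibler {

  /** Discrete Kullback-Leibler divergence D(P || Q) in nats (natural log).
    * Assumes p and q are probability distributions over the same support.
    * Terms with p_i == 0 are skipped (0 * ln 0 is taken as 0).
    */
  def divergence(p: Seq[Double], q: Seq[Double]): Double = {
    require(p.length == q.length, "distributions must have the same support size")
    p.zip(q).collect {
      case (pi, qi) if pi > 0.0 =>
        // Q must assign non-zero probability wherever P does, else D is infinite
        require(qi > 0.0, "Q must be positive wherever P is positive")
        pi * math.log(pi / qi)
    }.sum
  }

  def main(args: Array[String]): Unit = {
    val p = Seq(0.5, 0.5)
    val q = Seq(0.9, 0.1)
    // The measure is not symmetric: D(P||Q) and D(Q||P) generally differ
    println(f"D(P||Q) = ${divergence(p, q)}%.4f")
    println(f"D(Q||P) = ${divergence(q, p)}%.4f")
  }
}
```

Because the two directions give different values, the Kullback-Leibler measure is a divergence rather than a true metric distance, even though it is often described informally as a distance.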

Other methodologies used to reduce the number of features, such as regularization or singular value decomposition, are discussed in later chapters.

But first, let's start our investigation by defining the problem.
