官术网_书友最值得收藏!

Scaling

Values of different features can differ by orders of magnitude. Sometimes this may mean that the larger values dominate the smaller values. This depends on the algorithm we are using. For certain algorithms to work properly we are required to scale the data. There are several common strategies that we can apply:

  • Standardization removes the mean of a feature and divides by the standard deviation. If the feature values are normally distributed, we will get a Gaussian, which is centered around zero with a variance of one.
  • If the feature values are not normally distributed, we can remove the median and divide by the interquartile range. The interquartile range is a range between the first and third quartile (or 25th and 75th percentile).
  • Scaling features to a range is a common choice of range which is a range between zero and one.
主站蜘蛛池模板: 疏勒县| 嘉荫县| 昔阳县| 新津县| 伊吾县| 广丰县| 涞源县| 广汉市| 浙江省| 威宁| 腾冲县| 奈曼旗| 江永县| 稷山县| 赣州市| 错那县| 崇义县| 灵山县| 邢台市| 宝应县| 柳州市| 城步| 枣庄市| 云南省| 水富县| 沙田区| 高碑店市| 平湖市| 江津市| 阿合奇县| 泗水县| 山东省| 黄陵县| 达孜县| 武义县| 宁晋县| 庆城县| 叶城县| 泽库县| 肃南| 博白县|