官术网_书友最值得收藏!

Scaling

Values of different features can differ by orders of magnitude. Sometimes, this may mean that the larger values dominate the smaller values. This depends on the algorithm we're using. For certain algorithms to work properly, we're required to scale the data.

There are following several common strategies that we can apply:

  • Standardization removes the mean of a feature and divides by the standard deviation. If the feature values are normally distributed, we'll get a Gaussian, which is centered around zero with a variance of one.
  • If the feature values aren't normally distributed, we can remove the median and divide by the interquartile range. The interquartile range is a range between the first and third quartile (or 25th and 75th percentile).
  • Scaling features to a range is a common choice of range between zero and one.
主站蜘蛛池模板: 兴隆县| 天长市| 涟水县| 行唐县| 筠连县| 台安县| 龙南县| 离岛区| 壶关县| 南部县| 轮台县| 海晏县| 深圳市| 台前县| 高要市| 武城县| 农安县| 如皋市| 五常市| 高碑店市| 虹口区| 阜南县| 迁西县| 青岛市| 东辽县| 东宁县| 深水埗区| 全州县| 元氏县| 山阴县| 霍林郭勒市| 台中市| 大港区| 翁源县| 宜良县| 南京市| 保康县| 安龙县| 冕宁县| 太谷县| 广灵县|