官术网_书友最值得收藏!

Scaling

Values of different features can differ by orders of magnitude. Sometimes, this may mean that the larger values dominate the smaller values. This depends on the algorithm we're using. For certain algorithms to work properly, we're required to scale the data.

There are following several common strategies that we can apply:

  • Standardization removes the mean of a feature and divides by the standard deviation. If the feature values are normally distributed, we'll get a Gaussian, which is centered around zero with a variance of one.
  • If the feature values aren't normally distributed, we can remove the median and divide by the interquartile range. The interquartile range is a range between the first and third quartile (or 25th and 75th percentile).
  • Scaling features to a range is a common choice of range between zero and one.
主站蜘蛛池模板: 弥勒县| 揭西县| 丹巴县| 唐河县| 会理县| 布尔津县| 宁都县| 博白县| 钦州市| 自治县| 察隅县| 垫江县| 三台县| 渭南市| 芦山县| 武平县| 葫芦岛市| 大同市| 迁西县| 阿尔山市| 正蓝旗| 武安市| 红原县| 肥东县| 三河市| 祁东县| 玛纳斯县| 石河子市| 措勤县| 石狮市| 揭阳市| 宣化县| 增城市| 迁西县| 云梦县| 商南县| 广宗县| 瑞金市| 监利县| 商南县| 武山县|