
  • Hands-On Data Science with R
  • Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias
  • 166 words
  • 2021-06-10 19:12:31

Measures of dispersion

While measures of central tendency indicate where the data is centered, measures of dispersion give a general idea of how the data is spread around that center. Standard deviation and variance are the most popular measures of dispersion; the standard deviation is the square root of the variance. Both are easy to compute with R:

# na.rm = TRUE drops missing values before computing
sd(big_sample, na.rm = TRUE)
# outputs [1] 5.01836
var(big_sample, na.rm = TRUE)
# outputs [1] 25.18394
Keep in mind that the values computed so far are estimates of the (true) population parameters, not the parameters themselves.
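
To make the relationship between the two estimates concrete, the following sketch recomputes them by hand with the n - 1 denominator that var() and sd() use for samples. The big_sample vector here is a hypothetical stand-in generated with rnorm(), not the book's data, so its values will differ from the outputs shown above.

```r
# Hypothetical stand-in for big_sample (assumption: not the book's data)
set.seed(42)
big_sample <- rnorm(1000, mean = 10, sd = 5)

n <- length(big_sample)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((big_sample - mean(big_sample))^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
manual_sd <- sqrt(manual_var)

# Both manual results agree with the built-in estimators
all.equal(manual_var, var(big_sample))  # TRUE
all.equal(manual_sd, sd(big_sample))    # TRUE
```

Dividing by n - 1 rather than n is what makes the variance an unbiased estimate of the population parameter when only a sample is available.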

The sd() function estimates the standard deviation, while var() estimates the variance. In most cases, we find ourselves with a data frame full of variables we want to analyze. Rather than calling these functions on one variable at a time, we can use a function that quickly summarizes the whole dataset. Such functions usually work equally well with vectors and data frames. The next section introduces a couple of them.
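
As a minimal sketch of working with a whole data frame, sd() and var() can be applied column by column with sapply(). The data frame df and its columns are hypothetical, made up for illustration, and this is not necessarily the approach the next section takes.

```r
# Hypothetical data frame with one missing value (illustrative only)
df <- data.frame(
  height = c(1.62, 1.75, NA, 1.80, 1.68),
  weight = c(58, 72, 65, 81, 60)
)

# Apply sd() and var() to every column; na.rm = TRUE is passed through
# to each call so missing values are dropped per column
col_sd  <- sapply(df, sd, na.rm = TRUE)
col_var <- sapply(df, var, na.rm = TRUE)

# Each result is a named numeric vector with one entry per column
print(col_sd)
print(col_var)
```

This works only when every column is numeric; a data frame with character or factor columns would need those columns filtered out first.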
