
  • Hands-On Data Science with R
  • Vitor Bianchi Lanzetta Nataraj Dasgupta Ricardo Anjoleto Farias
  • 2021-06-10 19:12:31

Measures of dispersion

While measures of central tendency indicate where the data is centered, measures of dispersion describe how the data is spread around that center. The standard deviation and the variance are the most popular measures of dispersion; the standard deviation is simply the square root of the variance. Both are easy to compute in R:

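# na.rm = TRUE drops missing values before computing; TRUE is safer than T,
# because T is an ordinary variable that can be reassigned
sd(big_sample, na.rm = TRUE)
# outputs [1] 5.01836
var(big_sample, na.rm = TRUE)
# outputs [1] 25.18394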
Keep in mind that the values computed so far are estimates of the (true) population parameters, not the parameters themselves.
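
The relationship between the two estimates can be checked directly. The following is a minimal sketch, assuming a simulated vector x in place of big_sample (x, obs, and manual_var are names made up for this illustration). It also shows that var() divides by n - 1 rather than n, which is what makes it a sample estimate:

```r
# Hypothetical sample standing in for big_sample, with one missing value
set.seed(42)
x <- c(rnorm(99, mean = 10, sd = 5), NA)

s <- sd(x, na.rm = TRUE)
v <- var(x, na.rm = TRUE)

# The standard deviation is the square root of the variance
all.equal(s, sqrt(v))
# TRUE

# var() uses the n - 1 denominator (sample variance)
obs <- x[!is.na(x)]
manual_var <- sum((obs - mean(obs))^2) / (length(obs) - 1)
all.equal(v, manual_var)
# TRUE
```

Because the sample is random, the numeric values of s and v depend on the seed; only the two equalities are expected to hold in general.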

The sd() function estimates the standard deviation, while var() estimates the variance. In most cases, we find ourselves with a data frame full of variables we want to analyze. One way to handle this is to use a function that quickly summarizes the whole dataset. These functions usually work equally well with both vectors and data frames. The next section introduces a couple of them.
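
Before turning to dedicated summary functions, one simple way to get a dispersion estimate for every variable is to apply sd() or var() column by column. This is only a sketch, assuming a small made-up data frame df whose columns are all numeric (df, col_sd, and col_var are hypothetical names):

```r
# Hypothetical data frame with two numeric columns, one containing an NA
df <- data.frame(
  height = c(1.62, 1.75, NA, 1.80, 1.68),
  weight = c(58, 72, 65, 81, 60)
)

# Apply sd() and var() to every column; na.rm = TRUE is passed on to each call
col_sd  <- sapply(df, sd,  na.rm = TRUE)
col_var <- sapply(df, var, na.rm = TRUE)

col_sd
col_var
```

Each result is a named numeric vector with one entry per column, so col_sd["height"] retrieves the estimate for a single variable.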
