
  • Hands-On Data Science with R
  • Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias
  • 166 words
  • 2021-06-10 19:12:31

Measures of dispersion

While measures of central tendency try to give an idea of where the data is centered, measures of dispersion describe how widely the data is spread around that center. Standard deviation and variance are the most popular measures of dispersion, and the standard deviation is simply the square root of the variance. Both values are easy to get with R:

# standard deviation, ignoring missing values
sd(big_sample, na.rm = TRUE)
# outputs [1] 5.01836
# variance, ignoring missing values
var(big_sample, na.rm = TRUE)
# outputs [1] 25.18394
Keep in mind that the values we've computed so far are estimates of the (true) population parameters, not the parameters themselves.
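
The link between the two estimates can be checked by hand. The following is a minimal sketch using a small made-up vector, x (an assumption for illustration; it is not the big_sample used above). It recomputes the variance with the n - 1 denominator that var() uses and compares the results with the built-in functions:

```r
# Hypothetical data for illustration only (not big_sample from the text)
x <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)

# Drop missing values explicitly so the manual result matches na.rm = TRUE
x_obs <- x[!is.na(x)]
n <- length(x_obs)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x_obs - mean(x_obs))^2) / (n - 1)

# Both comparisons should agree up to floating-point tolerance
all.equal(manual_var, var(x, na.rm = TRUE))
all.equal(sqrt(manual_var), sd(x, na.rm = TRUE))
```

Dividing by n - 1 rather than n is what makes var() an estimate for a sample rather than a description of a whole population.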

The sd() function estimates the standard deviation, while var() estimates the variance. In most cases, we find ourselves with a data frame full of variables we want to analyze. Rather than computing each statistic one variable at a time, we can use a function that quickly summarizes the whole dataset. These functions usually work equally well with vectors and data frames. The next section introduces a couple of them.
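
To make the data frame case concrete, here is a minimal sketch that applies sd() and var() to every column at once with base R's sapply(). The data frame df and its columns are hypothetical, invented for this example:

```r
# Hypothetical data frame for illustration only
df <- data.frame(
  height = c(150, 160, 170, 180, NA),
  weight = c(50, 60, 70, 80, 90)
)

# sapply() calls the function on each column; extra arguments such as
# na.rm = TRUE are passed through to sd() and var()
col_sd  <- sapply(df, sd,  na.rm = TRUE)
col_var <- sapply(df, var, na.rm = TRUE)

# The square-root relationship holds column by column
all.equal(col_sd, sqrt(col_var))
```

This only works when every column is numeric; a data frame with character or factor columns would need those columns filtered out first.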
