
Measuring variance

We usually refer to variance as sigma squared, and you'll find out why momentarily, but for now, just know that variance is the average of the squared differences from the mean.
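Written as a formula, that definition looks roughly like the following. The symbols are my own notation, not something fixed by the text: N is the number of data points, x_i is the i-th data point, and \mu is the mean of the data.

```latex
\[
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\]
```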

  1. To compute the variance of a dataset, you first figure out its mean. Let's say I have some data that could represent anything, for example the maximum number of people standing in line during a given hour. In the first hour, I observed 1 person standing in line, then 4, then 5, then 4, then 8.
  2. The first step in computing the variance is just to find the mean, or the average, of that data. I add up all the values and divide the sum by the number of data points: (1+4+5+4+8)/5 = 4.4. So 4.4 is the average number of people standing in line.
  3. The next step is to find the difference from the mean for each data point. I know that the mean is 4.4. My first data point is 1, so 1 - 4.4 = -3.4. The next data point is 4, so 4 - 4.4 = -0.4, and so on. I end up with both positive and negative numbers that represent the difference from the mean for each data point: (-3.4, -0.4, 0.6, -0.4, 3.6).
  4. Now what I need is a single number that represents the variance of this entire dataset. So, the next thing I'm going to do is square these differences. I'm just going to go through each of those raw differences from the mean and square it. This is for a couple of different reasons:
    • First, I want to make sure that negative differences count just as much as positive ones. Otherwise, they would cancel each other out. That'd be bad.
    • Second, I also want to give more weight to the outliers. Squaring amplifies the effect of values that are very different from the mean, while still making sure that the negatives and positives are comparable.

Let's look at what happens there. (-3.4)² is a positive 11.56, and (-0.4)² ends up being a much smaller number, 0.16, because 4 is much closer to the mean of 4.4. Likewise, 5 is close to the mean, so (0.6)² is only 0.36. But when we get to the positive outlier, (3.6)² ends up being 12.96. That gives us: (11.56, 0.16, 0.36, 0.16, 12.96).
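If you want to check these numbers yourself, here is a minimal Python sketch of the steps so far. The variable names (data, mean, differences, squared_differences) are just ones I picked for illustration, and the values are rounded to two decimal places because floating-point subtraction leaves tiny rounding errors.

```python
# Maximum number of people in line for each observed hour (from the example above)
data = [1, 4, 5, 4, 8]

# Steps 1 and 2: the mean of the data
mean = sum(data) / len(data)

# Step 3: the difference from the mean for each data point
differences = [x - mean for x in data]

# Step 4: square each difference so negatives and positives both count
squared_differences = [d ** 2 for d in differences]

print(round(mean, 2))                               # 4.4
print([round(d, 2) for d in differences])           # [-3.4, -0.4, 0.6, -0.4, 3.6]
print([round(s, 2) for s in squared_differences])   # [11.56, 0.16, 0.36, 0.16, 12.96]
```

Notice that the raw differences add up to essentially zero, which is exactly the cancelling-out problem that squaring avoids.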

To find the actual variance value, we just take the average of all those squared differences. So we add up all the squared differences, which gives 25.2, divide the sum by 5, the number of values that we have, and we end up with a variance of 5.04.
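Putting the whole calculation together, a small sketch in Python might look like this. The function name population_variance is my own, hypothetical choice. As a sanity check, I compare the result against statistics.pvariance from Python's standard library, which also divides by the number of values.

```python
import math
import statistics


def population_variance(values):
    """Average of the squared differences from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [1, 4, 5, 4, 8]
variance = population_variance(data)

print(round(variance, 2))  # 5.04

# Compare with the standard library, allowing for floating-point rounding
print(math.isclose(variance, statistics.pvariance(data)))  # True
```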

OK, that's all variance is.
