Measuring variance

We usually refer to variance as sigma squared, and you'll find out why momentarily, but for now, just know that variance is the average of the squared differences from the mean.
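Written as a formula, that definition can be sketched as follows. The symbols are chosen here for illustration: N is the number of data points, x_i is the i-th value, and μ is the mean. This form divides by N, which matches the worked example in this section.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```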

  1. To compute the variance of a dataset, you first figure out its mean. Let's say I have some data that could represent anything, for example the maximum number of people standing in line during a given hour. In the first hour, I observed 1 person standing in line, then 4, then 5, then 4, then 8.
  2. The first step in computing the variance is to find the mean, or average, of that data. I add the values, divide the sum by the number of data points, and get 4.4, the average number of people standing in line: (1+4+5+4+8)/5 = 22/5 = 4.4.
  3. The next step is to find the difference from the mean for each data point. The mean is 4.4, so for my first data point, 1, the difference is 1 - 4.4 = -3.4. The next data point is 4, so 4 - 4.4 = -0.4, and so on. I end up with a mix of positive and negative numbers that represent how far each data point deviates from the mean (-3.4, -0.4, 0.6, -0.4, 3.6).
  4. Now I need a single number that represents the variance of the entire dataset. So the next thing I'm going to do is go through each of these raw differences from the mean and square it. This is for a couple of reasons:
    • First, I want to make sure that negative differences count just as much as positive ones. Otherwise, they would cancel each other out, which would be bad.
    • Second, I want to give more weight to the outliers, so squaring amplifies the effect of values that are very different from the mean while still making the negatives and positives comparable (11.56, 0.16, 0.36, 0.16, 12.96).
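Steps 1 through 4 above can be sketched in a few lines of Python. This is a minimal illustration using the five observations from the example; the variable names are chosen here for clarity and are not from the original text.

```python
# The five hourly observations from the example (people standing in line)
data = [1, 4, 5, 4, 8]

# Steps 1-2: the mean is the sum divided by the number of data points
mean = sum(data) / len(data)

# Step 3: signed difference of each data point from the mean
deviations = [x - mean for x in data]

# Step 4: square each difference so negatives count as much as positives
# and values far from the mean get more weight
squared = [d ** 2 for d in deviations]

# Round only for display; the underlying floats carry tiny rounding error
print(round(mean, 2))                     # 4.4
print([round(d, 2) for d in deviations])  # [-3.4, -0.4, 0.6, -0.4, 3.6]
print([round(s, 2) for s in squared])     # [11.56, 0.16, 0.36, 0.16, 12.96]
```

Rounding is applied only when printing, because binary floating point cannot represent values like 4.4 exactly.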

Let's look at what happens there: (-3.4)² is a positive 11.56, while (-0.4)² is a much smaller 0.16, because 4 is much closer to the mean of 4.4. Likewise, (0.6)² comes out to only 0.36, since 5 is also close to the mean. But when we get to the positive outlier, (3.6)² is 12.96. That gives us (11.56, 0.16, 0.36, 0.16, 12.96).

To find the actual variance, we just take the average of all those squared differences. We add them up (11.56 + 0.16 + 0.36 + 0.16 + 12.96 = 25.2), divide the sum by 5, the number of values we have, and end up with a variance of 5.04.
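Putting the whole procedure together, a minimal sketch of a variance function might look like this. The function name `population_variance` is hypothetical; it divides by N, as the example does, and the result is cross-checked against `statistics.pvariance` from Python's standard library.

```python
import statistics


def population_variance(values):
    """Average of the squared differences from the mean (divides by N)."""
    if not values:
        raise ValueError("variance needs at least one data point")
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / len(values)


data = [1, 4, 5, 4, 8]
v = population_variance(data)
print(round(v, 2))  # 5.04

# Cross-check against the standard library's population variance
print(statistics.pvariance(data))
```

`statistics.pvariance` is used for the check because it also divides by N. By contrast, `statistics.variance` divides by N - 1 (the sample variance) and would give 25.2 / 4 = 6.3 for this data.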

OK, that's all variance is.