官术网_书友最值得收藏!

Identifying outliers with standard deviation

Here's a histogram of the actual data we were looking at in the preceding example for calculating variance.

Now we see that the number 4 occurred twice in our dataset, and then we had one 1, one 5, and one 8.

The standard deviation is usually used as a way to think about how to identify outliers in your dataset. If I say if I'm within one standard deviation of the mean of 4.4, that's considered to be kind of a typical value in a normal distribution. However, you can see in the preceding diagram, that the numbers 1 and 8 actually lie outside of that range. So if I take 4.4 plus or minus 2.24, we end up around 7 and 2, and 1 and 8 both fall outside of that range of a standard deviation. So we can say mathematically, that 1 and 8 are outliers. We don't have to guess and eyeball it. Now there is still a judgment call as to what you consider an outlier in terms of how many standard deviations a data point is from the mean.

You can generally talk about how much of an outlier a data point is by how many standard deviations (or sometimes how many-sigmas) from the mean it is.

So that's something you'll see standard deviation used for in the real world.

主站蜘蛛池模板: 绥化市| 大城县| 桂平市| 周至县| 江安县| 庄河市| 东乡族自治县| 福贡县| 临颍县| 山阳县| 宿松县| 乡宁县| 云和县| 闽清县| 衡阳市| 崇仁县| 梁河县| 丰都县| 长武县| 三明市| 建湖县| 阿鲁科尔沁旗| 桑日县| 合阳县| 石棉县| 西和县| 南郑县| 永新县| 灌南县| 宜兰市| 滨州市| 玉山县| 新竹县| 新兴县| 牡丹江市| 郎溪县| 石家庄市| 昆明市| 德安县| 准格尔旗| 库车县|