
Variance

As we saw in the first example, the mean isn't sufficient to describe non-homogeneous or very dispersed samples.

To describe with a single value how dispersed the sample set's values are, we need the concept of variance. It takes the mean of the sample set as a starting point and then averages the squared distances of the samples from that mean. The greater the variance, the more scattered the sample set.

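The canonical definition of variance, for a sample set of $n$ values $x_i$ with mean $\mu$, is as follows:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$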

Let's write the following sample code snippet to illustrate this concept, adopting the previously used libraries. For the sake of clarity, we are repeating the declaration of the mean function:

    import math  # This library is needed for the power operation

    def mean(sampleset):  # Definition header for the mean function
        total = 0
        for element in sampleset:
            total = total + element
        return total / len(sampleset)

    def variance(sampleset):  # Definition header for the variance function
        total = 0
        setmean = mean(sampleset)
        for element in sampleset:
            # Accumulate the squared distance of each element from the mean
            total = total + (math.pow(element - setmean, 2))
        return total / len(sampleset)

    myset1 = [2., 10., 3., 6., 4., 6., 10.]  # We create the data sets
    myset2 = [1., -100., 15., -100., 21.]
    print("Variance of first set:" + str(variance(myset1)))
    print("Variance of second set:" + str(variance(myset2)))

The preceding code will generate the following output:

    Variance of first set:8.69387755102
    Variance of second set:3070.64

As you can see, the variance of the second set is much higher, because its values are far more dispersed. Since each distance from the mean is squared before averaging, large deviations weigh much more heavily than small ones, which makes the difference between the two sets stand out.
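To make the effect of squaring more concrete, the following minimal sketch lists how much each element of the second set contributes to the total. It assumes Python 3; the helper name `squared_deviations` is hypothetical and introduced only for this illustration. As a sanity check, the result is also compared against `statistics.pvariance` from the standard library, which computes the same population variance (dividing by the number of elements).

```python
import math
import statistics

def mean(sampleset):
    # Arithmetic mean of the sample set
    return sum(sampleset) / len(sampleset)

def squared_deviations(sampleset):
    # Hypothetical helper: squared distance of each element from the mean
    setmean = mean(sampleset)
    return [(element - setmean) ** 2 for element in sampleset]

def variance(sampleset):
    # Population variance: the mean of the squared distances
    return mean(squared_deviations(sampleset))

myset2 = [1., -100., 15., -100., 21.]
setmean = mean(myset2)

# Show each element's distance from the mean and its squared contribution
for element, squared in zip(myset2, squared_deviations(myset2)):
    print("element:", element,
          "distance:", element - setmean,
          "squared distance:", squared)

# Cross-check against the standard library's population variance
print("Matches statistics.pvariance:",
      math.isclose(variance(myset2), statistics.pvariance(myset2)))
```

In the printed list, the two elements at -100 lie roughly twice as far from the mean as the element at 1, yet their squared distances are about four times larger. Note that `statistics.variance` (without the `p`) divides by one less than the number of elements, so it would not match the function defined in this section.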
