- Machine Learning for Developers
- Rodolfo Bonnin
- 2021-07-02 15:46:46
Variance
As we saw in the first example, the mean isn't sufficient to describe non-homogeneous or very dispersed samples.
To describe with a single value how dispersed the sample set's values are, we turn to the concept of variance. It takes the mean of the sample set as a starting point, and then averages the squared distances of the samples from that mean. The greater the variance, the more scattered the sample set.
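As a small worked illustration of these steps, assume a hypothetical four-value sample set {1, 2, 3, 6} (chosen here for easy arithmetic, not taken from the original examples). Computing the mean, then the squared distance of each sample from it, then the average of those squared distances gives:

```latex
\begin{aligned}
\text{mean} &= \frac{1 + 2 + 3 + 6}{4} = 3 \\
\text{squared distances} &= (1-3)^2,\ (2-3)^2,\ (3-3)^2,\ (6-3)^2 = 4,\ 1,\ 0,\ 9 \\
\text{variance} &= \frac{4 + 1 + 0 + 9}{4} = 3.5
\end{aligned}
```

Note that the single sample farthest from the mean (6) accounts for 9 of the 14 units in the sum of squared distances.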
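The canonical definition of variance is as follows:

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Here, n is the number of samples, x_i is the i-th sample, and \mu is the mean of the sample set.
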
Let's write the following sample code snippet to illustrate this concept, adopting the previously used libraries. For the sake of clarity, we are repeating the declaration of the mean function:
import math  # This library is needed for the power operation

def mean(sampleset):  # Definition header for the mean function
    total = 0
    for element in sampleset:
        total = total + element
    return total / len(sampleset)

def variance(sampleset):  # Definition header for the variance function
    total = 0
    setmean = mean(sampleset)
    for element in sampleset:
        # Accumulate the squared distance of each sample from the mean
        total = total + math.pow(element - setmean, 2)
    return total / len(sampleset)

myset1 = [2., 10., 3., 6., 4., 6., 10.]  # We create the data set
myset2 = [1., -100., 15., -100., 21.]

# Print the results with 12 significant digits
print("Variance of first set:" + format(variance(myset1), ".12g"))
print("Variance of second set:" + format(variance(myset2), ".12g"))
The preceding code will generate the following output:
Variance of first set:8.69387755102
Variance of second set:3070.64
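As a cross-check, the same figures can be reproduced with Python's standard library. The sketch below assumes Python 3.4 or later, where the statistics module is available; it is not part of the original example. The function statistics.pvariance divides by the number of samples, as our variance function does, while statistics.variance divides by one less than that number, so it returns a slightly larger value.

```python
import statistics

myset1 = [2., 10., 3., 6., 4., 6., 10.]
myset2 = [1., -100., 15., -100., 21.]

# Population variance: divides by n, like the hand-written variance function
pvar1 = statistics.pvariance(myset1)
pvar2 = statistics.pvariance(myset2)

# Sample variance: divides by n - 1 instead of n
svar1 = statistics.variance(myset1)

print("Population variance of first set: {:.12g}".format(pvar1))
print("Population variance of second set: {:.12g}".format(pvar2))
print("Sample variance of first set: {:.12g}".format(svar1))
```

Which divisor is appropriate depends on whether the data is treated as the whole population or as a sample drawn from a larger one.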
As you can see, the variance of the second set is much higher, reflecting its widely dispersed values. Because we average the squared distances, large deviations are weighted disproportionately: a sample twice as far from the mean contributes four times as much to the total.
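To make the effect of squaring concrete, the following sketch compares the variance with the mean absolute deviation, which averages the unsquared distances from the mean. The mean absolute deviation is not used in the original example, and the helper name mean_abs_deviation is hypothetical, introduced only for this comparison. The code assumes Python 3.

```python
def mean(sampleset):
    return sum(sampleset) / len(sampleset)

def variance(sampleset):
    # Average of the squared distances from the mean
    setmean = mean(sampleset)
    return sum((x - setmean) ** 2 for x in sampleset) / len(sampleset)

def mean_abs_deviation(sampleset):
    # Hypothetical helper: average of the unsquared distances from the mean
    setmean = mean(sampleset)
    return sum(abs(x - setmean) for x in sampleset) / len(sampleset)

myset1 = [2., 10., 3., 6., 4., 6., 10.]
myset2 = [1., -100., 15., -100., 21.]

# How many times more dispersed the second set looks under each measure
var_ratio = variance(myset2) / variance(myset1)
mad_ratio = mean_abs_deviation(myset2) / mean_abs_deviation(myset1)

print("Variance ratio (second/first): {:.1f}".format(var_ratio))
print("Mean absolute deviation ratio (second/first): {:.1f}".format(mad_ratio))
```

For these two sets, the second is roughly 22 times more dispersed than the first when measured by unsquared distances, but roughly 353 times more dispersed when measured by variance.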