官术网_书友最值得收藏!

Variability

If we would like to understand the variability of the data, there are three important measures that we can use to get a better understanding of our data set, and those include the following:

  • Variance
  • Standard deviation
  • Coefficient of variation

Variance measures the distance of the data points around their mean value. Variance has a central role in statistics, where some ideas that use it include descriptive statistics, statistical inference, hypothesis testing, goodness of fit, and Monte Carlo sampling. Variance is an important tool in the sciences, where statistical analysis of data is common. The variance is the square of the standard deviation, the second central moment of a distribution, and the covariance of the random variable with itself.

Standard deviation is a measure of how spread-out numbers are, and based on the number describing the standard deviation, you can see the extent to which a set of numbers lies apart. In addition to expressing the variability of a population, the standard deviation is commonly used to measure confidence in statistical conclusions. Unlike variance, the standard deviation is expressed in the same units as the data, which is very useful.

The coefficient of variation is a measure of the dispersion of probability, or a frequency distribution. It's usually a percentage value, which is defined as a ratio of the standard deviation to the mean.

SQL Server has several built-in functions that can be used to calculate variance and standard deviation. To find out the variability measures from our table, we can run the following query:

SELECT AVG(datapoint),VARP(datapoint), STDEVP(datapoint) FROM dataset

Don't get confused with those function names, as for variance there are are two functions VAR() and VARP() and the same for the standard deviation, STDEV() and STDEVP(). This is very important from a statistics perspective, depending on whether we work with the whole population or just a sample of the population.

Variance for sample and population are not the same, as if you would check the formulas behind these functions have different denominators:

The full query to get the variability measures on our table, which would include both population and sample measures, would look such as this one:

SELECT AVG(datapoint) as Mean,
VAR(datapoint) as 'Sample variation',
VARP(datapoint) as 'Population variation',
STDEV(datapoint) as 'Sample standard deviation',
STDEVP(datapoint) as 'Population standard deviation'
FROM dataset

In such cases, the sample variability measures will be higher than the population measures, due to a lower denominator value. In the case of a sample, where you don't have the whole population data points, the concentrations of extremes and central tendencies can be different based on the whole range of values. When using such functions, you need to know whether you're working with just a sample of the data or the whole population of the data available, so the results are not skewed and you have the proper outcomes of these functions!

主站蜘蛛池模板: 余干县| 武冈市| 宁化县| 珲春市| 高青县| 拜城县| 丰镇市| 南和县| 昌平区| 宕昌县| 鹿泉市| 木兰县| 广昌县| 雷波县| 垣曲县| 邵武市| 兴和县| 阿拉尔市| 泊头市| 郴州市| 城步| 日土县| 乾安县| 贺州市| 隆昌县| 武隆县| 屯留县| 九龙坡区| 德惠市| 和林格尔县| 周宁县| 隆回县| 凤凰县| 泸水县| 巧家县| 临夏市| 五莲县| 浦城县| 榆社县| 南丹县| 泗阳县|