
Measures of central tendency

What if you had to describe the center of a distribution with a single number? Most people would turn to one of three estimators: the mean, the median, or the mode. These are probably the most popular measures of central tendency. Let's begin by sampling data from an arbitrary distribution. Open your R console and try the following code:

set.seed(10)
small_sample <- rnorm(n = 10, mean = 10, sd = 5)
big_sample <- rnorm(n = 10^5, mean = 10, sd = 5)

The first line sets the seed of the random number generator (RNG). Whenever your code relies on a pseudo-random process, calling set.seed() beforehand makes the results reproducible (at least to some extent). Because the seed is set to 10, you will get the same numbers I got from the preceding lines.
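A quick way to convince yourself that seeding works is to draw twice with the same seed and compare the two vectors. The snippet below also shows why the seed must be reset right before the draws you care about. Any call that consumes random numbers in between (a single runif() here) moves the generator forward, so the draws that follow change. The variable names are my own choices for illustration, and reproducibility assumes the same R version and RNG settings (see ?RNGkind).

```r
# Seeding twice with the same value reproduces the same draws
set.seed(10)
first_draw <- rnorm(n = 5, mean = 10, sd = 5)

set.seed(10)
second_draw <- rnorm(n = 5, mean = 10, sd = 5)

identical(first_draw, second_draw)   # TRUE

# A call that consumes random numbers (here one runif()) between the seed
# and the draws advances the generator, so the subsequent draws differ
set.seed(10)
invisible(runif(1))
shifted_draw <- rnorm(n = 5, mean = 10, sd = 5)

identical(first_draw, shifted_draw)  # FALSE
```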

Some people would advise you to set a new seed (with set.seed()) every single time you load a package.

The last two lines draw pseudo-random numbers from a normally distributed variable. Call the rnorm() function to draw values from a normal distribution. Choose how many observations are drawn by adjusting the n parameter. Modify the mean and sd parameters if you want a mean and standard deviation other than the defaults of 0 and 1, respectively.
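The snippet below shows what those defaults mean in practice. It makes two checks. First, rnorm() called without mean and sd matches an explicit call with mean = 0 and sd = 1. Second, with the same seed, setting mean and sd amounts to shifting and scaling standard normal draws (mean + sd * z). This is a small illustration under R's default generator settings, and the variable names are only illustrative.

```r
# rnorm() defaults to mean = 0 and sd = 1 (the standard normal)
set.seed(10)
default_draw <- rnorm(n = 5)

set.seed(10)
explicit_draw <- rnorm(n = 5, mean = 0, sd = 1)

identical(default_draw, explicit_draw)          # TRUE

# With the same seed, mean and sd shift and scale the standard normal draws
set.seed(10)
scaled_draw <- rnorm(n = 5, mean = 10, sd = 5)

all.equal(scaled_draw, 10 + 5 * default_draw)   # TRUE
```

all.equal() is used for the second comparison instead of identical() because it tolerates tiny floating-point differences.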

In the real world, you will rarely know for sure which underlying process generates your data. Here, though, we know beforehand that our numbers come from a normally distributed variable with a mean of 10 and a standard deviation of 5 units. We gathered two samples: small_sample has only 10 observations, while big_sample has 100,000. Although both come from the same distribution, we will see how the estimates behave depending on sample size.
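Because we generated the data ourselves, we know the true center is 10. For a normal distribution, the mean and the median coincide. We can therefore measure how far each sample's estimates land from that value. The sketch below does this with base R only. The names true_center and estimates are illustrative, not part of the original code, and I am not quoting the printed numbers here, so run it and look for yourself.

```r
# Same draws as above (seed 10, mean 10, sd 5)
set.seed(10)
small_sample <- rnorm(n = 10, mean = 10, sd = 5)
big_sample <- rnorm(n = 10^5, mean = 10, sd = 5)

true_center <- 10  # known only because we simulated the data

# Mean and median of each sample
estimates <- data.frame(
  sample = c("small_sample", "big_sample"),
  n      = c(length(small_sample), length(big_sample)),
  mean   = c(mean(small_sample), mean(big_sample)),
  median = c(median(small_sample), median(big_sample))
)

# Absolute distance of each estimate from the true center
estimates$mean_error   <- abs(estimates$mean - true_center)
estimates$median_error <- abs(estimates$median - true_center)

print(estimates)
```

As a rough guide, the standard error of the mean is sd / sqrt(n). That works out to about 1.6 for the small sample and about 0.016 for the big one, so expect the big sample's estimates to sit much closer to 10.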
