Be careful

Previously, in the Running t-tests with R section, we used the bigger sample to run a t-test, but what about the smaller sample? R will happily run the test on it. The next code block shows how, but you shouldn't trust a sample so small:

t.test(small_sample, mu = 10, alternative = 'two.sided')

# One Sample t-test
#
# data: small_sample
# t = -2.2169, df = 9, p-value = 0.05384
# alternative hypothesis: true mean is not equal to 10
# 95 percent confidence interval:
# 5.043347 10.050084
# sample estimates:
# mean of x
# 7.546716

This time the test came close to rejecting the null hypothesis at the 5% significance level (p-value = 0.05384), even though, as we already know, the sample does come from a normal distribution with a mean, μ, equal to 10. The t-test is not all-powerful, I can assure you, but it is more trustworthy when you have lots of observations. You should be very cautious about making any kind of statistical inference from little data.
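To make the point about sample size concrete, here is a minimal sketch that compares the width of the 95% confidence interval for a 10-observation sample and a 1,000-observation sample. The names small_demo and big_demo, the seed, and the standard deviation of 5 are assumptions chosen for illustration; they are not the samples used earlier in the chapter.

```r
# Hypothetical stand-ins for a small and a big sample (not the book's data).
set.seed(2024)                                # arbitrary seed, for reproducibility only
small_demo <- rnorm(10, mean = 10, sd = 5)    # sd = 5 is an assumption
big_demo <- rnorm(1000, mean = 10, sd = 5)

# Run the same one-sample t-test on each and keep the confidence intervals.
ci_small <- t.test(small_demo, mu = 10)$conf.int
ci_big <- t.test(big_demo, mu = 10)$conf.int

# Width of each interval: upper bound minus lower bound.
width_small <- diff(ci_small)
width_big <- diff(ci_big)
c(small = width_small, big = width_big)
```

The interval from the small sample should come out much wider, which is another way of saying that it pins down the mean far less precisely.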

You can also use t.test() to test whether two samples have the same mean (μ). To do so, name the x and y parameters and set var.equal = TRUE. The latter tells t.test() to treat the variance as the same for both samples. Equal variance is a necessary assumption for the simple two-sample t-test; without it (var.equal defaults to FALSE) you are running Welch's t-test instead (feel free to do so, it's probably the better choice in a great variety of situations). You can also set a custom confidence level with the conf.level argument and choose a different alternative hypothesis with the alternative argument. A quick example follows:

t.test(x = small_sample, y = big_sample, var.equal = TRUE)

Let me stress that you shouldn't run it with a sample as small as small_sample. R will run it, of course, but from a statistical point of view this is very poor inference because it is based on very poor evidence (a small sample). Keep in mind that these tests assume your data comes from a normal distribution of unknown standard deviation, and really small samples can be problematic.
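The arguments mentioned for the two-sample test (var.equal, conf.level, and alternative) can be sketched side by side as follows. The vectors small_demo and big_demo are hypothetical stand-ins generated here so the snippet runs on its own; their sizes, seed, and standard deviation are assumptions, not the chapter's data.

```r
# Hypothetical stand-ins for the two samples (not the book's data).
set.seed(10)                                  # arbitrary seed
small_demo <- rnorm(10, mean = 10, sd = 5)    # sd = 5 is an assumption
big_demo <- rnorm(1000, mean = 10, sd = 5)

# Simple two-sample t-test: assumes both samples share the same variance.
pooled <- t.test(x = small_demo, y = big_demo, var.equal = TRUE)

# Welch's t-test: what t.test() runs when var.equal is left at its default (FALSE).
welch <- t.test(x = small_demo, y = big_demo)

# One-sided alternative with a custom 99% confidence level.
custom <- t.test(x = small_demo, y = big_demo, var.equal = TRUE,
                 alternative = "less", conf.level = 0.99)

pooled$parameter   # degrees of freedom: 10 + 1000 - 2 = 1008
welch$method       # names the Welch variant
custom$conf.int    # interval reported at the 99% level
```

With var.equal = TRUE the degrees of freedom are simply the two sample sizes added together minus two; Welch's version estimates them from the data, so they are usually not a whole number.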

One could try something like plot(density(<variable>)) to check whether a variable resembles a normal distribution or not.
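As a minimal sketch of that visual check, the snippet below plots the kernel density of a variable and overlays a normal curve with the same mean and standard deviation for comparison. The vector values is a hypothetical stand-in generated for illustration, and the overlay is an addition that goes slightly beyond the plain plot(density()) call.

```r
# Hypothetical variable to inspect (replace with your own data).
set.seed(42)                                  # arbitrary seed
values <- rnorm(500, mean = 10, sd = 5)       # assumed parameters, illustration only

# Kernel density estimate of the data.
dens <- density(values)
plot(dens, main = "Density of values")

# Overlay a normal curve using the sample mean and standard deviation.
m <- mean(values)
s <- sd(values)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)
```

If the solid density line roughly follows the dashed normal curve, the normality assumption looks plausible; a strongly skewed or multi-peaked density is a warning sign.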

But what if you do know the populations' variance?
