Book title: Hands-On Data Science with R
Authors: Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias
Chapter length: 418 words
Updated: 2021-06-10 19:12:32

Be careful

In the Running t-tests with R section, we ran a t-test on the bigger sample. What about the smaller one? R will happily run it, as the next code block shows, but you shouldn't trust a sample this small:

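# small_sample: the 10-observation sample drawn earlier from a normal distribution with mean 10
# H0: mu = 10 against a two-sided alternative
t.test(small_sample, mu = 10, alternative = 'two.sided')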

#   One Sample t-test
# 
# data: small_sample
# t = -2.2169, df = 9, p-value = 0.05384
# alternative hypothesis: true mean is not equal to 10
# 95 percent confidence interval:
#   5.043347 10.050084
# sample estimates:
# mean of x 
#  7.546716
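The numbers in this output follow directly from the one-sample t-test formulas. The derivation below uses only values R printed (x̄ = 7.546716, μ₀ = 10, n = 10, so df = 9): the standard error is recovered from t, and the 95% interval uses the t quantile for 9 degrees of freedom (about 2.2622, which qt(0.975, 9) returns). Small discrepancies are due to rounding.

```latex
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad
\mathrm{CI}_{95\%} = \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}

\frac{s}{\sqrt{n}} = \frac{\bar{x} - \mu_0}{t} = \frac{7.546716 - 10}{-2.2169} \approx 1.1066

\bar{x} \pm t_{0.975,\,9}\,\frac{s}{\sqrt{n}} \approx 7.5467 \pm 2.2622 \times 1.1066
  \approx 7.5467 \pm 2.5034 = (5.0433,\ 10.0501)

p = 2\,P\!\left(T_9 \ge \lvert t \rvert\right) = 2\,P\!\left(T_9 \ge 2.2169\right) \approx 0.0538
```

The interval barely contains μ₀ = 10, which is the same fact the p-value expresses by sitting just above 0.05.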

With a p-value of 0.05384, just above 0.05, the test came close to rejecting the null hypothesis at the 95% confidence level, even though, as we already know, the sample does come from a normal distribution with a mean, μ, equal to 10. This test is not almighty, I can assure you; it becomes more trustworthy the more observations you have. In fact, you should be very cautious about drawing any kind of statistical inference from little data.
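To see why more observations make the t-test more trustworthy, the following sketch draws samples of increasing size from a normal distribution with mean 10 and compares the widths of the 95% confidence intervals reported by t.test(). It also uses power.t.test() to compare how often a one-sample test would detect a true mean 1 unit away from the hypothesized value. The standard deviation of 3, the effect size of 1, the sample sizes and the seed are illustrative assumptions, not values from the book.

```r
# Sketch: sample size vs. reliability of the one-sample t-test.
# Assumptions (illustrative only): data ~ N(mean = 10, sd = 3); sizes 10, 100, 1000.
set.seed(42)

# Width of the 95% confidence interval from t.test() for one random sample of size n
ci_width <- function(n, mu = 10, sigma = 3) {
  x <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x, mu = mu)$conf.int
  ci[2] - ci[1]
}

sizes <- c(10, 100, 1000)
widths <- sapply(sizes, ci_width)
print(data.frame(n = sizes, ci_width = round(widths, 3)))

# Power of a two-sided one-sample t-test at the 5% level when the true mean is
# 1 unit away from the hypothesized value (delta = 1 and sd = 3 are assumptions)
power_small <- power.t.test(n = 10,  delta = 1, sd = 3, type = "one.sample")$power
power_big   <- power.t.test(n = 100, delta = 1, sd = 3, type = "one.sample")$power
print(c(n10 = power_small, n100 = power_big))
```

The interval width shrinks roughly in proportion to 1/√n, and the power with 10 observations is far lower than with 100.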

You can also use t.test() to test whether two samples have the same mean (μ). To do so, pass the samples through the x and y arguments and, at a minimum, set var.equal = TRUE, which makes t.test() treat both samples as having the same variance. Equal variances are an assumption of the classic (Student's) two-sample t-test; if you leave var.equal at its default of FALSE, R runs Welch's t-test instead (feel free to do so; it is arguably the better choice in a wide variety of situations). You can also set a custom confidence level with the conf.level argument and choose a different alternative hypothesis with the alternative argument. Here is a quick example:

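# Student's two-sample t-test: H0 says both samples share the same mean, variances assumed equal
t.test(x = small_sample, y = big_sample, var.equal = TRUE)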
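The arguments described above can be combined as in the sketch below. Since small_sample and big_sample are created elsewhere in the chapter, it uses two hypothetical simulated samples, sample_a and sample_b (their names, sizes, means and standard deviations are assumptions), and compares Student's test (var.equal = TRUE), the default Welch's test, and a one-sided test at a 99% confidence level.

```r
# Sketch: Student vs. Welch two-sample t-tests, plus conf.level and alternative.
# sample_a and sample_b are hypothetical stand-ins for small_sample and big_sample.
set.seed(123)
sample_a <- rnorm(30, mean = 10, sd = 2)
sample_b <- rnorm(50, mean = 11, sd = 4)

# Student's two-sample t-test: assumes both populations share the same variance
student <- t.test(x = sample_a, y = sample_b, var.equal = TRUE)

# Default (var.equal = FALSE): Welch's t-test, which does not pool the variances
welch <- t.test(x = sample_a, y = sample_b)

# Custom confidence level and one-sided alternative (H1: mean of x < mean of y)
custom <- t.test(x = sample_a, y = sample_b, var.equal = TRUE,
                 conf.level = 0.99, alternative = "less")

print(student$parameter)                   # df = 30 + 50 - 2 = 78
print(welch$parameter)                     # Welch-Satterthwaite df, usually not an integer
print(attr(custom$conf.int, "conf.level")) # 0.99
print(custom$p.value)
```

Note that the pooled (Student) test always has n₁ + n₂ − 2 degrees of freedom, while Welch's approximation yields at most that many.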

Let me stress that you shouldn't run this test with a sample as small as small_sample. R will run it, of course, but from a statistical point of view this is very poor inference, because it rests on very poor evidence (a small sample). Keep in mind that these tests assume your data come from a normal distribution with unknown standard deviation; with really small samples, that assumption is hard to verify and the estimates are unstable.

One could try something like plot(density(<variable>)) to check whether a variable roughly resembles a normal distribution.
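The sketch below extends that idea with a few standard checks from base R: a density plot with a normal curve overlaid, a normal Q-Q plot, and the Shapiro-Wilk test. The variable v is hypothetical simulated data, so swap in your own. This is one option among several, and with very small samples none of these checks has much ability to detect departures from normality (shapiro.test() also requires between 3 and 5000 observations).

```r
# Sketch: eyeballing (and testing) the normality of a variable.
# `v` is a hypothetical variable; replace it with your own data.
set.seed(1)
v <- rnorm(200, mean = 10, sd = 3)

# Kernel density estimate of the data, as suggested in the text
plot(density(v), main = "Density of v (solid) vs. normal curve (dashed)")

# Overlay a normal curve with the sample's mean and standard deviation.
# Compute them first: inside curve(), the name `x` refers to the plotting grid.
m <- mean(v)
s <- sd(v)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)

# Normal Q-Q plot: points close to the reference line suggest normality
qqnorm(v)
qqline(v)

# Shapiro-Wilk test (H0: the data come from a normal distribution)
sw <- shapiro.test(v)
print(sw$p.value)
```

A small p-value from shapiro.test() is evidence against normality; a large one does not prove the data are normal, especially when n is small.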

But what if you do know the populations' variance?
