
  • Hands-On Data Science with R
  • Vitor Bianchi Lanzetta Nataraj Dasgupta Ricardo Anjoleto Farias
  • 418 words
  • 2021-06-10 19:12:32

Be careful

Previously, in the Running t-tests with R section, we used the bigger sample to run a t-test, but what about the smaller sample? R will happily run the test on it. The next code block shows how, but you shouldn't trust a sample this small:

t.test(small_sample, mu = 10, alternative = 'two.sided')

# One Sample t-test
#
# data: small_sample
# t = -2.2169, df = 9, p-value = 0.05384
# alternative hypothesis: true mean is not equal to 10
# 95 percent confidence interval:
# 5.043347 10.050084
# sample estimates:
# mean of x
# 7.546716

This time the test came close to rejecting the null hypothesis at the 5% significance level (p-value = 0.05384), even though, as we already know, the sample does come from a normal distribution with a mean, μ, equal to 10. The t-test is not all-powerful, I can assure you, but it's more trustworthy when you have lots of observations. You should be very cautious about making any kind of statistical inference from little data.
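To make the point about sample size concrete, here is a minimal simulation sketch. It is not from the book: the true mean of 9, the standard deviation of 2, the seed, and the helper name rejection_rate are all assumptions chosen for illustration. It estimates how often a one-sample t-test rejects a false null hypothesis (μ = 10) with 10 observations versus 100.

```r
# Hypothetical setup: the data truly come from N(mean = 9, sd = 2),
# so H0: mu = 10 is false and a good test should reject it often.
set.seed(42)  # arbitrary seed, only for reproducibility

# Share of simulated samples of size n for which t.test() rejects H0
rejection_rate <- function(n, true_mean = 9, sd = 2, mu0 = 10,
                           reps = 2000, alpha = 0.05) {
  p_values <- replicate(
    reps,
    t.test(rnorm(n, mean = true_mean, sd = sd), mu = mu0)$p.value
  )
  mean(p_values < alpha)
}

power_small <- rejection_rate(n = 10)   # few observations
power_big   <- rejection_rate(n = 100)  # many observations

print(c(small = power_small, big = power_big))
```

Under these assumptions the larger sample should detect the difference far more often than the smaller one, which is one way to read "more trustworthy" here.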

You can also use t.test() to test whether two samples have the same mean (μ). To do so, name the x and y parameters and set var.equal = TRUE. The latter tells t.test() to treat the variance as the same for both samples. Equal variance is a necessary assumption for the simple two-sample t-test; without it (var.equal = FALSE is the default), you're committing yourself to a Welch's t-test (feel free to use it, as it's probably the better choice in a great variety of situations). You can also set a custom confidence level with the conf.level argument and choose a different alternative hypothesis with the alternative argument. A quick example follows:

t.test(x = small_sample, y = big_sample, var.equal = TRUE)
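The other arguments mentioned above can be sketched as follows. The two samples here are stand-ins generated for this example (sizes 10 and 100, mean 10, sd 5, and the seed are assumptions), not necessarily the small_sample and big_sample objects built earlier in the book.

```r
# Hypothetical stand-ins for the book's samples (assumed sizes and parameters)
set.seed(10)
small_sample <- rnorm(10,  mean = 10, sd = 5)
big_sample   <- rnorm(100, mean = 10, sd = 5)

# Simple two-sample t-test: assumes both samples share the same variance
pooled <- t.test(x = small_sample, y = big_sample, var.equal = TRUE)

# Welch's t-test: var.equal = FALSE is the default, so it can be omitted
welch <- t.test(x = small_sample, y = big_sample)

# One-sided alternative (mean of x less than mean of y) with a 99% interval
one_sided <- t.test(x = small_sample, y = big_sample, var.equal = TRUE,
                    alternative = "less", conf.level = 0.99)

print(pooled$parameter)    # degrees of freedom of the pooled test
print(welch$method)        # which test t.test() actually ran
print(one_sided$conf.int)  # one-sided interval, unbounded below
```

With 10 and 100 observations, the pooled test uses 10 + 100 - 2 = 108 degrees of freedom, while Welch's version estimates a (usually fractional) number of degrees of freedom from the two sample variances.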

Let me stress that you shouldn't run a two-sample test with a sample as small as small_sample. Of course, R will run it, but from a statistical point of view this is very poor inference because it is based on very poor evidence (a small sample). Keep in mind that these tests assume your data comes from a normal distribution of unknown standard deviation, and really small samples can be problematic.

One could try something like plot(density(<variable>)) to check whether a variable resembles a normal distribution or not.
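A slightly fuller version of that check might look like the sketch below. The values vector is a hypothetical stand-in for your variable, and the normal Q-Q plot and shapiro.test() are additions not mentioned in the text; treat them as optional complements to the density plot rather than a definitive procedure.

```r
# Hypothetical data standing in for <variable>
set.seed(1)
values <- rnorm(50, mean = 10, sd = 5)

# Kernel density estimate, as suggested in the text
dens <- density(values)
plot(dens, main = "Density of values vs. fitted normal curve")

# Overlay a normal curve using the sample mean and standard deviation
m <- mean(values)
s <- sd(values)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)

# Complementary checks: normal Q-Q plot and the Shapiro-Wilk test
qqnorm(values)
qqline(values)
sw <- shapiro.test(values)
print(sw$p.value)  # a small p-value would suggest departure from normality
```

With very small samples these checks are themselves unreliable: a density plot of ten points can look far from bell-shaped even when the data are normal, and a normality test has little power to detect departures.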

But what if you do know the populations' variance?
