Martijn Wieling
University of Groningen
$$t = \frac{m - \mu}{s / \sqrt{n}} \hspace{70pt} z = \frac{m - \mu}{\sigma / \sqrt{n}}$$
qt(0.025, df = 10, lower.tail = F) # crit. t-value (one-sided alpha = 0.025, i.e. two-sided alpha = 0.05) for df = 10
# [1] 2.2281
pt(2, 10, lower.tail = F) * 2 # two-sided p-value = 2 * one-sided p-value
# [1] 0.073388
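As a worked example of the manual calculation, the sketch below applies the \(t\) formula to summary values. The numbers (`m`, `mu`, `s`, `n`) are hypothetical and chosen only so the arithmetic is easy to follow by hand; they are not taken from the course data.

```r
# Hypothetical summary values (illustration only)
m  <- 8.0   # sample mean
mu <- 7.5   # population mean under H0
s  <- 1.0   # sample standard deviation
n  <- 16    # sample size

se      <- s / sqrt(n)     # standard error: 1 / 4 = 0.25
t_value <- (m - mu) / se   # 0.5 / 0.25 = 2
df      <- n - 1           # 15 degrees of freedom

# two-sided p-value = 2 * one-sided p-value
p_value <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

# critical t-value with an area of 0.025 in the upper tail
t_crit <- qt(0.025, df, lower.tail = FALSE)

t_value            # 2
t_value > t_crit   # FALSE: not significant at two-sided alpha = 0.05
```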
$$t = \frac{m - \mu}{s / \sqrt{n}}$$
In R: using the function t.test()
boxplot(dat$english_score)
abline(h = 7.5, lty = 2, lwd = 2)
$$t = \frac{m - \mu}{s / \sqrt{n}}$$
R can calculate the \(t\)-value automatically, but you also need to be able to calculate the \(t\)-value manually at your exam (with simple values)!
t.test(dat$english_score, alternative = "two.sided", mu = 7.5)
#
# One Sample t-test
#
# data: dat$english_score
# t = 2.86, df = 499, p-value = 0.0045
# alternative hypothesis: true mean is not equal to 7.5
# 95 percent confidence interval:
# 7.5368 7.6988
# sample estimates:
# mean of x
# 7.6178
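To connect this output to the formula, the following sketch recovers the standard error from the reported confidence interval and then recomputes the \(t\)-value by hand. It uses only the rounded numbers printed above, so the result should be close to, but not necessarily identical to, the reported values.

```r
# Values copied from the t.test() output above (rounded as printed)
m     <- 7.6178   # sample mean
mu    <- 7.5      # value under H0
df    <- 499      # degrees of freedom (n - 1)
ci_lo <- 7.5368   # lower bound of the 95% confidence interval
ci_hi <- 7.6988   # upper bound of the 95% confidence interval

# The 95% CI is m +/- t_crit * SE, so SE = half-width / t_crit
half_width <- (ci_hi - ci_lo) / 2
se         <- half_width / qt(0.975, df)

t_value <- (m - mu) / se
p_value <- 2 * pt(abs(t_value), df, lower.tail = FALSE)

round(t_value, 2)    # compare with the reported t
signif(p_value, 2)   # compare with the reported p-value
```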
| Difference (in \(s\)) | \(n\) | \(p\) |
|---|---|---|
| 0.01 | 40,000 | 0.05 |
| 0.10 | 400 | 0.05 |
| 0.25 | 64 | 0.05 |
| 0.38 | 30 | 0.05 |
| 0.54 | 16 | 0.05 |
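The rows of this table can be checked with a few lines of R. The sketch assumes a two-sided one-sample \(t\)-test in which the difference \(m - \mu\) is expressed in units of \(s\), so that \(t\) reduces to the difference times \(\sqrt{n}\); that reading of the table is an assumption.

```r
d <- c(0.01, 0.10, 0.25, 0.38, 0.54)   # difference in units of s
n <- c(40000, 400, 64, 30, 16)         # sample sizes from the table

# t = (m - mu) / (s / sqrt(n)), with (m - mu) = d * s
t_value <- d * sqrt(n)

# two-sided p-value with n - 1 degrees of freedom
p_value <- 2 * pt(t_value, df = n - 1, lower.tail = FALSE)

data.frame(difference = d, n = n,
           t = round(t_value, 2), p = round(p_value, 3))
```

Under these assumptions every row gives a \(p\)-value close to 0.05: the smaller the difference, the larger the sample needed to reach the same level of significance.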
$$t = \frac{m_1 - m_2}{s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
t.test() includes Welch’s adjustment to correct for unequal variances (more conservative: degrees of freedom reduced)
dat$bl_edu = relevel(dat$bl_edu,'Y') # make 'Y' first level (default is 'N')
boxplot(english_score ~ bl_edu, data=dat) # formula notation is easy to use
# or: boxplot(dat[dat$bl_edu=='Y',]$english_score, dat[dat$bl_edu=='N',]$english_score)
t.test(english_score ~ bl_edu, data = dat, alternative = "greater") # 1st > 2nd level?
#
# Welch Two Sample t-test
#
# data: english_score by bl_edu
# t = 3.92, df = 54.3, p-value = 0.00013
# alternative hypothesis: true difference in means between group Y and group N is greater than 0
# 95 percent confidence interval:
# 0.33529 Inf
# sample estimates:
# mean in group Y mean in group N
# 8.1483 7.5627
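The pooled formula and Welch's adjustment can be compared by hand. The sketch below uses simulated scores (the group sizes, means and standard deviations are hypothetical, not the course data) and checks the manual results against t.test(). The Welch degrees of freedom follow the Welch-Satterthwaite approximation.

```r
set.seed(1)
# Simulated scores (hypothetical values, not the course data)
grp_y <- rnorm(40,  mean = 8.1, sd = 0.9)
grp_n <- rnorm(200, mean = 7.6, sd = 1.1)

m1 <- mean(grp_y);   m2 <- mean(grp_n)
v1 <- var(grp_y);    v2 <- var(grp_n)
n1 <- length(grp_y); n2 <- length(grp_n)

# Pooled version (equal variances assumed)
s_p      <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
t_pooled <- (m1 - m2) / (s_p * sqrt(1 / n1 + 1 / n2))

# Welch version (unequal variances allowed)
se_w     <- sqrt(v1 / n1 + v2 / n2)
t_welch  <- (m1 - m2) / se_w
df_welch <- se_w^4 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Compare with t.test(): var.equal = TRUE gives the pooled test,
# the default (var.equal = FALSE) gives Welch's test
all.equal(t_pooled, unname(t.test(grp_y, grp_n, var.equal = TRUE)$statistic))
all.equal(t_welch,  unname(t.test(grp_y, grp_n)$statistic))
all.equal(df_welch, unname(t.test(grp_y, grp_n)$parameter))
df_welch <= n1 + n2 - 2   # Welch never has more degrees of freedom
```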
cohen.d
library(effsize) # to install: install.packages('effsize')
cohen.d(english_score ~ bl_edu, data = dat)
#
# Cohen's d
#
# d estimate: 0.64563 (medium)
# 95 percent confidence interval:
# lower upper
# 0.34188 0.94937
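For two independent groups, Cohen's \(d\) is the difference in means divided by the pooled standard deviation. The helper functions `cohens_d()` and `d_label()` below are hypothetical illustrations, not part of the effsize package, and the thresholds 0.2, 0.5 and 0.8 are assumed from Cohen's conventional guidelines (they agree with the labels printed in the outputs in this section).

```r
# Cohen's d for two independent samples (pooled standard deviation)
cohens_d <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  s_p <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  (mean(x) - mean(y)) / s_p
}

# Verbal label, assuming the conventional thresholds 0.2 / 0.5 / 0.8
d_label <- function(d) {
  as.character(cut(abs(d), breaks = c(0, 0.2, 0.5, 0.8, Inf),
                   labels = c("negligible", "small", "medium", "large"),
                   right = FALSE))
}

# Toy data: means 4 and 3, both variances 4, so pooled sd = 2 and d = 0.5
x <- c(2, 4, 6)
y <- c(1, 3, 5)
d <- cohens_d(x, y)
d                  # 0.5

d_label(0.64563)   # "medium"
d_label(0.19936)   # "negligible"
```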
boxplot(lls$Diff1, lls$Diff2, names = c("First trial", "Last trial"))
t.test(lls$Diff1, lls$Diff2, paired = TRUE, alternative = "greater")
#
# Paired t-test
#
# data: lls$Diff1 and lls$Diff2
# t = 2.31, df = 73, p-value = 0.012
# alternative hypothesis: true mean difference is greater than 0
# 95 percent confidence interval:
# 3.2997 Inf
# sample estimates:
# mean difference
# 11.87
cohen.d(lls$Diff1, lls$Diff2, paired = T)
#
# Cohen's d
#
# d estimate: 0.19936 (negligible)
# 95 percent confidence interval:
# lower upper
# 0.026919 0.371803
t.test(lls$Diff1, lls$Diff2, paired = FALSE, alternative = "greater")$statistic
# t
# 1.2138
# with the paired t-test:
t.test(lls$Diff1, lls$Diff2, paired = TRUE, alternative = "greater")$statistic
# t
# 2.3074
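The gain from pairing can be illustrated with simulated data. In the sketch below (all values hypothetical), each subject has a large individual baseline that affects both measurements. The paired test is equivalent to a one-sample \(t\)-test on the differences, which removes this between-subject variation.

```r
set.seed(2)
# Simulated paired data (hypothetical): large variation between subjects
subject <- rnorm(74, mean = 100, sd = 30)
first   <- subject + rnorm(74, mean = 10, sd = 10)
last    <- subject + rnorm(74, mean = 0,  sd = 10)

# Paired t-test
t_paired <- t.test(first, last, paired = TRUE,
                   alternative = "greater")$statistic

# One-sample t-test on the differences gives the same t-value
t_diff <- t.test(first - last, mu = 0,
                 alternative = "greater")$statistic

# Ignoring the pairing leaves the between-subject variation in the
# standard error, which lowers the t-value
t_unpaired <- t.test(first, last, paired = FALSE,
                     alternative = "greater")$statistic

all.equal(unname(t_paired), unname(t_diff))
c(paired = unname(t_paired), unpaired = unname(t_unpaired))
```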
Simple \(t\) statistic:
$$t = \; \frac{m_1 - m_2}{s/\sqrt{n}}$$
We tested whether the average English score of students following Statistiek I was significantly higher for those who had bilingual education than for those who did not. Our hypotheses were: \(H_0\): \(\mu_b = \mu_m\) and \(H_a\): \(\mu_b > \mu_m\). We obtained English scores in a sample of 500 students of the Statistiek I course via an online questionnaire. Since \(\sigma\) is unknown and the samples were independent, we conducted an independent samples \(t\)-test (corrected for unequal variances) after verifying that the assumptions for the test were met (normally distributed data, or more than 30 values). The mean English score for the students with bilingual education in the sample was 8.15, whereas it was 7.56 for those who followed monolingual education. The effect size was medium (Cohen’s \(d\): 0.65; see box plot), and the difference reached significance at \(\alpha\)-level 0.05: \(t\)(54.3) = 3.92, \(p\) < 0.001. We therefore reject the null hypothesis and accept the alternative hypothesis that students who had bilingual education had higher English scores than those who did not.
Practice this in laboratory exercises!
datEN$Sound = relevel(datEN$Sound, "TH") # set TH as reference level
t.test(FrontPos ~ Sound, data = datEN, paired = T, alternative = "greater") # paired
#
# Paired t-test
#
# data: FrontPos by Sound
# t = 6.4, df = 21, p-value = 1.2e-06
# alternative hypothesis: true mean difference is greater than 0
# 95 percent confidence interval:
# 0.035207 Inf
# sample estimates:
# mean difference
# 0.048154
cohen.d(FrontPos ~ Sound, data = datEN, paired = T)$estimate # large effect size
# [1] 0.94206
datNL$Sound = relevel(datNL$Sound, "TH")
t.test(FrontPos ~ Sound, data = datNL, paired = T, alternative = "greater")
#
# Paired t-test
#
# data: FrontPos by Sound
# t = 1.86, df = 18, p-value = 0.04
# alternative hypothesis: true mean difference is greater than 0
# 95 percent confidence interval:
# 0.0010611 Inf
# sample estimates:
# mean difference
# 0.016263
cohen.d(FrontPos ~ Sound, data = datNL, paired = T)$estimate # small effect size
# [1] 0.32382
Thank you for your attention!