
Martijn Wieling and Defne Abur
University of Groningen
In R, the parameter alternative = "two.sided" yields the \(p\)-value for a two-tailed test, while alternative = "less" (or "greater") yields the \(p\)-value for a one-tailed test (instead of computing it manually with pnorm() or pt()).
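As an illustration (a minimal sketch with simulated data, not the llsNL data), the same \(t\)-statistic yields different \(p\)-values depending on alternative, and the two-tailed value can be reproduced manually with pt():
set.seed(1)
x <- rnorm(30, mean = 0.3) # simulated sample
t.test(x, mu = 0, alternative = "two.sided")$p.value # two-tailed p-value
t.test(x, mu = 0, alternative = "greater")$p.value # one-tailed p-value
tval <- t.test(x, mu = 0)$statistic
2 * pt(-abs(tval), df = length(x) - 1) # manual two-tailed p-value via pt()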
(counts <- table(llsNL$sex, llsNL$alcohol))
#
# Alcohol No-alcohol
# Female 21 31
# Male 26 13
prop.table(counts) # as proportions
#
# Alcohol No-alcohol
# Female 0.23077 0.34066
# Male 0.28571 0.14286
prop.table(counts, 1) # proportions per row
#
# Alcohol No-alcohol
# Female 0.40385 0.59615
# Male 0.66667 0.33333
prop.table(counts, 2) # proportions per column
#
# Alcohol No-alcohol
# Female 0.44681 0.70455
# Male 0.55319 0.29545
barplot(counts, legend = c("F", "M"), beside = T, ylim = c(0, 60), main = "Beside")
barplot(counts, legend = c("F", "M"), ylim = c(0, 60), main = "Segmented")
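The row and column totals shown in the table below can also be computed directly in R; a minimal sketch:
addmargins(counts) # adds a Sum row and column to the frequency table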
| | Alcohol | No-alcohol | Total |
|---|---|---|---|
| Female | 21 | 31 | 52 |
| Male | 26 | 13 | 39 |
| Total | 47 | 44 | 91 |
| | Alcohol | No-alcohol | Proportion |
|---|---|---|---|
| Female | 0.57 * 47 = 26.9 | 0.57 * 44 = 25.1 | 0.57 |
| Male | 0.43 * 47 = 20.1 | 0.43 * 44 = 18.9 | 0.43 |
| Total | 47 | 44 | |
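The same expected frequencies follow from (row total \(\times\) column total) / grand total; a minimal sketch in R, using the counts table defined above:
outer(rowSums(counts), colSums(counts)) / sum(counts) # expected frequencies
chisq.test(counts, correct = FALSE)$expected # gives the same values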
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
| | Alcohol | No-alcohol |
|---|---|---|
| Female | $(21-26.9)^2 / 26.9 \approx 1.28$ | $(31-25.1)^2 / 25.1 \approx 1.36$ |
| Male | $(26-20.1)^2 / 20.1 \approx 1.7$ | $(13-18.9)^2 / 18.9 \approx 1.82$ |
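Summing these four contributions yields the test statistic reported by chisq.test() below:
$$\chi^2 \approx 1.28 + 1.36 + 1.7 + 1.82 \approx 6.16$$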
chisq.test(counts, correct = FALSE) # chi-square test, no continuity correction
#
# Pearson's Chi-squared test
#
# data: counts
# X-squared = 6.16, df = 1, p-value = 0.013
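The reported \(p\)-value is the upper-tail probability of the \(\chi^2\) distribution with 1 degree of freedom; a minimal sketch of this check:
pchisq(6.16, df = 1, lower.tail = FALSE) # approximately 0.013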
We use correct = FALSE, as using Yates' continuity correction is not recommended (it lowers the \(\chi^2\) value, making the test too conservative).
library(vcd) # install with: install.packages('vcd')
assocstats(counts)
# X^2 df P(> X^2)
# Likelihood Ratio 6.2536 1 0.012394
# Pearson 6.1642 1 0.013036
#
# Phi-Coefficient : 0.26
# Contingency Coeff.: 0.252
# Cramer's V : 0.26
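For a 2 × 2 table, Cramér's \(V\) equals the phi coefficient and can be computed directly from the \(\chi^2\) statistic and the sample size \(N\):
$$V = \sqrt{\frac{\chi^2}{N \cdot (\min(r, c) - 1)}} = \sqrt{\frac{6.16}{91 \cdot 1}} \approx 0.26$$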
(counts <- table(llsNL$sex, llsNL$drugs)) # actual frequencies
#
# No Soft Hard
# Female 49 1 2
# Male 37 1 1
chisq.test(counts)$expected # expected frequencies
#
# No Soft Hard
# Female 49.143 1.14286 1.7143
# Male 36.857 0.85714 1.2857
chisq.test(counts, correct = F)
# Warning in chisq.test(counts, correct = F): Chi-squared approximation may be
# incorrect
#
# Pearson's Chi-squared test
#
# data: counts
# X-squared = 0.154, df = 2, p-value = 0.93
fisher.test(counts)
#
# Fisher's Exact Test for Count Data
#
# data: counts
# p-value = 1
# alternative hypothesis: two.sided
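When expected frequencies are low, chisq.test() can alternatively compute a Monte Carlo \(p\)-value by simulation (a sketch; output not shown):
chisq.test(counts, simulate.p.value = TRUE, B = 2000) # p-value based on 2000 simulated tables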
assocstats(counts)
# X^2 df P(> X^2)
# Likelihood Ratio 0.15618 2 0.92488
# Pearson 0.15375 2 0.92601
#
# Phi-Coefficient : NA
# Contingency Coeff.: 0.041
# Cramer's V : 0.041
(counts <- table(llsNL$sex, llsNL$edu))
#
# PVMBO HBO WO
# Female 10 9 33
# Male 10 14 15
round(chisq.test(counts)$expected)
#
# PVMBO HBO WO
# Female 11 13 27
# Male 9 10 21
barplot(counts, legend = c("F", "M"), col = c("pink", "lightblue"), ylim = c(0, 70))
chisq.test(counts, correct = F)
#
# Pearson's Chi-squared test
#
# data: counts
# X-squared = 6.1, df = 2, p-value = 0.047
assocstats(counts)
# X^2 df P(> X^2)
# Likelihood Ratio 6.1500 2 0.046189
# Pearson 6.1044 2 0.047255
#
# Phi-Coefficient : NA
# Contingency Coeff.: 0.251
# Cramer's V : 0.259
chisq.test(counts)$expected # expected frequencies
#
# PVMBO HBO WO
# Female 11.4286 13.1429 27.429
# Male 8.5714 9.8571 20.571
chisq.test(counts)$observed # observed frequencies
#
# PVMBO HBO WO
# Female 10 9 33
# Male 10 14 15
prop.table(counts, 2) # column proportions (2nd index)
#
# PVMBO HBO WO
# Female 0.50 0.39 0.69
# Male 0.50 0.61 0.31
prop.table(counts, 1) # row proportions (1st index)
#
# PVMBO HBO WO
# Female 0.19 0.17 0.63
# Male 0.26 0.36 0.38
chisq.test(counts)$stdres # z-scored differences
#
# PVMBO HBO WO
# Female -0.73078 -2.01934 2.36396
# Male 0.73078 2.01934 -2.36396
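As a rough guideline (a sketch), cells whose standardized residuals exceed 1.96 in absolute value deviate more than expected under independence at \(\alpha = 0.05\):
abs(chisq.test(counts)$stdres) > qnorm(0.975) # TRUE for the HBO and WO cells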
We tested whether sex (two levels: male and female) and education level (three levels: PVMBO, HBO and WO) were related (with \(\alpha\) = 0.05). Our hypotheses were \(H_0\): sex and education level are independent, and \(H_a\): sex and education level are dependent. In our sample of 91 participants from the Lowlands Science experiment, we obtained the participants’ sex and education level. Table 1 shows their relation and Figure 1 shows a bar plot visualizing the results of the table. As both variables are nominal, and the expected frequency of each combination was greater than 5, we assessed the dependence of the two variables using the \(\chi^2\) test: \(\chi^2(2) = 6.1\), \(p = 0.047\). As the \(p\)-value was lower than the \(\alpha\)-level, the result was significant. We reject the null hypothesis and accept the alternative hypothesis that sex and education level are dependent. The difference from the expected distribution was greatest for the WO group: there were many more females than expected. The effect size was small (Cramer’s \(V\) = 0.26).
| | Hospital A | Hospital B |
|---|---|---|
| Deceased | 63 (3%) | 16 (2%) |
| Survived | 2037 | 784 |
| Total | 2100 | 800 |
| | Hosp. A (ill) | Hosp. B (ill) | Hosp. A (very ill) | Hosp. B (very ill) |
|---|---|---|---|---|
| Deceased | 6 (1%) | 8 (1.3%) | 57 (3.8%) | 8 (4%) |
| Survived | 594 | 592 | 1433 | 192 |
| Total | 600 | 600 | 1500 | 200 |
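A minimal R sketch of the pattern in these two tables (Simpson's paradox): within each severity group Hospital B has the higher death rate, but aggregated over severity Hospital A appears worse:
deceased <- c(A_ill = 6, B_ill = 8, A_veryill = 57, B_veryill = 8)
total <- c(A_ill = 600, B_ill = 600, A_veryill = 1500, B_veryill = 200)
round(deceased / total, 3) # per severity group: Hospital B has the higher rate
round(c(A = 63 / 2100, B = 16 / 800), 2) # aggregated: Hospital A appears worse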
plot(llsNL$BAC, llsNL$rating)
sum( scale(llsNL$BAC) * scale(llsNL$rating) ) / ( nrow(llsNL) - 1 ) # manual
# [1] -0.34045
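The manual computation above expresses the sample correlation as the averaged product of \(z\)-scores:
$$r = \frac{\sum_{i=1}^{n} z_{x_i}\, z_{y_i}}{n - 1}$$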
cor(llsNL$BAC, llsNL$rating) # using the cor function (easier)
# [1] -0.34045
cor(llsNL$BAC, llsNL$rating) # when speaking Dutch
# [1] -0.34045
cor(llsEN$BAC, llsEN$rating) # when speaking English
# [1] -0.11235
When the data contain missing values, the parameter use of cor() needs to be set to "pairwise.complete.obs":
llsNL[1, ]$rating = NA # set 1 value to NA (only for illustration)
cor(llsNL$BAC, llsNL$rating) # will result in NA
# [1] NA
cor(llsNL$BAC, llsNL$rating, use = "pairwise.complete.obs")
# [1] -0.36322
The function cor.test() conducts this test (i.e., it assesses whether the correlation differs significantly from 0).
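For example (a sketch using the variables from above; output omitted here):
cor.test(llsNL$BAC, llsNL$rating) # tests H0: the correlation equals 0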
load("sats.rda")
head(sats) # scores 1-7: negative - positive (some inverted)
# Q3 Q4 Q14 Q15 Q18 Q19 Q28
# 1 3 4 7 3 5 3 5
# 2 5 5 7 2 6 4 4
# 3 6 2 5 1 1 5 2
# 4 4 4 7 1 3 3 4
# 5 6 6 7 5 3 6 5
# 6 6 5 7 2 3 5 5
library(psych)
result <- alpha(sats)
# Some items ( Q3 Q19 ) were negatively correlated with the total scale and
# probably should be reversed.
# To do this, run the function again with the 'check.keys=TRUE' option
summary(result)
#
# Reliability analysis
# raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
# 0.41 0.37 0.63 0.078 0.59 0.039 4.2 0.69 0.042
result <- alpha(sats, keys = c("Q4", "Q15", "Q18", "Q28")) # keys: scales inverted
summary(result) # much better reliability
#
# Reliability analysis
# raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
# 0.79 0.75 0.81 0.3 3 0.013 4.3 0.98 0.36
result$alpha.drop
# raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
# Q3 0.74 0.69 0.73 0.27 2.18 0.02 0.10 0.36
# Q4- 0.74 0.70 0.76 0.28 2.29 0.02 0.09 0.37
# Q14 0.84 0.84 0.85 0.46 5.10 0.01 0.02 0.38
# Q15- 0.77 0.73 0.80 0.31 2.67 0.01 0.12 0.37
# Q18- 0.73 0.69 0.75 0.27 2.18 0.02 0.09 0.35
# Q19 0.75 0.69 0.73 0.27 2.21 0.02 0.10 0.35
# Q28- 0.71 0.67 0.73 0.26 2.06 0.02 0.08 0.35
sats2 <- sats[, -3] # drop third column (Q14): alpha.drop shows alpha rises to 0.84 without it
result2 <- alpha(sats2, keys = c("Q4", "Q15", "Q18", "Q28")) # keys: scales inverted
summary(result2)
#
# Reliability analysis
# raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
# 0.84 0.84 0.85 0.46 5.1 0.011 4 1.2 0.38
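Once the reliability is acceptable, a single attitude score per participant can be computed by reverse-coding the inverted items and averaging (a minimal sketch; it assumes the 1-7 scale shown above, so a reversed score equals 8 minus the original score):
sats_rev <- sats2
inverted <- c("Q4", "Q15", "Q18", "Q28")
sats_rev[, inverted] <- 8 - sats_rev[, inverted] # reverse-code the inverted items
attitude <- rowMeans(sats_rev) # mean score per participant
head(attitude)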