Introduction to R
RStudio and R
R as calculator
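As a small illustration of using R as a calculator, the sketch below evaluates a few expressions at the console. The variable names a and b are made up for this example and do not come from the lecture.

```r
# Basic arithmetic follows the usual operator precedence
2 + 3 * 4        # 14
(2 + 3) * 4      # 20
2^10             # 1024
sqrt(16)         # 4

# Results can be stored in variables (hypothetical names) and reused
a <- 5
b <- a * 2
b - 3            # 7
```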
participant | year | sex | bl_edu | study | english_grade | english_score |
---|---|---|---|---|---|---|
1 | 2020 | F | N | LING | 6 | 5.19 |
2 | 2020 | M | N | LING | 7 | 6.82 |
3 | 2020 | M | N | LING | 8 | 8.21 |
4 | 2020 | F | N | CIS | 7 | 7.34 |
5 | 2020 | F | N | LING | 7 | 6.59 |
6 | 2020 | F | N | LING | 8 | 7.55 |
7 | 2020 | F | N | LING | 7 | 7.19 |
8 | 2020 | F | Y | LING | 8 | 7.63 |
9 | 2020 | F | N | LING | 6 | 6.58 |
10 | 2020 | M | N | IS | 8 | 8.89 |
11 | 2020 | M | N | CIS | 7 | 6.76 |
participant | year | sex | bl_edu | study | english_grade | english_score |
---|---|---|---|---|---|---|
123 | 2021 | M | N | LING | 5.0 | 6.10 |
124 | 2021 | F | N | CIS | 6.0 | 6.67 |
125 | 2021 | F | N | CIS | 7.0 | 7.42 |
126 | 2021 | F | N | LING | 8.0 | 9.10 |
127 | 2021 | F | N | CIS | 7.0 | 7.47 |
128 | 2021 | M | N | LING | 8.4 | 8.14 |
129 | 2021 | F | N | LING | 8.0 | 7.65 |
130 | 2021 | F | N | CIS | 6.0 | 7.35 |
131 | 2021 | F | N | LING | 8.0 | 8.54 |
132 | 2021 | M | N | IS | 8.0 | 8.39 |
133 | 2021 | F | N | LING | 7.0 | 7.98 |
participant | year | sex | bl_edu | study | english_grade | english_score |
---|---|---|---|---|---|---|
225 | 2022 | M | N | IS | 8 | 7.10 |
226 | 2022 | F | N | OTHER | 9 | 7.76 |
227 | 2022 | F | N | OTHER | 7 | 5.68 |
228 | 2022 | F | N | CIS | 7 | 7.31 |
229 | 2022 | F | N | LING | 7 | 7.95 |
230 | 2022 | M | N | OTHER | 7 | 7.51 |
231 | 2022 | F | N | IS | 7 | 6.97 |
232 | 2022 | F | N | CIS | 6 | 6.22 |
233 | 2022 | M | N | OTHER | 8 | 8.71 |
234 | 2022 | F | N | LING | 7 | 6.78 |
235 | 2022 | F | N | CIS | 6 | 5.94 |
participant | year | sex | bl_edu | study | english_grade | english_score |
---|---|---|---|---|---|---|
320 | 2023 | M | N | LING | 8.0 | 9.02 |
321 | 2023 | F | N | LING | 8.0 | 7.44 |
322 | 2023 | F | N | CIS | 9.0 | 9.74 |
323 | 2023 | F | N | CIS | 7.0 | 9.06 |
324 | 2023 | F | N | CIS | 8.0 | 8.35 |
325 | 2023 | F | N | LING | 7.3 | 8.55 |
326 | 2023 | F | N | CIS | 6.0 | 6.51 |
327 | 2023 | F | N | LING | 7.0 | 7.87 |
328 | 2023 | M | N | CIS | 6.0 | 7.22 |
329 | 2023 | F | N | LING | 7.0 | 7.08 |
330 | 2023 | F | N | OTHER | 8.0 | 8.69 |
participant | year | sex | bl_edu | study | english_grade | english_score |
---|---|---|---|---|---|---|
427 | 2024 | M | N | IS | 8.0 | 7.87 |
428 | 2024 | F | N | OTHER | 8.0 | 8.99 |
429 | 2024 | F | N | OTHER | 7.3 | 7.75 |
430 | 2024 | F | N | OTHER | 7.0 | 8.37 |
431 | 2024 | F | N | LING | 7.0 | 6.93 |
432 | 2024 | M | N | IS | 8.0 | 8.21 |
433 | 2024 | F | N | LING | 7.0 | 8.12 |
434 | 2024 | F | N | LING | 8.0 | 8.28 |
435 | 2024 | F | N | LING | 7.0 | 8.81 |
436 | 2024 | F | N | IS | 5.8 | 5.73 |
437 | 2024 | F | N | LING | 7.1 | 7.31 |
Measures of central tendency and spread
Visualization
R (this lecture)
R compared to (e.g.,) SPSS
R as calculator
R: exporting a data set
R: importing a data set
'data.frame': 500 obs. of 7 variables:
$ participant : int 1 2 3 4 5 6 7 8 9 10 ...
$ year : int 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
$ sex : chr "F" "M" "M" "F" ...
$ bl_edu : chr "N" "N" "N" "N" ...
$ study : chr "LING" "LING" "LING" "CIS" ...
$ english_grade: num 6 7 8 7 7 8 7 8 6 8 ...
$ english_score: num 5.19 6.82 8.21 7.34 6.59 ...
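The structure listing above is the kind of summary that str() prints for a data frame. As a minimal sketch of exporting and importing, the example below writes a toy data frame to a CSV file and reads it back. The names toy, dat_toy and path are assumptions for illustration, the toy rows are the first three participants of the 2020 table, and the lecture's data set may have been stored in a different file format.

```r
# Toy data (assumption): first three participants from the 2020 table
toy <- data.frame(
  participant   = 1:3,
  year          = c(2020L, 2020L, 2020L),
  sex           = c("F", "M", "M"),
  bl_edu        = c("N", "N", "N"),
  study         = c("LING", "LING", "LING"),
  english_grade = c(6, 7, 8),
  english_score = c(5.19, 6.82, 8.21)
)

# Export: write the data frame to a CSV file (a temporary file here)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Import: read the file back in and inspect its structure
dat_toy <- read.csv(path)
str(dat_toy)
```

Note that read.csv() guesses the column types from the file contents, so it is worth checking the str() output after importing.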
head() shows the first 6 rows
Access parts of the table by specifying row and/or column numbers
dat[a,b]: a indicates the selected rows of dat, b indicates the selected columns of dat
$ operator: dat$sex accesses the column sex of dat
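The indexing rules above can be tried out on a small table. This is a sketch on a toy data frame (the name toy is an assumption; its rows are taken from the first four participants of the 2020 table), not on the full data set dat.

```r
# Toy data frame (assumption): four participants, four columns
toy <- data.frame(
  participant   = 1:4,
  sex           = c("F", "M", "M", "F"),
  study         = c("LING", "LING", "LING", "CIS"),
  english_grade = c(6, 7, 8, 7)
)

toy[2, 3]        # row 2, column 3: "LING"
toy[1:2, ]       # rows 1 and 2, all columns (row part given, column part empty)
toy[, c(1, 4)]   # all rows, columns 1 and 4
toy$sex          # the column sex as a vector
toy$sex[4]       # fourth element of that column: "F"
```

Leaving the part before or after the comma empty selects all rows or all columns, respectively.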
[1] "F" "M" "M" "F" "F" "F" "F" "F" "F" "M" "M" "F" "M" "F" "F" "F" "F" "F"
[19] "F" "F" "F" "F" "F" "F" "M" "F" "M" "M" "F" "F" "F" "F" "F" "F" "F" "F"
[37] "M" "M" "F" "F" "F" "M" "M" "F" "F" "F" "M" "M" "M" "F" "F" "F" "M" "M"
[55] "F" "F" "F" "F" "F" "F" "F" "F" "M" "F" "M" "F" "F" "F" "M" "F" "F" "M"
[73] "M" "M" "M" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "M"
[91] "M" "F" "M" "F" "M" "M" "F" "F" "F" "M" "F" "F" "F" "F" "M" "M" "F" "F"
[109] "F" "F" "M" "F" "F" "F" "M" "F" "F" "M" "M" "M" "M" "M" "M" "F" "F" "F"
[127] "F" "M" "F" "F" "F" "M" "F" "M" "F" "M" "M" "F" "M" "F" "F" "M" "M" "F"
[145] "F" "M" "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "F" "F" "M" "M" "F" "F"
[163] "F" "F" "F" "F" "F" "M" "M" "F" "F" "F" "M" "F" "F" "M" "F" "F" "F" "F"
[181] "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "M" "F" "F" "F" "F" "F" "F" "F"
[199] "F" "F"
participant year sex bl_edu study english_grade english_score
2 2 2020 M N LING 7 6.8208
3 3 2020 M N LING 8 8.2118
10 10 2020 M N IS 8 8.8922
11 11 2020 M N CIS 7 6.7571
13 13 2020 M N CIS 6 6.3324
25 25 2020 M N OTHER 9 8.3452
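All rows in the output above have sex equal to M, which is what selecting rows with a logical condition produces. The exact command is not shown here, so the following is only a sketch of that technique on a toy data frame (toy, is_male and males are hypothetical names).

```r
# Toy data frame (assumption): four participants
toy <- data.frame(
  participant   = 1:4,
  sex           = c("F", "M", "M", "F"),
  english_grade = c(6, 7, 8, 7)
)

# The comparison yields one TRUE/FALSE value per row
is_male <- toy$sex == "M"    # FALSE TRUE TRUE FALSE

# Using the logical vector in the row position keeps only the TRUE rows
males <- toy[is_male, ]
males
```

The selected rows keep their original row names (here 2 and 3), which is why the row names in a subset are not consecutive.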
& (and)
| (or)
# only participants who study IS *and* are male
tmp <- dat[dat$sex == 'M' & dat$study == 'IS',]
head(tmp)
participant year sex bl_edu study english_grade english_score
10 10 2020 M N IS 8.0 8.8922
27 27 2020 M N IS 8.0 8.9217
28 28 2020 M N IS 7.0 8.0216
37 37 2020 M N IS 8.1 8.6534
43 43 2020 M N IS 6.0 6.6602
47 47 2020 M N IS 9.0 8.9312
! (not)
!= (not equal to)
# only females (i.e. not males) *or* everybody with an English grade over 7
tmp <- dat[dat$sex != 'M' | dat$english_grade > 7,]
tail(tmp) # tail shows final 6 rows
participant year sex bl_edu study english_grade english_score
494 494 2024 F N LING 5.8 5.1720
495 495 2024 F N LING 7.0 8.0231
496 496 2024 M N IS 8.0 7.5441
497 497 2024 F N LING 6.0 7.1884
498 498 2024 F N LING 6.5 6.4241
499 499 2024 M N IS 9.0 9.5693
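To see how negation and "or" combine row by row, the sketch below applies them to two short vectors. The vectors sex and grade and the name keep are made up for illustration.

```r
# Toy vectors (assumption): four participants
sex   <- c("F", "M", "M", "F")
grade <- c(6, 7, 8, 7)

sex != "M"       # TRUE FALSE FALSE TRUE
!(sex == "M")    # same result: ! negates each element of the comparison

# A row is kept if at least one of the two conditions is TRUE
keep <- sex != "M" | grade > 7
keep             # TRUE FALSE TRUE TRUE
```

The third participant is male but has a grade over 7, so that row is kept as well. The same happens in a data frame subset built with |.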
The $ operator helps us to add new columns
# new column 'diff': English grade - English proficiency score
dat$diff <- dat$english_grade - dat$english_score
head(dat)
participant year sex bl_edu study english_grade english_score diff
1 1 2020 F N LING 6 5.1902 0.80976
2 2 2020 M N LING 7 6.8208 0.17917
3 3 2020 M N LING 8 8.2118 -0.21182
4 4 2020 F N CIS 7 7.3397 -0.33970
5 5 2020 F N LING 7 6.5873 0.41273
6 6 2020 F N LING 8 7.5489 0.45106
dat$pass_fail <- 'PASS' # new column, initially PASS for everybody
dat[dat$english_grade < 5.5,]$pass_fail <- 'FAIL' # if grade too low, then FAIL
tail(dat[dat$english_grade > 4 & dat$english_grade < 6, 2:9]) # show subset of data
year sex bl_edu study english_grade english_score diff pass_fail
392 2023 F N CIS 5.6 5.9877 -0.387718 PASS
436 2024 F N IS 5.8 5.7252 0.074803 PASS
454 2024 F N LING 5.0 6.1166 -1.116598 FAIL
468 2024 F Y CIS 5.0 4.3000 0.700000 FAIL
490 2024 F N LING 5.8 6.0576 -0.257642 PASS
494 2024 F N LING 5.8 5.1720 0.627971 PASS
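An alternative way to fill a column such as pass_fail is ifelse(), which assigns both labels in one step. This is a sketch of a different technique than the two-step assignment shown above, on a toy data frame with made-up grades (the name toy is an assumption).

```r
# Toy data frame (assumption): four made-up grades
toy <- data.frame(english_grade = c(5.0, 5.6, 7.0, 5.4))

# ifelse(test, yes, no) picks "FAIL" where the test is TRUE, "PASS" otherwise
toy$pass_fail <- ifelse(toy$english_grade < 5.5, "FAIL", "PASS")
toy
```

With the same threshold of 5.5, grades of 5.0 and 5.4 are labelled FAIL and the others PASS.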
barplot() (illustrated in the following)
plot()
boxplot()
hist()
qqnorm() and qqline()
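A minimal sketch of some of these plotting functions is given below. The vectors study and grade are assumptions for illustration, filled with the values of the first eleven participants of the 2020 table; the axis labels and titles are also made up.

```r
# Toy vectors (assumption): first eleven participants of the 2020 table
study <- c("LING", "LING", "LING", "CIS", "LING", "LING",
           "LING", "LING", "LING", "IS", "CIS")
grade <- c(6, 7, 8, 7, 7, 8, 7, 8, 6, 8, 7)

# Bar plot of the number of participants per study programme
counts <- table(study)
barplot(counts, xlab = "Study", ylab = "Number of participants")

# Histogram and box plot of the grades
hist(grade, main = "English grades", xlab = "Grade")
boxplot(grade, ylab = "Grade")

# Normal quantile-quantile plot with a reference line
qqnorm(grade)
qqline(grade)
```

barplot() expects the heights of the bars, so the categorical values are first counted with table().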
R is used to conduct statistical analyses
cor() for the correlation
lm() for linear regression
glm() for logistic regression
alpha() (from package psych) for Cronbach’s \(\alpha\)
Call:
lm(formula = english_grade ~ bl_edu, data = dat)
Residuals:
Min 1Q Median 3Q Max
-2.640 -0.246 -0.246 0.754 2.154
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.2457 0.0404 179.28 <2e-16 ***
bl_eduY 0.3947 0.1318 2.99 0.0029 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.86 on 498 degrees of freedom
Multiple R-squared: 0.0177, Adjusted R-squared: 0.0157
F-statistic: 8.97 on 1 and 498 DF, p-value: 0.00289
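The output above has the form printed by summary() for a model fitted with lm(). The sketch below fits the same formula on a toy data frame (the names toy and model are assumptions; the rows are the first eleven participants of the 2020 table), so its numbers will differ from the output for the full data set.

```r
# Toy data frame (assumption): first eleven participants of the 2020 table
toy <- data.frame(
  bl_edu        = c("N", "N", "N", "N", "N", "N", "N", "Y", "N", "N", "N"),
  english_grade = c(6, 7, 8, 7, 7, 8, 7, 8, 6, 8, 7),
  english_score = c(5.19, 6.82, 8.21, 7.34, 6.59, 7.55,
                    7.19, 7.63, 6.58, 8.89, 6.76)
)

# Correlation between grade and proficiency score
cor(toy$english_grade, toy$english_score)

# Linear regression of grade on bilingual education (reference level: N)
model <- lm(english_grade ~ bl_edu, data = toy)
summary(model)

# The intercept is the mean grade of the N group,
# bl_eduY is the difference between the Y group and the N group
coef(model)
```

With a two-level predictor such as bl_edu, the coefficients can be read in the same way in the output for the full data set: the intercept is the estimated grade for bl_edu N, and bl_eduY is the estimated difference for bl_edu Y.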
R as calculator
Thank you for your attention!
https://www.martijnwieling.nl
m.b.wieling@rug.nl