
Introduction to R
RStudio and R
R as calculator

| participant | year | sex | bl_edu | study | english_grade | english_score |
|---|---|---|---|---|---|---|
| 1 | 2020 | F | N | LING | 6 | 5.19 |
| 2 | 2020 | M | N | LING | 7 | 6.82 |
| 3 | 2020 | M | N | LING | 8 | 8.21 |
| 4 | 2020 | F | N | CIS | 7 | 7.34 |
| 5 | 2020 | F | N | LING | 7 | 6.59 |
| 6 | 2020 | F | N | LING | 8 | 7.55 |
| 7 | 2020 | F | N | LING | 7 | 7.19 |
| 8 | 2020 | F | Y | LING | 8 | 7.63 |
| 9 | 2020 | F | N | LING | 6 | 6.58 |
| 10 | 2020 | M | N | IS | 8 | 8.89 |
| 11 | 2020 | M | N | CIS | 7 | 6.76 |

| participant | year | sex | bl_edu | study | english_grade | english_score |
|---|---|---|---|---|---|---|
| 123 | 2021 | M | N | LING | 5.0 | 6.10 |
| 124 | 2021 | F | N | CIS | 6.0 | 6.67 |
| 125 | 2021 | F | N | CIS | 7.0 | 7.42 |
| 126 | 2021 | F | N | LING | 8.0 | 9.10 |
| 127 | 2021 | F | N | CIS | 7.0 | 7.47 |
| 128 | 2021 | M | N | LING | 8.4 | 8.14 |
| 129 | 2021 | F | N | LING | 8.0 | 7.65 |
| 130 | 2021 | F | N | CIS | 6.0 | 7.35 |
| 131 | 2021 | F | N | LING | 8.0 | 8.54 |
| 132 | 2021 | M | N | IS | 8.0 | 8.39 |
| 133 | 2021 | F | N | LING | 7.0 | 7.98 |

| participant | year | sex | bl_edu | study | english_grade | english_score |
|---|---|---|---|---|---|---|
| 225 | 2022 | M | N | IS | 8 | 7.10 |
| 226 | 2022 | F | N | OTHER | 9 | 7.76 |
| 227 | 2022 | F | N | OTHER | 7 | 5.68 |
| 228 | 2022 | F | N | CIS | 7 | 7.31 |
| 229 | 2022 | F | N | LING | 7 | 7.95 |
| 230 | 2022 | M | N | OTHER | 7 | 7.51 |
| 231 | 2022 | F | N | IS | 7 | 6.97 |
| 232 | 2022 | F | N | CIS | 6 | 6.22 |
| 233 | 2022 | M | N | OTHER | 8 | 8.71 |
| 234 | 2022 | F | N | LING | 7 | 6.78 |
| 235 | 2022 | F | N | CIS | 6 | 5.94 |

| participant | year | sex | bl_edu | study | english_grade | english_score |
|---|---|---|---|---|---|---|
| 320 | 2023 | M | N | LING | 8.0 | 9.02 |
| 321 | 2023 | F | N | LING | 8.0 | 7.44 |
| 322 | 2023 | F | N | CIS | 9.0 | 9.74 |
| 323 | 2023 | F | N | CIS | 7.0 | 9.06 |
| 324 | 2023 | F | N | CIS | 8.0 | 8.35 |
| 325 | 2023 | F | N | LING | 7.3 | 8.55 |
| 326 | 2023 | F | N | CIS | 6.0 | 6.51 |
| 327 | 2023 | F | N | LING | 7.0 | 7.87 |
| 328 | 2023 | M | N | CIS | 6.0 | 7.22 |
| 329 | 2023 | F | N | LING | 7.0 | 7.08 |
| 330 | 2023 | F | N | OTHER | 8.0 | 8.69 |

| participant | year | sex | bl_edu | study | english_grade | english_score |
|---|---|---|---|---|---|---|
| 427 | 2024 | M | N | IS | 8.0 | 7.87 |
| 428 | 2024 | F | N | OTHER | 8.0 | 8.99 |
| 429 | 2024 | F | N | OTHER | 7.3 | 7.75 |
| 430 | 2024 | F | N | OTHER | 7.0 | 8.37 |
| 431 | 2024 | F | N | LING | 7.0 | 6.93 |
| 432 | 2024 | M | N | IS | 8.0 | 8.21 |
| 433 | 2024 | F | N | LING | 7.0 | 8.12 |
| 434 | 2024 | F | N | LING | 8.0 | 8.28 |
| 435 | 2024 | F | N | LING | 7.0 | 8.81 |
| 436 | 2024 | F | N | IS | 5.8 | 5.73 |
| 437 | 2024 | F | N | LING | 7.1 | 7.31 |

Measures of central tendency and spread
Visualization

R (this lecture)
R compared to (e.g.,) SPSS

R as calculator
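A minimal sketch of what using R as a calculator can look like. The numbers and the variable name `x` are purely illustrative and not taken from the lecture.

```r
# Basic arithmetic: type an expression and R prints the result
2 + 3 * 4      # multiplication is evaluated before addition
(2 + 3) * 4    # parentheses change the order of evaluation
2^10           # exponentiation
sqrt(16)       # built-in functions can be called directly

# Store a result in a variable (hypothetical name) and reuse it
x <- 10 / 4
x * 2
```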
R: exporting a data set
R: importing a data set
'data.frame': 500 obs. of 7 variables:
$ participant : int 1 2 3 4 5 6 7 8 9 10 ...
$ year : int 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
$ sex : chr "F" "M" "M" "F" ...
$ bl_edu : chr "N" "N" "N" "N" ...
$ study : chr "LING" "LING" "LING" "CIS" ...
$ english_grade: num 6 7 8 7 7 8 7 8 6 8 ...
$ english_score: num 5.19 6.82 8.21 7.34 6.59 ...
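The export and import steps named above could look roughly as follows. This is a sketch under assumptions: the lecture's own commands are not shown here, `write.csv()` and `read.csv()` are just one common base-R option, the file path is a temporary placeholder, and `dat_small` is a hypothetical three-row excerpt of the table.

```r
# Toy data frame: the first three rows of the table (hypothetical excerpt)
dat_small <- data.frame(
  participant   = 1:3,
  year          = c(2020L, 2020L, 2020L),
  sex           = c("F", "M", "M"),
  bl_edu        = c("N", "N", "N"),
  study         = c("LING", "LING", "LING"),
  english_grade = c(6, 7, 8),
  english_score = c(5.19, 6.82, 8.21),
  stringsAsFactors = FALSE
)

# Export to a CSV file (a temporary file stands in for a real path)
csv_path <- tempfile(fileext = ".csv")
write.csv(dat_small, csv_path, row.names = FALSE)

# Import the file again and inspect the result
dat_in <- read.csv(csv_path)
str(dat_in)   # structure: one line per column with its type
head(dat_in)  # first rows of the imported table
```

Setting `row.names = FALSE` keeps R from writing the row numbers as an extra first column, so the imported table has the same seven columns as the exported one.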
head() shows the first 6 rows of a data frame
Access parts of table by specifying row and/or column numbers
dat[a,b]:
a indicates the selected rows of dat
b indicates the selected columns of dat
$ operator
dat$sex accesses the column sex of dat
[1] "F" "M" "M" "F" "F" "F" "F" "F" "F" "M" "M" "F" "M" "F" "F" "F" "F" "F"
[19] "F" "F" "F" "F" "F" "F" "M" "F" "M" "M" "F" "F" "F" "F" "F" "F" "F" "F"
[37] "M" "M" "F" "F" "F" "M" "M" "F" "F" "F" "M" "M" "M" "F" "F" "F" "M" "M"
[55] "F" "F" "F" "F" "F" "F" "F" "F" "M" "F" "M" "F" "F" "F" "M" "F" "F" "M"
[73] "M" "M" "M" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "M"
[91] "M" "F" "M" "F" "M" "M" "F" "F" "F" "M" "F" "F" "F" "F" "M" "M" "F" "F"
[109] "F" "F" "M" "F" "F" "F" "M" "F" "F" "M" "M" "M" "M" "M" "M" "F" "F" "F"
[127] "F" "M" "F" "F" "F" "M" "F" "M" "F" "M" "M" "F" "M" "F" "F" "M" "M" "F"
[145] "F" "M" "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "F" "F" "M" "M" "F" "F"
[163] "F" "F" "F" "F" "F" "M" "M" "F" "F" "F" "M" "F" "F" "M" "F" "F" "F" "F"
[181] "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "M" "F" "F" "F" "F" "F" "F" "F"
[199] "F" "F"
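The two ways of accessing parts of a table described above, `dat[a,b]` and `$`, can be tried on a small example. The data frame `dat_small` is a hypothetical four-row excerpt with only some of the columns, so the column numbers differ from those in the full `dat`.

```r
# Hypothetical excerpt of the data (participants 1 to 4, four columns)
dat_small <- data.frame(
  participant   = 1:4,
  sex           = c("F", "M", "M", "F"),
  study         = c("LING", "LING", "LING", "CIS"),
  english_grade = c(6, 7, 8, 7),
  stringsAsFactors = FALSE
)

dat_small[2, ]            # row 2, all columns
dat_small[, 3]            # all rows, column 3 (study)
dat_small[1:2, c(2, 4)]   # rows 1 and 2, columns 2 and 4
dat_small[2, 4]           # a single cell: row 2, column 4

dat_small$sex             # the column sex as a vector
```

Leaving `a` or `b` empty selects all rows or all columns, respectively.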
participant year sex bl_edu study english_grade english_score
2 2 2020 M N LING 7 6.8208
3 3 2020 M N LING 8 8.2118
10 10 2020 M N IS 8 8.8922
11 11 2020 M N CIS 7 6.7571
13 13 2020 M N CIS 6 6.3324
25 25 2020 M N OTHER 9 8.3452
& (and) and | (or)
# only participants who study IS *and* are male
tmp <- dat[dat$sex == 'M' & dat$study == 'IS',]
head(tmp)
participant year sex bl_edu study english_grade english_score
10 10 2020 M N IS 8.0 8.8922
27 27 2020 M N IS 8.0 8.9217
28 28 2020 M N IS 7.0 8.0216
37 37 2020 M N IS 8.1 8.6534
43 43 2020 M N IS 6.0 6.6602
47 47 2020 M N IS 9.0 8.9312
! (not)
!=
# only females (i.e. not males) *or* everybody with an English grade over 7
tmp <- dat[dat$sex != 'M' | dat$english_grade > 7,]
tail(tmp) # tail shows final 6 rows
participant year sex bl_edu study english_grade english_score
494 494 2024 F N LING 5.8 5.1720
495 495 2024 F N LING 7.0 8.0231
496 496 2024 M N IS 8.0 7.5441
497 497 2024 F N LING 6.0 7.1884
498 498 2024 F N LING 6.5 6.4241
499 499 2024 M N IS 9.0 9.5693
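To see what the logical operators do before they are used for subsetting, it can help to look at the TRUE/FALSE vectors themselves. The sketch below uses a hypothetical five-row excerpt (`dat_small`) and the illustrative helper names `is_male` and `studies_is`.

```r
# Hypothetical excerpt: participants 1, 2, 4, 10 and 11 from the table
dat_small <- data.frame(
  participant   = c(1, 2, 4, 10, 11),
  sex           = c("F", "M", "F", "M", "M"),
  study         = c("LING", "LING", "CIS", "IS", "CIS"),
  english_grade = c(6, 7, 7, 8, 7),
  stringsAsFactors = FALSE
)

is_male    <- dat_small$sex == "M"      # one TRUE/FALSE value per row
studies_is <- dat_small$study == "IS"

is_male & studies_is   # TRUE only where both conditions hold
is_male | studies_is   # TRUE where at least one condition holds
!is_male               # negation; same as dat_small$sex != "M"

# Use the logical vectors to select rows
both   <- dat_small[is_male & studies_is, ]
either <- dat_small[!is_male | dat_small$english_grade > 7, ]
both$participant       # → 10
either$participant     # → 1 4 10
```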
$ helps us to do that
# new column 'diff': English grade - English proficiency score
dat$diff <- dat$english_grade - dat$english_score
head(dat)
participant year sex bl_edu study english_grade english_score diff
1 1 2020 F N LING 6 5.1902 0.80976
2 2 2020 M N LING 7 6.8208 0.17917
3 3 2020 M N LING 8 8.2118 -0.21182
4 4 2020 F N CIS 7 7.3397 -0.33970
5 5 2020 F N LING 7 6.5873 0.41273
6 6 2020 F N LING 8 7.5489 0.45106
dat$pass_fail <- 'PASS' # new column, initially PASS for everybody
dat[dat$english_grade < 5.5,]$pass_fail <- 'FAIL' # if grade too low, then FAIL
tail(dat[dat$english_grade > 4 & dat$english_grade < 6, 2:9]) # show subset of data
year sex bl_edu study english_grade english_score diff pass_fail
392 2023 F N CIS 5.6 5.9877 -0.387718 PASS
436 2024 F N IS 5.8 5.7252 0.074803 PASS
454 2024 F N LING 5.0 6.1166 -1.116598 FAIL
468 2024 F Y CIS 5.0 4.3000 0.700000 FAIL
490 2024 F N LING 5.8 6.0576 -0.257642 PASS
494 2024 F N LING 5.8 5.1720 0.627971 PASS
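As an alternative formulation (not the approach used above), the same two columns can be created with `ifelse()`, which builds the PASS/FAIL column in one step. The data frame `dat_small` is a hypothetical five-row excerpt using values shown in the tables and output above.

```r
# Hypothetical excerpt: participants 1, 2, 436, 454 and 468
dat_small <- data.frame(
  participant   = c(1, 2, 436, 454, 468),
  english_grade = c(6, 7, 5.8, 5.0, 5.0),
  english_score = c(5.19, 6.82, 5.73, 6.1166, 4.3)
)

# New column: English grade minus English proficiency score
dat_small$diff <- dat_small$english_grade - dat_small$english_score

# New column: FAIL if the grade is below 5.5, PASS otherwise
dat_small$pass_fail <- ifelse(dat_small$english_grade < 5.5, "FAIL", "PASS")

dat_small
```

Unlike assigning to a selected subset of rows, `ifelse()` also works when no row meets the condition.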
barplot() (illustrated in the following)
plot()
boxplot()
hist()
qqnorm() and qqline()
R is to conduct statistical analyses
cor() for the correlation
lm() for linear regression
glm() for logistic regression
alpha() (from package psych) for Cronbach’s \(\alpha\)
Call:
lm(formula = english_grade ~ bl_edu, data = dat)
Residuals:
Min 1Q Median 3Q Max
-2.640 -0.246 -0.246 0.754 2.154
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.2457 0.0404 179.28 <2e-16 ***
bl_eduY 0.3947 0.1318 2.99 0.0029 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.86 on 498 degrees of freedom
Multiple R-squared: 0.0177, Adjusted R-squared: 0.0157
F-statistic: 8.97 on 1 and 498 DF, p-value: 0.00289
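Output like the above is what `summary()` prints for a fitted `lm()` model. The sketch below runs the same model formula, plus `cor()`, on a hypothetical excerpt (`dat_small`) containing only the first 11 participants from the table, so its numbers will differ from the output above, which is based on all 500 observations.

```r
# Hypothetical excerpt: participants 1 to 11 (only participant 8 has bl_edu == "Y")
dat_small <- data.frame(
  bl_edu        = c("N", "N", "N", "N", "N", "N", "N", "Y", "N", "N", "N"),
  english_grade = c(6, 7, 8, 7, 7, 8, 7, 8, 6, 8, 7),
  english_score = c(5.19, 6.82, 8.21, 7.34, 6.59, 7.55, 7.19, 7.63, 6.58, 8.89, 6.76),
  stringsAsFactors = FALSE
)

# Correlation between English grade and English proficiency score
r <- cor(dat_small$english_grade, dat_small$english_score)
r

# Linear regression: English grade predicted by bl_edu
fit <- lm(english_grade ~ bl_edu, data = dat_small)
summary(fit)   # prints the Call, Residuals and Coefficients sections
coef(fit)      # only the estimated coefficients
```

With a two-level predictor such as `bl_edu`, the intercept is the mean grade of the reference group (`N`) and the `bl_eduY` coefficient is the difference between the mean of group `Y` and that reference mean.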
R as calculator

Thank you for your attention!
https://www.martijnwieling.nl
m.b.wieling@rug.nl