
Introduction to R
RStudio and R
R as calculator

| participant | year | sex | bl_edu | study | english_grade | english_score |
|---|---|---|---|---|---|---|
| 1 | 2020 | M | N | LING | 5.0 | 6.10 |
| 2 | 2020 | F | N | CIS | 6.0 | 6.67 |
| 3 | 2020 | F | N | CIS | 7.0 | 7.42 |
| 4 | 2020 | F | N | LING | 8.0 | 9.10 |
| 5 | 2020 | F | N | CIS | 7.0 | 7.47 |
| 6 | 2020 | M | N | LING | 8.4 | 8.14 |
| 7 | 2020 | F | N | LING | 8.0 | 7.65 |
| 8 | 2020 | F | N | CIS | 6.0 | 7.35 |
| 9 | 2020 | F | N | LING | 8.0 | 8.54 |
| 10 | 2020 | M | N | IS | 8.0 | 8.39 |
| 11 | 2020 | F | N | LING | 7.0 | 7.98 |
| 103 | 2021 | F | N | LING | 6 | 5.19 |
| 104 | 2021 | M | N | LING | 7 | 6.82 |
| 105 | 2021 | M | N | LING | 8 | 8.21 |
| 106 | 2021 | F | N | CIS | 7 | 7.34 |
| 107 | 2021 | F | N | LING | 7 | 6.59 |
| 108 | 2021 | F | N | LING | 8 | 7.55 |
| 109 | 2021 | F | N | LING | 7 | 7.19 |
| 110 | 2021 | F | Y | LING | 8 | 7.63 |
| 111 | 2021 | F | N | LING | 6 | 6.58 |
| 112 | 2021 | M | N | IS | 8 | 8.89 |
| 113 | 2021 | M | N | CIS | 7 | 6.76 |
| 225 | 2022 | M | N | LING | 8.0 | 9.02 |
| 226 | 2022 | F | N | LING | 8.0 | 7.44 |
| 227 | 2022 | F | N | CIS | 9.0 | 9.74 |
| 228 | 2022 | F | N | CIS | 7.0 | 9.06 |
| 229 | 2022 | F | N | CIS | 8.0 | 8.35 |
| 230 | 2022 | F | N | LING | 7.3 | 8.55 |
| 231 | 2022 | F | N | CIS | 6.0 | 6.51 |
| 232 | 2022 | F | N | LING | 7.0 | 7.87 |
| 233 | 2022 | M | N | CIS | 6.0 | 7.22 |
| 234 | 2022 | F | N | LING | 7.0 | 7.08 |
| 235 | 2022 | F | N | OTHER | 8.0 | 8.69 |
| 316 | 2023 | M | N | OTHER | 6.0 | 6.86 |
| 317 | 2023 | M | N | LING | 8.5 | 8.07 |
| 318 | 2023 | M | N | IS | 8.0 | 7.72 |
| 319 | 2023 | F | N | LING | 7.4 | 7.53 |
| 320 | 2023 | F | Y | CIS | 8.0 | 9.23 |
| 321 | 2023 | M | N | IS | 7.0 | 7.64 |
| 322 | 2023 | M | N | IS | 7.0 | 7.82 |
| 323 | 2023 | F | N | LING | 8.0 | 8.65 |
| 324 | 2023 | M | N | LING | 9.3 | 9.09 |
| 325 | 2023 | M | N | IS | 6.0 | 7.61 |
| 326 | 2023 | F | N | LING | 8.0 | 8.26 |
| 422 | 2024 | F | N | CIS | 7 | 7.26 |
| 423 | 2024 | F | N | LING | 7 | 7.62 |
| 424 | 2024 | F | N | OTHER | 7 | 8.40 |
| 425 | 2024 | M | N | LING | 7 | 8.00 |
| 426 | 2024 | F | N | CIS | 8 | 8.77 |
| 427 | 2024 | F | N | CIS | 7 | 7.78 |
| 428 | 2024 | M | Y | IS | 9 | 9.50 |
| 429 | 2024 | F | N | CIS | 7 | 7.55 |
| 430 | 2024 | F | N | CIS | 6 | 6.81 |
| 431 | 2024 | F | N | LING | 7 | 7.43 |
| 432 | 2024 | F | N | LING | 7 | 7.05 |
Measures of central tendency and spread
Visualization

R (this lecture)
R compared to (e.g.,) SPSS

R as calculator
R: exporting a dataset
R: importing a dataset
'data.frame': 500 obs. of 7 variables:
$ participant : int 1 2 3 4 5 6 7 8 9 10 ...
$ year : int 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
$ sex : chr "M" "F" "F" "F" ...
$ bl_edu : chr "N" "N" "N" "N" ...
$ study : chr "LING" "CIS" "CIS" "LING" ...
$ english_grade: num 5 6 7 8 7 8.4 8 6 8 8 ...
$ english_score: num 6.1 6.67 7.42 9.1 7.47 ...
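Exporting and importing can be tried out with a small round trip. This is a minimal sketch that assumes the data are stored as a CSV file; the slides do not state the file format, and the names `toy`, `path` and `dat2` as well as all values are made up for illustration.

```r
# Toy data frame with column types similar to the str() output above
# (hypothetical values, not taken from the real dataset)
toy <- data.frame(
  participant   = 1:3,
  year          = c(2020L, 2020L, 2021L),
  sex           = c("M", "F", "F"),
  english_grade = c(5.0, 6.0, 7.5),
  stringsAsFactors = FALSE
)

# Export: write the data frame to a CSV file (here a temporary file)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Import: read the file back in; stringsAsFactors = FALSE keeps text as chr
dat2 <- read.csv(path, stringsAsFactors = FALSE)

# Inspect the structure: one line per column with its type and first values
str(dat2)
```

`row.names = FALSE` prevents `write.csv()` from adding an extra column of row names, so the imported table has the same columns as the exported one.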
head
Access parts of table by specifying row and/or column numbers
dat[a,b]:
a indicates the selected rows of dat
b indicates the selected columns of dat
$ operator
dat$sex accesses the column sex of dat
[1] "M" "F" "F" "F" "F" "M" "F" "F" "F" "M" "F" "M" "F" "M" "M" "F" "M" "F"
[19] "F" "M" "M" "F" "F" "M" "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "F" "F"
[37] "M" "M" "F" "F" "F" "F" "F" "F" "F" "M" "M" "F" "F" "F" "M" "F" "F" "M"
[55] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "M" "F" "F" "F"
[73] "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "F" "F" "F"
[91] "F" "F" "M" "F" "F" "F" "M" "M" "F" "F" "F" "F" "F" "M" "M" "F" "F" "F"
[109] "F" "F" "F" "M" "M" "F" "M" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F" "F"
[127] "M" "F" "M" "M" "F" "F" "F" "F" "F" "F" "F" "F" "M" "M" "F" "F" "F" "M"
[145] "M" "F" "F" "F" "M" "M" "M" "F" "F" "F" "M" "M" "F" "F" "F" "F" "F" "F"
[163] "F" "F" "M" "F" "M" "F" "F" "F" "M" "F" "F" "M" "M" "M" "M" "F" "F" "F"
[181] "F" "F" "F" "F" "F" "F" "F" "F" "M" "F" "F" "M" "M" "F" "M" "F" "M" "M"
[199] "F" "F"
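The two ways of accessing parts of a table described above can be sketched on a small example. The data frame `toy` and its values are hypothetical; only the `[a,b]` and `$` notation comes from the slides.

```r
# Small made-up table for illustration
toy <- data.frame(
  participant   = 1:4,
  sex           = c("M", "F", "F", "M"),
  study         = c("LING", "CIS", "CIS", "IS"),
  english_grade = c(5.0, 6.0, 7.0, 8.0),
  stringsAsFactors = FALSE
)

toy[2, 3]       # row 2, column 3: a single value
toy[1:2, ]      # rows 1 and 2, all columns (b left empty)
toy[, c(1, 4)]  # all rows (a left empty), columns 1 and 4
toy$sex         # the column sex as a vector
toy[, "sex"]    # the same column, selected by name instead of number
```

Leaving `a` or `b` empty selects all rows or all columns, respectively.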
participant year sex bl_edu study english_grade english_score
1 1 2020 M N LING 5.0 6.1000
6 6 2020 M N LING 8.4 8.1369
10 10 2020 M N IS 8.0 8.3883
12 12 2020 M N OTHER 7.0 6.1467
14 14 2020 M N IS 9.0 9.1297
15 15 2020 M N CIS 7.0 8.2633
& (and)
| (or)
# only participants who study IS *and* are male
tmp <- dat[dat$sex == 'M' & dat$study == 'IS',]
head(tmp)
participant year sex bl_edu study english_grade english_score
10 10 2020 M N IS 8 8.3883
14 14 2020 M N IS 9 9.1297
17 17 2020 M N IS 8 7.1360
20 20 2020 M N IS 8 8.4594
21 21 2020 M N IS 7 8.2508
32 32 2020 M N IS 7 7.5851
! (not)
!= (not equal to)
# only females (i.e. not males) *or* everybody with an English grade over 7
tmp <- dat[dat$sex != 'M' | dat$english_grade > 7,]
tail(tmp) # tail shows the final 6 rows
participant year sex bl_edu study english_grade english_score
493 493 2024 M N IS 8.0 8.1840
494 494 2024 M N IS 8.0 8.0375
495 495 2024 F N OTHER 8.0 8.4751
497 497 2024 F N OTHER 7.0 7.1963
498 498 2024 F N LING 6.0 7.2741
500 500 2024 M N IS 7.5 6.2609
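The reason these selections work is that each comparison yields one TRUE/FALSE value per row, and only rows with TRUE are kept. A minimal sketch on made-up data (the names `toy` and `is_male` are hypothetical):

```r
# Small made-up table for illustration
toy <- data.frame(
  sex           = c("M", "F", "F", "M", "M"),
  study         = c("IS", "IS", "LING", "LING", "IS"),
  english_grade = c(8.0, 6.5, 7.5, 6.0, 7.0),
  stringsAsFactors = FALSE
)

# A comparison returns a logical vector with one value per row
is_male <- toy$sex == "M"
is_male

# & : both conditions must be TRUE for a row to be selected
toy[is_male & toy$study == "IS", ]

# | : at least one of the conditions must be TRUE
toy[toy$sex != "M" | toy$english_grade > 7, ]

# ! negates a logical vector, so both selections below contain the same rows
identical(toy[!is_male, ], toy[toy$sex != "M", ])
```

Storing a condition in a variable such as `is_male` is optional, but it can make longer selections easier to read.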
$ helps us to do that
# new column 'diff': English grade - English proficiency score
dat$diff <- dat$english_grade - dat$english_score
head(dat)
participant year sex bl_edu study english_grade english_score diff
1 1 2020 M N LING 5.0 6.1000 -1.09996
2 2 2020 F N CIS 6.0 6.6736 -0.67357
3 3 2020 F N CIS 7.0 7.4229 -0.42291
4 4 2020 F N LING 8.0 9.0964 -1.09636
5 5 2020 F N CIS 7.0 7.4698 -0.46977
6 6 2020 M N LING 8.4 8.1369 0.26309
dat$pass_fail <- 'PASS' # new column, initially PASS for everybody
dat[dat$english_grade < 5.5,]$pass_fail <- 'FAIL' # if grade too low, then FAIL
tail(dat[dat$english_grade > 4 & dat$english_grade < 6, 2:9]) # show subset of data
year sex bl_edu study english_grade english_score diff pass_fail
341 2023 F N IS 5.8 5.7252 0.074803 PASS
359 2023 F N LING 5.0 6.1166 -1.116598 FAIL
373 2023 F Y CIS 5.0 4.3000 0.700000 FAIL
395 2023 F N LING 5.8 6.0576 -0.257642 PASS
399 2023 F N LING 5.8 5.1720 0.627971 PASS
469 2024 F N LING 5.0 5.9713 -0.971288 FAIL
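The same steps can be tried on a small made-up table. The sketch below also shows `ifelse()` as a one-step alternative to the two-step assignment; this alternative is not from the slides, and the data frame `toy`, the column `pass_fail2` and all values are hypothetical.

```r
# Small made-up table for illustration
toy <- data.frame(
  english_grade = c(5.0, 5.8, 7.0, 4.5),
  english_score = c(6.1, 5.7, 7.4, 4.3)
)

# New column computed row by row from two existing columns
toy$diff <- toy$english_grade - toy$english_score

# Two steps, similar to the code above: default value first, then overwrite
# (here the column is indexed directly instead of the rows of the table)
toy$pass_fail <- "PASS"
toy$pass_fail[toy$english_grade < 5.5] <- "FAIL"

# One-step alternative: ifelse(condition, value if TRUE, value if FALSE)
toy$pass_fail2 <- ifelse(toy$english_grade < 5.5, "FAIL", "PASS")

toy
```

Both approaches give the same column here; the two-step version may be easier to extend when more than two categories are needed.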
barplot() (illustrated in the following)
plot()
boxplot()
hist()
qqnorm() and qqline()
R is to conduct statistical analyses
lm() (linear regression)
Call:
lm(formula = english_grade ~ bl_edu, data = dat)
Residuals:
Min 1Q Median 3Q Max
-2.640 -0.246 -0.246 0.754 2.154
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.2457 0.0404 179.28 <2e-16 ***
bl_eduY 0.3947 0.1318 2.99 0.0029 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.86 on 498 degrees of freedom
Multiple R-squared: 0.0177, Adjusted R-squared: 0.0157
F-statistic: 8.97 on 1 and 498 DF, p-value: 0.00289
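With a two-level predictor such as bl_edu and R's default coding, the intercept is the mean of the reference level (N) and the bl_eduY coefficient is the difference between the means of the two groups. In the output above, this corresponds to an estimated mean grade of 7.2457 for N and a mean that is 0.3947 higher for Y. A minimal sketch on made-up data (the values and the names `toy`, `m`, `mean_n` and `mean_y` are hypothetical):

```r
# Small made-up table: three participants per group
toy <- data.frame(
  bl_edu        = c("N", "N", "N", "Y", "Y", "Y"),
  english_grade = c(6.0, 7.0, 8.0, 7.0, 8.0, 9.0)
)

# Linear regression of grade on the two-level predictor
m <- lm(english_grade ~ bl_edu, data = toy)
summary(m)

# Group means computed by hand
mean_n <- mean(toy$english_grade[toy$bl_edu == "N"])
mean_y <- mean(toy$english_grade[toy$bl_edu == "Y"])

# The intercept matches the mean of the reference level N
all.equal(coef(m)[["(Intercept)"]], mean_n)

# The bl_eduY coefficient matches the difference between the group means
all.equal(coef(m)[["bl_eduY"]], mean_y - mean_n)
```

R picks the alphabetically first level (here N) as the reference level unless another one is set explicitly.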
R as calculator
Thank you for your attention!
https://www.martijnwieling.nl
m.b.wieling@rug.nl