Martijn Wieling

University of Groningen

- Introduction
- Recap: multiple regression
- Mixed-effects regression analysis: explanation
- Case-study: Influence of alcohol on L1 & L2
- Conclusion

- Consider the following situation (taken from Clark, 1973):
- Mr. A and Mrs. B study reading latencies of verbs and nouns
- Each randomly selects 20 words and tests 50 subjects
- Mr. A finds (using a sign test) verbs to have faster responses
- Mrs. B finds nouns to have faster responses

- How is this possible?

- The problem is that Mr. A and Mrs. B disregard the (huge) variability in the words
- Mr. A included a difficult noun, but Mrs. B included a difficult verb
- Their set of words does not constitute the complete population of nouns and verbs, therefore their results are limited to their words

- This is known as the language-as-fixed-effect fallacy (LAFEF)
- Fixed-effect factors have repeatable and a small number of levels
- Word is a random-effect factor (a non-repeatable random sample from a larger population)

- The LAFEF occurred frequently in linguistic research until the 1970s
- Many reported significant results are wrong (the method is anti-conservative)!

- Clark (1973) combined a by-subject (\(F_1\)) analysis and a by-item (\(F_2\)) analysis in the measure *min F'*
- Results are significant and generalizable across subjects and items when *min F'* is significant
- Unfortunately, many researchers (>50%!) incorrectly interpreted this study and may report wrong results (Raaijmakers et al., 1999)
- E.g., they only use \(F_1\) and \(F_2\) and not *min F'*, or they use \(F_2\) when it is unnecessary (e.g., in a counterbalanced design)

- Apparently, analyzing this type of data is difficult...
- Fortunately, using mixed-effects regression models solves these problems!
- The method is easier than using the approach of Clark (1973)
- Results can be generalized across subjects and items
- Mixed-effects models are robust to missing data (Baayen, 2008, p. 266)
- We can easily test if it is necessary to treat item as a random effect
- No balanced design necessary (as in repeated-measures ANOVA)!

- But first some words about regression...

- Multiple regression: predict one numerical variable on the basis of other independent variables (numerical or categorical)
- We can write a regression formula as \(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \epsilon\)
- \(y\): (value of the) dependent variable
- \(x_i\): (value of the) predictor
- \(\beta_0\): intercept, value of \(y\) when all \(x_i = 0\)
- \(\beta_i\): slope, change in \(y\) when the value of \(x_i\) increases with 1
- \(\epsilon\): residuals, difference between observed values and predicted (fitted) values

- Factorial predictors are (automatically) represented by binary-valued predictors: \(x_i = 0\) (reference level) or \(x_i = 1\) (alternative level)
- Factor with \(n\) levels: \(n-1\) binary predictors
- Interpretation of factorial \(\beta_i\): change from reference to alternative level
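To make the dummy coding concrete, a minimal R sketch (the factor `cond` and its level names are made up for illustration):

```r
# Hypothetical factor with 3 levels: R creates n - 1 = 2 binary predictors
cond <- factor(c("low", "mid", "high"), levels = c("low", "mid", "high"))
contrasts(cond)
#      mid high
# low    0    0   <- reference level (all dummies 0)
# mid    1    0
# high   0    1
```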

- Example of regression formula:
- Predict the reaction time of a subject on the basis of word frequency, word length, and subject age:
**RT = 200 - 5WF + 3WL + 10SA**
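As a quick sanity check of how the coefficients combine, a small R sketch (the predictor values 4, 5, and 2 are made up):

```r
# Hypothetical fitted model: RT = 200 - 5*WF + 3*WL + 10*SA
rt <- function(wf, wl, sa) 200 - 5 * wf + 3 * wl + 10 * sa
rt(wf = 4, wl = 5, sa = 2)  # 200 - 20 + 15 + 20 = 215
```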

- Mixed-effects regression distinguishes fixed effects and random-effect factors
- Fixed effects:
- All numerical predictors
- Factorial predictors with a repeatable and small number of levels (e.g., Gender)

- Random-effect **factors**:
- Only factorial predictors!
- Levels are a non-repeatable random sample from a larger population
- Generally a large number of levels (e.g., Subject, Item)

- Random-effect factors are factors which are likely to introduce systematic variation (here: **subject** and **item**)
- Some **subjects** have a slow response (RT), while others are fast

= Random Intercept for Subject (i.e. \(\beta_0\) varies per subject)

- Some **items** are easy to recognize, others hard

= Random Intercept for Item (i.e. \(\beta_0\) varies per item)

- The effect of word frequency on RT might be higher for one **subject** than another: e.g., non-native participants might benefit more from frequent words than native participants

= Random Slope for Word Frequency per Subject (i.e. \(\beta_{\textrm{WF}}\) varies per subject)

- The effect of subject age on RT might be different for one **item** than another: e.g., modern words might be recognized faster by younger participants

= Random Slope for Subject Age per Item (i.e. \(\beta_{\textrm{SA}}\) varies per item)

- Note that it is essential to test for random slopes!

```
Estimate Std. Error t value Pr(>|t|)
Linear regression DistOrigin -6.418e-05 1.808e-06 -35.49 <2e-16
+ Random intercepts DistOrigin -2.224e-05 6.863e-06 -3.240 <0.001
+ Random slopes DistOrigin -1.478e-05 1.519e-05 -0.973 n.s.
```

(This example is explained at the HLP/Jaeger lab blog)

- Mixed-effects regression allows us to use random intercepts and slopes (i.e. adjustments to the population intercept and slopes) to include the variance structure in our data
- Parsimony: a single parameter (standard deviation) models this variation for every random slope or intercept (a normal distribution with mean 0 is assumed)
- The slope and intercept adjustments are Best Linear Unbiased Predictors
- Model comparison determines the inclusion of random intercepts and slopes

- Mixed-effects regression is only required when each level of the random-effect factor has multiple observations (e.g., participants respond to multiple items)

- RT = 200 - 5WF + 3WL + 10SA (general model)
- The intercepts and slopes may vary (according to the estimated variation for each parameter) and this influences the word- and subject-specific values
- RT = 400 - 5WF + 3WL - 2SA (**word**: scythe)
- RT = 300 - 5WF + 3WL + 15SA (**word**: twitter)
- RT = 300 - 7WF + 3WL + 10SA (**subject**: non-native)
- RT = 150 - 5WF + 3WL + 10SA (**subject**: fast)
- And it is not hard to use! `lmer( RT ~ WF + WL + SA + (1+SA|Wrd) + (1+WF|Subj) )`

- (`lmer` automatically discovers the random-effects structure: nested/crossed)
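For illustration, crossed versus nested grouping in `lmer` formula syntax; the data frames `dat` and `dat2` here are hypothetical:

```r
library(lme4)
# Crossed random effects: every subject responds to (a sample of) the same words
m_crossed <- lmer(RT ~ WF + WL + SA + (1 | Subj) + (1 | Wrd), data = dat)
# Nested random effects: pupils only occur within their own school
m_nested <- lmer(Score ~ Age + (1 | School) + (1 | School:Pupil), data = dat2)
# lmer infers the structure from how the grouping-factor levels overlap
```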

- For example:

Subject | Intercept | WF slope |
---|---|---|
S1 | 525 | -2 |
S2 | 400 | -1 |
S3 | 500 | -2 |
S4 | 550 | -3 |
S5 | 450 | -2 |
S6 | 600 | -4 |
S7 | 300 | 0 |

- The per-subject adjustments estimated by `lmer` do not suffer from overfitting, thanks to shrinkage
- The BLUPs (i.e. the adjustments to the model estimates per item/participant) are close to the real adjustments, as `lmer` takes into account regression towards the mean (fast subjects will be slower next time, and slow subjects will be faster), thereby avoiding overfitting and improving prediction (see Efron & Morris, 1977)
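Shrinkage can be made visible with a small simulation (a sketch with simulated data, not the values from the table above):

```r
library(lme4)
set.seed(42)
# 7 simulated subjects, each with a true intercept deviation and 10 noisy trials
subj <- rep(paste0("S", 1:7), each = 10)
true_dev <- rnorm(7, sd = 50)
rt <- 500 + true_dev[as.integer(factor(subj))] + rnorm(70, sd = 80)
d <- data.frame(subj, rt)

raw_dev <- tapply(d$rt, d$subj, mean) - mean(d$rt)  # raw per-subject deviations
m <- lmer(rt ~ 1 + (1 | subj), data = d)
blups <- ranef(m)$subj[, 1]                         # shrunk deviations (BLUPs)
cbind(raw = round(raw_dev, 1), blup = round(blups, 1))
# Each BLUP is pulled towards 0 relative to the corresponding raw deviation
```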

- Centering numerical predictors is important: otherwise random slopes and intercepts may show a spurious correlation
- Centering also helps the interpretation of interactions (see this lecture)

- Independent observations within each level of the random-effect factor
- Relation between dependent and independent variables linear
- No strong multicollinearity
- Residuals are not autocorrelated
- Homoscedasticity of variance in residuals
- Residuals are normally distributed
- (Similar assumptions as for regression)

- Check the distribution of residuals: if not normally distributed and/or heteroscedastic residuals then transform dependent variable or use generalized linear mixed-effects regression modeling
- Check outlier characteristics and refit the model when large outliers are excluded to verify that your effects are not 'carried' by these outliers
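In R, these checks take only a few lines (a sketch, assuming a fitted `lmer` model `m` and the data frame `d` it was fitted on):

```r
# Assumes m is a fitted lmer model and d the data frame it was fitted on
qqnorm(resid(m)); qqline(resid(m))  # residuals roughly on the line?
plot(fitted(m), resid(m))           # no funnel shape: homoscedastic residuals
acf(resid(m))                       # no autocorrelation of residuals
# Refit without large standardized residuals to check outlier influence
d_trim <- d[abs(scale(resid(m))) < 2.5, ]
```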

- "The data analyst knows more than the computer." (Henderson & Velleman, 1981)
- Get to know your data!

- There is no adequate *automatic* procedure to find the best model
- Fitting the most complex random-effects structure (Barr et al., 2013) is mostly not a good idea
- The model may not converge, or is uninterpretable (see Baayen et al., 2017)
- The power of your method is reduced (Matuschek et al., 2017)

- My stepwise variable-selection procedure (for exploratory analysis):
- Include random intercepts
- Add other potential explanatory variables one-by-one
- Insignificant predictors are dropped
- Test predictors for inclusion which were excluded at an earlier stage
- Test possible interactions (don't make it too complex)
- Try to break the model by including significant predictors as random slopes
- Only choose a more complex model if supported by model comparison
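The model comparison in the final step is typically a likelihood-ratio test via `anova()`; a sketch with hypothetical models and data:

```r
library(lme4)
# Hypothetical: does a by-subject random slope for WF improve the model?
m_simple  <- lmer(RT ~ WF + WL + SA + (1 | Subj), data = dat)
m_complex <- lmer(RT ~ WF + WL + SA + (1 + WF | Subj), data = dat)
anova(m_simple, m_complex, refit = FALSE)  # keep REML fits for random-effects comparison
# Only retain m_complex if the comparison shows a significant improvement
```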

- For a hypothesis-driven analysis, stepwise selection is problematic
- Confidence intervals too narrow: \(p\)-values too low (multiple comparisons)
- See, e.g., Burnham & Anderson (2002)

- Solutions:
- Careful specification of potential *a priori* models lining up with the hypotheses (including optimal random-effects structure) and evaluating only these models
- This may be followed by an exploratory procedure
- Validating a stepwise procedure via cross-validation (e.g., bootstrap analysis)

- Reported in Wieling et al. (2019) and Offrede et al. (2020)
- Assess influence of alcohol (BAC) on L1 (clarity) vs. L2 (nativelikeness) ratings
- Prediction: higher BAC has negative effect on L1, but positive effect on L2 nativelikeness (based on Renner et al., 2018)
- ~80 recordings from native Dutch speakers included (all BACs < 0.8, no drugs)
- Dutch ratings were given by >100 native (sober) Dutch speakers (at Lowlands)
- English ratings were given by >100 native American English speakers (online)
- Dependent variable is \(z\)-transformed rating (5-point Likert scale)
- Numerical variables are centered (not \(z\)-transformed)
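Centering versus z-transforming in R (a sketch; the column names are illustrative):

```r
# Centering: subtract the mean, so the intercept refers to an average value
lls$BAC_c <- lls$BAC - mean(lls$BAC, na.rm = TRUE)
# z-transforming: center and divide by the standard deviation
lls$Rating_z <- as.numeric(scale(lls$Rating))
```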

```
load("lls.rda")
head(lls)
```

```
# SID BAC Gender BirthYear L2cnt SelfEN LivedEN L2anxiety Edu LID Lang Rating
# 1 S0045188-17 0.392 F -5.66 -1.04 0.463 Y 0.357 0.817 L0637009 NL 0.946
# 2 S0045188-17 0.392 F -5.66 -1.04 0.463 Y 0.357 0.817 L196 EN 0.330
# 3 S0045188-17 0.392 F -5.66 -1.04 0.463 Y 0.357 0.817 L86 EN -0.298
# 4 S0045188-17 0.392 F -5.66 -1.04 0.463 Y 0.357 0.817 L0614758 NL 0.325
# 5 S0045188-17 0.392 F -5.66 -1.04 0.463 Y 0.357 0.817 L220 EN -0.435
# 6 S0045188-17 0.392 F -5.66 -1.04 0.463 Y 0.357 0.817 L225 EN -0.239
```

(using `lme4` version 1.1.27)

```
library(lme4)
m <- lmer(Rating ~ BAC + (1 | SID) + (1 | LID), data = lls) # singular fit
```

```
# boundary (singular) fit: see ?isSingular
```

```
summary(m)$coef # show fixed effects
```

```
# Estimate Std. Error t value
# (Intercept) 0.144 0.0638 2.252
# BAC -0.221 0.3262 -0.677
```

```
summary(m)$varcor # show random-effects part only: no variability per listener (due to z-scaling)
```

```
# Groups Name Std.Dev.
# LID (Intercept) 0.000
# SID (Intercept) 0.560
# Residual 0.738
```

```
m1 <- lmer(Rating ~ Lang * BAC + (1 | SID), data = lls)
```

- This model represents our hypothesis test (but likely without the correct random-effects structure)

```
summary(m1, cor = F) # suppress expected correlation of regression coefficients; Note: |t| > 2 => p < .05 (N > 100)
```

```
# Linear mixed model fit by REML ['lmerMod']
# Formula: Rating ~ Lang * BAC + (1 | SID)
# Data: lls
#
# REML criterion at convergence: 5853
#
# Scaled residuals:
# Min 1Q Median 3Q Max
# -3.668 -0.647 0.032 0.707 3.187
#
# Random effects:
# Groups Name Variance Std.Dev.
# SID (Intercept) 0.305 0.552
# Residual 0.533 0.730
# Number of obs: 2541, groups: SID, 82
#
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) 0.0883 0.0635 1.39
# LangNL 0.2430 0.0358 6.78
# BAC -0.0592 0.3255 -0.18
# LangNL:BAC -0.7286 0.1938 -3.76
```

```
library(visreg)
visreg(m1, "BAC", by = "Lang", overlay = T, xlab = "BAC (centered)", ylab = "Rating")
```