Martijn Wieling

University of Groningen

- Introduction
- ERPs to study grammatical gender violations
- Research question

- Design
- Methods (
`R`

code) and results - Discussion

- A P600 (a positivity 'around' 600 ms. after stimulus onset) is sensitive to grammatical violations
- An N400 (a negativity 'around' 400 ms. after stimulus onset) is modulated by semantic context and lexical properties of a word
- The P600/N400 are found by comparing incorrect to correct sentences

- Native speakers appear to show a P600 for grammatical gender violations
- But analyzed by averaging over items and over subjects!

- In this study we are interested in how non-native speakers respond to grammatical gender violations (joint work with Nienke Meulman)
- Grammatical gender is very hard to learn for L2 learners
- Even though behaviorally L2 learners might show correct responses, the brain may reveal differences in processing grammatical gender

- Is the P600 for grammatical gender violations dependent on age of arrival for the L2 learners of German?

- Today: analysis of single region of interest (ROI 8)

- 67 L2 speakers of German (Slavic L1)
- Auditory presentation of correct sentences or sentences with a grammatical gender violation (incorrect determiner; no determiners in L1)
- 48 items in each condition: 96 trials per participant (minus artifacts)
- Example:

Nach der Schlägerei ist das/*der Auge des Angestellten von der Krankenschwester versorgt worden.

[After the fight theneut/*themasc eye of the worker was treated by the nurse]

```
load("dat.rda")
dat <- start_event(dat, event = c("Subject", "TrialNr")) # sorted signal per subject and trial
head(dat)
```

```
# uV Time Subject Word TrialNr Type AoArr start.event
# 1 37.6 505 GL103 Brot 1 cor 8 TRUE
# 2 38.1 515 GL103 Brot 1 cor 8 FALSE
# 3 39.9 525 GL103 Brot 1 cor 8 FALSE
# 4 28.4 535 GL103 Brot 1 cor 8 FALSE
# 5 34.8 545 GL103 Brot 1 cor 8 FALSE
# 6 42.4 555 GL103 Brot 1 cor 8 FALSE
```

```
dim(dat) # signal was downsampled to 100 Hz
```

```
# [1] 442160 8
```

`mgcv`

version 1.8.24, `itsadug`

version 2.3)```
library(mgcv)
library(itsadug)
# duration discrete=F: 3600 s.; 1/2/4/8/16 threads: 1000/560/300/225/250 s.
system.time(m0 <- bam(uV ~ s(Time, by = Type) + Type + s(Time, Subject, by = Type,
bs = "fs", m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat,
rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8))
```

```
# user system elapsed
# 1498.2 31.4 221.7
```

- Time window was set to [500,1300] to limit CPU time
- ACF of model without
`rho`

was used to determine`rhoval`

: 0.91

```
summary(m0) # slides only show the relevant part of the summary
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.561 0.521 -1.08 0.282
# Typeincor 0.790 0.669 1.18 0.238
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time):Typecor 1.05 1.10 0.39 0.582
# s(Time):Typeincor 3.32 4.32 6.76 1.25e-05 ***
# s(Time,Subject):Typecor 58.98 602.00 0.90 <2e-16 ***
# s(Time,Subject):Typeincor 53.95 602.00 0.48 <2e-16 ***
# s(Time,Word):Typecor 68.29 863.00 0.29 <2e-16 ***
# s(Time,Word):Typeincor 65.83 863.00 0.26 <2e-16 ***
#
# Deviance explained = 5.2%
```

```
par(mfrow = c(1, 2))
plot_smooth(m0, view = "Time", rug = F, plot_all = "Type", main = "", rm.ranef = T)
plot_diff(m0, view = "Time", comp = list(Type = c("incor", "cor")), rm.ranef = T)
```

```
dat$IsIncorrect <- (dat$Type == "incor") * 1 # create binary predictor: 0 = cor, 1 = incor
m0b <- bam(uV ~ s(Time) + s(Time, by = IsIncorrect) + s(Time, Subject, bs = "fs",
m = 1) + s(Time, Subject, by = IsIncorrect, bs = "fs", m = 1) + s(Time,
Word, bs = "fs", m = 1) + s(Time, Word, by = IsIncorrect, bs = "fs", m = 1),
data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8)
```

`s(Time, by=IsIncorrect)`

is equal to 0 whenever`IsIncorrect`

equals 0- Correct case:
`s(Time) + 0 = s(Time)`

- Incorrect case:
`s(Time) + s(Time, by=IsIncorrect)`

**Difference**between correct and incorrect:`s(Time, by=IsIncorrect)`

- Binary curve difference is
**non-centered**(i.e. includes intercept difference)

```
summary(m0b, re.test = FALSE) # summary without random effects (quicker to compute)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.568 0.468 -1.21 0.225
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time) 1.64 2.05 0.67 0.536
# s(Time):IsIncorrect 4.08 5.00 3.90 0.002 **
```

`s(Time):IsIncorrect`

shows the significance of the combined intercept and non-linear difference between correct and incorrect

```
dat$TypeO <- as.ordered(dat$Type) # creating an ordered factor ...
contrasts(dat$TypeO) <- "contr.treatment" # ... with contrast treatment: cor = 0, incor = 1
m0o <- bam(uV ~ s(Time) + s(Time, by = TypeO) + TypeO + s(Time, Subject, bs = "fs",
m = 1) + s(Time, Subject, by = TypeO, bs = "fs", m = 1) + s(Time, Word,
bs = "fs", m = 1) + s(Time, Word, by = TypeO, bs = "fs", m = 1), data = dat,
rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8)
```

`s(Time, by=TypeO)`

is equal to 0 whenever`TypeO`

equals`cor`

(reference level)**Difference**between correct and incorrect:`s(Time, by=TypeO) + TypeO`

`s(Time, by=TypeO)`

:**centered**non-linear difference`TypeO`

(must be included): intercept difference

```
summary(m0o, re.test = FALSE)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.568 0.468 -1.21 0.225
# TypeOincor 0.784 0.575 1.37 0.172
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time) 1.64 2.05 0.67 0.536
# s(Time):TypeOincor 3.08 4.00 4.58 0.001 **
```

- The \(p\)-value of the parametric coefficient
`TypeOincor`

represents the significance of the**intercept****difference**between correct and incorrect - The \(p\)-value of the smooth term
`s(Time):TypeOincor`

represents the significance of the**non-linear****difference**between correct and incorrect

```
par(mfrow = c(1, 2))
plot(m0b, select = 2, shade = T, rug = F, main = "Binary difference curve", ylim = c(-3, 3))
plot(m0o, select = 2, shade = T, rug = F, main = "Ordered factor difference curve", ylim = c(-3, 3))
```

`te`

is used to model a non-linear interaction with predictors on a different scale)```
m1 <- bam(uV ~ te(Time, AoArr, by = Type) + Type + s(Time, Subject, by = Type, bs = "fs",
m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat, rho = rhoval,
AR.start = dat$start.event, discrete = T, nthreads = 8)
summary(m1, re.test = FALSE)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.323 0.537 -0.60 0.547
# Typeincor 0.266 0.685 0.39 0.698
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# te(Time,AoArr):Typecor 3.10 3.19 1.55 0.192
# te(Time,AoArr):Typeincor 5.88 6.96 5.14 7.91e-06 ***
#
# Deviance explained = 5.2%
```

```
plot_diff2(m1, view = c("Time", "AoArr"), rm.ranef = T, comp = list(Type = c("incor", "cor")))
fadeRug(dat$Time, dat$AoArr) # hide points without data
```

`f112300`

and `ShinyDem0`

)```
m2 <- bam(uV ~ s(Time, by = Type) + s(AoArr, by = Type) + ti(Time, AoArr, by = Type) + Type + s(Time,
Subject, by = Type, bs = "fs", m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat,
rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8) # te(x,y) = s(x) + s(y) + ti(x,y)
summary(m2, re.test = FALSE)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.353 0.537 -0.66 0.510
# Typeincor 0.357 0.685 0.52 0.602
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time):Typecor 1.01 1.02 0.00 0.953
# s(Time):Typeincor 3.29 4.28 6.30 3.21e-05 ***
# s(AoArr):Typecor 1.01 1.01 2.11 0.147
# s(AoArr):Typeincor 1.44 1.50 3.03 0.039 *
# ti(Time,AoArr):Typecor 1.02 1.04 2.30 0.124
# ti(Time,AoArr):Typeincor 2.13 3.01 0.46 0.709
```

```
m3 <- bam(uV ~ s(Time, by = Type) + s(AoArr, by = Type) + Type + s(Time, Subject, by = Type,
bs = "fs", m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat, rho = rhoval,
AR.start = dat$start.event, discrete = T, nthreads = 8) # ti-terms dropped
summary(m3, re.test = FALSE)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.350 0.537 -0.65 0.515
# Typeincor 0.356 0.685 0.52 0.603
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time):Typecor 1.01 1.01 0.37 0.548
# s(Time):Typeincor 3.32 4.32 6.76 1.25e-05 ***
# s(AoArr):Typecor 1.03 1.04 2.13 0.148
# s(AoArr):Typeincor 1.42 1.48 3.00 0.040 *
```

`fREML`

- If we set
`select = T`

, all**smooths**are considered random effects, and model comparison can be done using models fit with`fREML`

(default fitting method)- Advantage:
`discrete = T`

usable, and`fREML`

fitting is much faster than`ML`

- Disadvantage: it is an approximation, the results will be less precise

- Advantage:

```
m2.alt <- bam(uV ~ s(Time, by = Type) + s(AoArr, by = Type) + ti(Time, AoArr, by = Type) +
Type + s(Time, Subject, by = Type, bs = "fs", m = 1) + s(Time, Word, by = Type,
bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, select = T,
discrete = T, nthreads = 8)
m3.alt <- bam(uV ~ s(Time, by = Type) + s(AoArr, by = Type) + Type + s(Time, Subject,
by = Type, bs = "fs", m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat,
rho = rhoval, AR.start = dat$start.event, select = T, discrete = T, nthreads = 8)
```

```
compareML(m2.alt, m3.alt)
```

```
# m2.alt: uV ~ s(Time, by = Type) + s(AoArr, by = Type) + ti(Time, AoArr,
# by = Type) + Type + s(Time, Subject, by = Type, bs = "fs",
# m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1)
#
# m3.alt: uV ~ s(Time, by = Type) + s(AoArr, by = Type) + Type + s(Time,
# Subject, by = Type, bs = "fs", m = 1) + s(Time, Word, by = Type,
# bs = "fs", m = 1)
#
# Chi-square test of fREML scores
# -----
# Model Score Edf Difference Df p.value Sig.
# 1 m3.alt 1492230 18
# 2 m2.alt 1492229 24 1.266 6.000 0.865
#
# AIC difference: 108.19, model m3.alt has lower AIC.
```

- No support to include
`ti`

-terms (simpler model`m3.alt`

is better)

```
m4 <- bam(uV ~ s(Time) + s(Time, by = TypeO) + s(AoArr) + s(AoArr, by = TypeO) + TypeO +
s(Time, Subject, bs = "fs", m = 1) + s(Time, Subject, by = TypeO, bs = "fs", m = 1) +
s(Time, Word, bs = "fs", m = 1) + s(Time, Word, by = TypeO, bs = "fs", m = 1), data = dat,
rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8)
summary(m4, re.test = FALSE)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.367 0.482 -0.76 0.447
# TypeOincor 0.321 0.572 0.56 0.575
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time) 1.64 2.05 0.67 0.536
# s(Time):TypeOincor 3.08 4.00 4.58 0.001 **
# s(AoArr) 1.03 1.04 2.33 0.130
# s(AoArr):TypeOincor 1.00 1.00 9.10 0.003 **
```

```
par(mfrow = c(1, 2))
plot(m4, select = 2, shade = T, rug = F, ylim = c(-3, 3))
plot(m4, select = 4, shade = T, rug = F, ylim = c(-6, 6))
```

```
library(car)
par(mfrow = c(1, 2))
qqp(resid(m4)) # quantile-quantile plot function from library car
hist(resid(m4))
```

- This type of residual distribution is common for EEG data
- These
**extreme**deviations are problematic and may affect \(p\)-values - Distribution of residuals looks like scaled-\(t\) distribution
- We can fit this type of model in
`bam`

:`family="scat"`

- We can fit this type of model in

```
system.time(m4.scat <- bam(uV ~ s(Time) + s(Time, by = TypeO) + s(AoArr) + s(AoArr,
by = TypeO) + TypeO + s(Time, Subject, bs = "fs", m = 1) + s(Time, Subject, by = TypeO,
bs = "fs", m = 1) + s(Time, Word, bs = "fs", m = 1) + s(Time, Word, by = TypeO,
bs = "fs", m = 1), data = dat, family = "scat", rho = rhoval, AR.start = dat$start.event,
discrete = T, nthreads = 32))
```

```
# user system elapsed
# 211395 501 7374
```

```
# For comparison, duration of the Gaussian model (8 CPU's is fastest)
system.time(m4 <- bam(uV ~ s(Time) + s(Time, by = TypeO) + s(AoArr) + s(AoArr, by = TypeO) +
TypeO + s(Time, Subject, bs = "fs", m = 1) + s(Time, Subject, by = TypeO, bs = "fs",
m = 1) + s(Time, Word, bs = "fs", m = 1) + s(Time, Word, by = TypeO, bs = "fs",
m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8))
```

```
# user system elapsed
# 1504.6 37.1 231.2
```

```
summary(m4, re.test = FALSE)$s.table # significance of smooths
```

```
# edf Ref.df F p-value
# s(Time) 1.64 2.05 0.666 0.53620
# s(Time):TypeOincor 3.08 4.00 4.579 0.00108
# s(AoArr) 1.03 1.04 2.334 0.13016
# s(AoArr):TypeOincor 1.00 1.00 9.100 0.00253
```

```
summary(m4.scat, re.test = FALSE)$s.table # significance of smooths
```

```
# edf Ref.df F p-value
# s(Time) 2.35 3.03 1.258 0.28831
# s(Time):TypeOincor 3.12 4.04 4.363 0.00153
# s(AoArr) 1.10 1.11 0.502 0.54263
# s(AoArr):TypeOincor 1.01 1.02 8.421 0.00365
```

```
par(mfrow = c(1, 2))
qqp(resid(m4), main = "m4")
qqp(resid(m4.scat), main = "m4.scat")
```

- GAMs are very useful to analyze EEG and other time-series data
- GAMs can detect non-linear patterns, while taking into account individual variation and autocorrelation
- Associated paper: Meulman et al. (2015) (paper package: data and code)

- Still work to do:
- Assessing by-word variability in the (linear) effect of age of arrival
- Testing the significance of other possibly important variables (e.g., proficiency)
**But stay close to your hypothesis**: much unexplained variation in EEG data!

- We have applied GAMs to EEG data and learned how to:
- Model difference smooths directly using binary predictors and ordered factors
- Use
`te(Time,AoArr)`

to model a non-linear interaction - Decompose
`te(Time,AoArr)`

using`ti()`

and two`s()`

's - Use a scaled-\(t\) distribution to improve residuals

- While we have analyzed a single region of interest, GAMs allow for spatial distribution analyses
- E.g., via
`te(x, y, Time, d = c(2,1))`

- See http://eolomea.let.rug.nl/GAM/ScalpEEG

- E.g., via
- Associated lab session:

Thank you for your attention!