Generalized additive models for EEG data

Martijn Wieling
University of Groningen

This lecture

  • Introduction
    • ERPs to study grammatical gender violations
    • Research question
  • Design
  • Methods (R code) and results
  • Discussion

ERPs to study grammatical gender violations

  • A P600 (a positivity 'around' 600 ms. after stimulus onset) is sensitive to grammatical violations
  • An N400 (a negativity 'around' 400 ms. after stimulus onset) is modulated by semantic context and lexical properties of a word
    • The P600/N400 are found by comparing incorrect to correct sentences
  • Native speakers appear to show a P600 for grammatical gender violations
    • But analyzed by averaging over items and over subjects!

This study

  • In this study we are interested in how non-native speakers respond to grammatical gender violations (joint work with Nienke Meulman)
  • Grammatical gender is very hard to learn for L2 learners
  • Even though behaviorally L2 learners might show correct responses, the brain may reveal differences in processing grammatical gender

Research question

  • Is the P600 for grammatical gender violations dependent on age of arrival for the L2 learners of German?

ERP data

  • Today: analysis of single region of interest (ROI 8)

Design

  • 67 L2 speakers of German (Slavic L1)
  • Auditory presentation of correct sentences or sentences with a grammatical gender violation (incorrect determiner; no determiners in L1)
  • 48 items in each condition: 96 trials per participant (minus artifacts)
  • Example:
     
    Nach der Schlägerei ist das/*der Auge des Angestellten von der Krankenschwester versorgt worden.
     
    [After the fight theneut/*themasc eye of the worker was treated by the nurse]

Data overview

load("dat.rda")
dat <- start_event(dat, event = c("Subject", "TrialNr"))  # sorted signal per subject and trial
head(dat)
#     uV Time Subject Word TrialNr Type AoArr start.event
# 1 37.6  505   GL103 Brot       1  cor     8        TRUE
# 2 38.1  515   GL103 Brot       1  cor     8       FALSE
# 3 39.9  525   GL103 Brot       1  cor     8       FALSE
# 4 28.4  535   GL103 Brot       1  cor     8       FALSE
# 5 34.8  545   GL103 Brot       1  cor     8       FALSE
# 6 42.4  555   GL103 Brot       1  cor     8       FALSE
dim(dat)  # signal was downsampled to 100 Hz
# [1] 442160      8

Much individual variation

plot of chunk unnamed-chunk-3

General patterns exist

(note the arbitrary age splits, however)

plot of chunk unnamed-chunk-4

Question 1

Investigating difference between correct and incorrect

(R version 3.5.1 (2018-07-02), mgcv version 1.8.24, itsadug version 2.3)

library(mgcv)
library(itsadug)

# duration discrete=F: 3600 s.; 1/2/4/8/16 threads: 1000/560/300/225/250 s.
system.time(m0 <- bam(uV ~ s(Time, by = Type) + Type + s(Time, Subject, by = Type, 
    bs = "fs", m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat, 
    rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8))
#    user  system elapsed 
#  1498.2    31.4   221.7
  • Time window was set to [500,1300] to limit CPU time
  • ACF of model without rho was used to determine rhoval: 0.91

Global difference between correct and incorrect

summary(m0)  # slides only show the relevant part of the summary
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|) 
# (Intercept)   -0.561      0.521   -1.08    0.282 
# Typeincor      0.790      0.669    1.18    0.238 
# 
# Approximate significance of smooth terms:
#                             edf Ref.df    F  p-value    
# s(Time):Typecor            1.05   1.10 0.39    0.582    
# s(Time):Typeincor          3.32   4.32 6.76 1.25e-05 ***
# s(Time,Subject):Typecor   58.98 602.00 0.90   <2e-16 ***
# s(Time,Subject):Typeincor 53.95 602.00 0.48   <2e-16 ***
# s(Time,Word):Typecor      68.29 863.00 0.29   <2e-16 ***
# s(Time,Word):Typeincor    65.83 863.00 0.26   <2e-16 ***
# 
# Deviance explained = 5.2%

Visualizing difference between correct and incorrect

par(mfrow = c(1, 2))
plot_smooth(m0, view = "Time", rug = F, plot_all = "Type", main = "", rm.ranef = T)
plot_diff(m0, view = "Time", comp = list(Type = c("incor", "cor")), rm.ranef = T)

plot of chunk unnamed-chunk-9

Modeling the difference directly using a binary curve

dat$IsIncorrect <- (dat$Type == "incor") * 1  # create binary predictor: 0 = cor, 1 = incor
m0b <- bam(uV ~ s(Time) + s(Time, by = IsIncorrect) + s(Time, Subject, bs = "fs", 
    m = 1) + s(Time, Subject, by = IsIncorrect, bs = "fs", m = 1) + s(Time, 
    Word, bs = "fs", m = 1) + s(Time, Word, by = IsIncorrect, bs = "fs", m = 1), 
    data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8)
  • s(Time, by=IsIncorrect) is equal to 0 whenever IsIncorrect equals 0
  • Correct case: s(Time) + 0 = s(Time)
  • Incorrect case: s(Time) + s(Time, by=IsIncorrect)
    • Difference between correct and incorrect: s(Time, by=IsIncorrect)
    • Binary curve difference is non-centered (i.e. includes intercept difference)

Results using a binary curve

summary(m0b, re.test = FALSE)  # summary without random effects (quicker to compute)
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|) 
# (Intercept)   -0.568      0.468   -1.21    0.225 
# 
# Approximate significance of smooth terms:
#                      edf Ref.df    F p-value    
# s(Time)             1.64   2.05 0.67   0.536    
# s(Time):IsIncorrect 4.08   5.00 3.90   0.002 **
  • s(Time):IsIncorrect shows the significance of the combined intercept and non-linear difference between correct and incorrect

Modeling the difference using an ordered factor

dat$TypeO <- as.ordered(dat$Type)  # creating an ordered factor ...
contrasts(dat$TypeO) <- "contr.treatment"  # ... with contrast treatment: cor = 0, incor = 1
m0o <- bam(uV ~ s(Time) + s(Time, by = TypeO) + TypeO + s(Time, Subject, bs = "fs", 
    m = 1) + s(Time, Subject, by = TypeO, bs = "fs", m = 1) + s(Time, Word, 
    bs = "fs", m = 1) + s(Time, Word, by = TypeO, bs = "fs", m = 1), data = dat, 
    rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8)
  • s(Time, by=TypeO) is equal to 0 whenever TypeO equals cor (reference level)
  • Difference between correct and incorrect: s(Time, by=TypeO) + TypeO
    • s(Time, by=TypeO): centered non-linear difference
    • TypeO (must be included): intercept difference

Results using an ordered factor

summary(m0o, re.test = FALSE)
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|) 
# (Intercept)   -0.568      0.468   -1.21    0.225 
# TypeOincor     0.784      0.575    1.37    0.172 
# 
# Approximate significance of smooth terms:
#                     edf Ref.df    F p-value    
# s(Time)            1.64   2.05 0.67   0.536    
# s(Time):TypeOincor 3.08   4.00 4.58   0.001 **
  • The \(p\)-value of the parametric coefficient TypeOincor represents the significance of the intercept difference between correct and incorrect
  • The \(p\)-value of the smooth term s(Time):TypeOincor represents the significance of the non-linear difference between correct and incorrect

Visualization of both difference curves

par(mfrow = c(1, 2))
plot(m0b, select = 2, shade = T, rug = F, main = "Binary difference curve", ylim = c(-3, 3))
plot(m0o, select = 2, shade = T, rug = F, main = "Ordered factor difference curve", ylim = c(-3, 3))

plot of chunk unnamed-chunk-17

Question 2

Testing our research question: a non-linear interaction

(te is used to model a non-linear interaction with predictors on a different scale)

m1 <- bam(uV ~ te(Time, AoArr, by = Type) + Type + s(Time, Subject, by = Type, bs = "fs", 
    m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat, rho = rhoval, 
    AR.start = dat$start.event, discrete = T, nthreads = 8)
summary(m1, re.test = FALSE)
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|) 
# (Intercept)   -0.323      0.537   -0.60    0.547 
# Typeincor      0.266      0.685    0.39    0.698 
# 
# Approximate significance of smooth terms:
#                           edf Ref.df    F  p-value    
# te(Time,AoArr):Typecor   3.10   3.19 1.55    0.192    
# te(Time,AoArr):Typeincor 5.88   6.96 5.14 7.91e-06 ***
# 
# Deviance explained = 5.2%

Visualization of the two-dimensional difference

Note the default maximum number of edf's per 2D tensor product: 24 (5\(^2\) - 1)

plot_diff2(m1, view = c("Time", "AoArr"), rm.ranef = T, comp = list(Type = c("incor", "cor")))
fadeRug(dat$Time, dat$AoArr)  # hide points without data

plot of chunk unnamed-chunk-22

Interpreting the two-dimensional difference

plot of chunk unnamed-chunk-24

Interpreting two-dimensional interactions

Decomposition: the pure effect of age of arrival

m2 <- bam(uV ~ s(Time, by = Type) + s(AoArr, by = Type) + ti(Time, AoArr, by = Type) + Type + s(Time, 
    Subject, by = Type, bs = "fs", m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat, 
    rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8)  # te(x,y) = s(x) + s(y) + ti(x,y)
summary(m2, re.test = FALSE)
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|) 
# (Intercept)   -0.353      0.537   -0.66    0.510 
# Typeincor      0.357      0.685    0.52    0.602 
# 
# Approximate significance of smooth terms:
#                           edf Ref.df    F  p-value    
# s(Time):Typecor          1.01   1.02 0.00    0.953    
# s(Time):Typeincor        3.29   4.28 6.30 3.21e-05 ***
# s(AoArr):Typecor         1.01   1.01 2.11    0.147    
# s(AoArr):Typeincor       1.44   1.50 3.03    0.039 *  
# ti(Time,AoArr):Typecor   1.02   1.04 2.30    0.124    
# ti(Time,AoArr):Typeincor 2.13   3.01 0.46    0.709

A simpler model without the non-linear interaction

m3 <- bam(uV ~ s(Time, by = Type) + s(AoArr, by = Type) + Type + s(Time, Subject, by = Type, 
    bs = "fs", m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat, rho = rhoval, 
    AR.start = dat$start.event, discrete = T, nthreads = 8)  # ti-terms dropped
summary(m3, re.test = FALSE)
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|) 
# (Intercept)   -0.350      0.537   -0.65    0.515 
# Typeincor      0.356      0.685    0.52    0.603 
# 
# Approximate significance of smooth terms:
#                     edf Ref.df    F  p-value    
# s(Time):Typecor    1.01   1.01 0.37    0.548    
# s(Time):Typeincor  3.32   4.32 6.76 1.25e-05 ***
# s(AoArr):Typecor   1.03   1.04 2.13    0.148    
# s(AoArr):Typeincor 1.42   1.48 3.00    0.040 *

Model comparison: workaround to use fREML

  • If we set select = T, all smooths are considered random effects, and model comparison can be done using models fit with fREML (default fitting method)
    • Advantage: discrete = T usable, and fREML fitting is much faster than ML
    • Disadvantage: it is an approximation, the results will be less precise
m2.alt <- bam(uV ~ s(Time, by = Type) + s(AoArr, by = Type) + ti(Time, AoArr, by = Type) + 
    Type + s(Time, Subject, by = Type, bs = "fs", m = 1) + s(Time, Word, by = Type, 
    bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, select = T, 
    discrete = T, nthreads = 8)

m3.alt <- bam(uV ~ s(Time, by = Type) + s(AoArr, by = Type) + Type + s(Time, Subject, 
    by = Type, bs = "fs", m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1), data = dat, 
    rho = rhoval, AR.start = dat$start.event, select = T, discrete = T, nthreads = 8)

Model comparison: results

compareML(m2.alt, m3.alt)
# m2.alt: uV ~ s(Time, by = Type) + s(AoArr, by = Type) + ti(Time, AoArr, 
#     by = Type) + Type + s(Time, Subject, by = Type, bs = "fs", 
#     m = 1) + s(Time, Word, by = Type, bs = "fs", m = 1)
# 
# m3.alt: uV ~ s(Time, by = Type) + s(AoArr, by = Type) + Type + s(Time, 
#     Subject, by = Type, bs = "fs", m = 1) + s(Time, Word, by = Type, 
#     bs = "fs", m = 1)
# 
# Chi-square test of fREML scores
# -----
#    Model   Score Edf Difference    Df p.value Sig.
# 1 m3.alt 1492230  18                              
# 2 m2.alt 1492229  24      1.266 6.000   0.865     
# 
# AIC difference: 108.19, model m3.alt has lower AIC.
  • No support to include ti-terms (simpler model m3.alt is better)

Question 3

Ordered factor model: significant differences

m4 <- bam(uV ~ s(Time) + s(Time, by = TypeO) + s(AoArr) + s(AoArr, by = TypeO) + TypeO + 
    s(Time, Subject, bs = "fs", m = 1) + s(Time, Subject, by = TypeO, bs = "fs", m = 1) + 
    s(Time, Word, bs = "fs", m = 1) + s(Time, Word, by = TypeO, bs = "fs", m = 1), data = dat, 
    rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8)
summary(m4, re.test = FALSE)
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|) 
# (Intercept)   -0.367      0.482   -0.76    0.447 
# TypeOincor     0.321      0.572    0.56    0.575 
# 
# Approximate significance of smooth terms:
#                      edf Ref.df    F p-value    
# s(Time)             1.64   2.05 0.67   0.536    
# s(Time):TypeOincor  3.08   4.00 4.58   0.001 ** 
# s(AoArr)            1.03   1.04 2.33   0.130    
# s(AoArr):TypeOincor 1.00   1.00 9.10   0.003 **

Difference curves

par(mfrow = c(1, 2))
plot(m4, select = 2, shade = T, rug = F, ylim = c(-3, 3))
plot(m4, select = 4, shade = T, rug = F, ylim = c(-6, 6))

plot of chunk unnamed-chunk-38

Finally: model criticism

library(car)
par(mfrow = c(1, 2))
qqp(resid(m4))  # quantile-quantile plot function from library car
hist(resid(m4))

plot of chunk unnamed-chunk-40

Problematic residuals!

  • This type of residual distribution is common for EEG data
  • These extreme deviations are problematic and may affect \(p\)-values
  • Distribution of residuals looks like scaled-\(t\) distribution
    • We can fit this type of model in bam: family="scat"

plot of chunk unnamed-chunk-41

Fitting a scaled-\(t\) model: slow!

system.time(m4.scat <- bam(uV ~ s(Time) + s(Time, by = TypeO) + s(AoArr) + s(AoArr, 
    by = TypeO) + TypeO + s(Time, Subject, bs = "fs", m = 1) + s(Time, Subject, by = TypeO, 
    bs = "fs", m = 1) + s(Time, Word, bs = "fs", m = 1) + s(Time, Word, by = TypeO, 
    bs = "fs", m = 1), data = dat, family = "scat", rho = rhoval, AR.start = dat$start.event, 
    discrete = T, nthreads = 32))
#    user  system elapsed 
#  211395     501    7374
# For comparison, duration of the Gaussian model (8 CPU's is fastest)
system.time(m4 <- bam(uV ~ s(Time) + s(Time, by = TypeO) + s(AoArr) + s(AoArr, by = TypeO) + 
    TypeO + s(Time, Subject, bs = "fs", m = 1) + s(Time, Subject, by = TypeO, bs = "fs", 
    m = 1) + s(Time, Word, bs = "fs", m = 1) + s(Time, Word, by = TypeO, bs = "fs", 
    m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 8))
#    user  system elapsed 
#  1504.6    37.1   231.2

Using the scaled-\(t\) distribution: \(p\)-values change

summary(m4, re.test = FALSE)$s.table  # significance of smooths
#                      edf Ref.df     F p-value
# s(Time)             1.64   2.05 0.666 0.53620
# s(Time):TypeOincor  3.08   4.00 4.579 0.00108
# s(AoArr)            1.03   1.04 2.334 0.13016
# s(AoArr):TypeOincor 1.00   1.00 9.100 0.00253
summary(m4.scat, re.test = FALSE)$s.table  # significance of smooths
#                      edf Ref.df     F p-value
# s(Time)             2.35   3.03 1.258 0.28831
# s(Time):TypeOincor  3.12   4.04 4.363 0.00153
# s(AoArr)            1.10   1.11 0.502 0.54263
# s(AoArr):TypeOincor 1.01   1.02 8.421 0.00365

Using the scaled-\(t\) distribution: similar patterns

plot of chunk unnamed-chunk-43

Model criticism: much improved!

par(mfrow = c(1, 2))
qqp(resid(m4), main = "m4")
qqp(resid(m4.scat), main = "m4.scat")

plot of chunk unnamed-chunk-44

Discussion and conclusion

  • GAMs are very useful to analyze EEG and other time-series data
    • GAMs can detect non-linear patterns, while taking into account individual variation and autocorrelation
    • Associated paper: Meulman et al. (2015) (paper package: data and code)
  • Still work to do:
    • Assessing by-word variability in the (linear) effect of age of arrival
    • Testing the significance of other possibly important variables (e.g., proficiency)
    • But stay close to your hypothesis: much unexplained variation in EEG data!

Recap

  • We have applied GAMs to EEG data and learned how to:
    • Model difference smooths directly using binary predictors and ordered factors
    • Use te(Time,AoArr) to model a non-linear interaction
    • Decompose te(Time,AoArr) using ti() and two s()'s
    • Use a scaled-\(t\) distribution to improve residuals
  • While we have analyzed a single region of interest, GAMs allow for spatial distribution analyses
  • Associated lab session:

Evaluation

Questions?

Thank you for your attention!

http://www.martijnwieling.nl
wieling@gmail.com