Analyzing EEG data using GAMs

Lecture 4 of advanced regression for linguists

Martijn Wieling
Department of Information Science

This lecture

  • Introduction
    • ERPs to study gender violations
    • Research question
  • Design
  • Methods: R code
  • Results
  • Discussion

ERPs to study grammatical gender violations

  • A P600 (a positivity 'around' 600 ms after stimulus onset) is sensitive to grammatical violations
  • An N400 (a negativity 'around' 400 ms after stimulus onset) is modulated by semantic context and lexical properties of a word
    • The P600/N400 are found by comparing the incorrect sentences to the correct sentences
  • Native speakers appear to show a P600 for gender violations
    • But analyzed by averaging over items and over subjects!
  • Here we are interested in how non-native speakers respond to gender violations (joint work with Nienke Meulman)
  • Gender is very hard to learn for L2 learners
  • Even though behaviorally L2 learners might show correct responses, the brain may reveal differences in processing gender

Research question

  • Is the P600 for gender violations dependent on age of arrival for the L2 learners of German?

ERP data

  • Today: analysis of a single region of interest (ROI 8)

Design

  • 67 L2 speakers of German (Slavic L1)
  • Auditory presentation of correct sentences or sentences with a gender violation (incorrect determiner; no determiners in L1)
  • 48 items in each condition: 96 trials per participant (minus artifacts)
  • Example:
    Nach der Schlägerei ist das/*der Auge des Angestellten von der Krankenschwester versorgt worden.
    After the fight the[neut]/*the[masc] eye of the worker was treated by the nurse

Data overview

(R version 3.2.3 (2015-12-10), mgcv version 1.8.11, itsadug version 1.0.1)

library(mgcv)
library(itsadug)
load("dat.rda")
dat = start_event(dat, event = c("Subject", "TrialNr"))  # sort the data per subject and trial and mark the start of each time series

head(dat)
#     uV Time Subject    Group Word TrialNr      Roi Structure Correctness L1 AoArr LoR Age Edu
# 1 37.6  505   GL103 GenEarly Brot       1 post.mid       DAN         cor RU     8  15  23   3
# 2 38.1  515   GL103 GenEarly Brot       1 post.mid       DAN         cor RU     8  15  23   3
# 3 39.9  525   GL103 GenEarly Brot       1 post.mid       DAN         cor RU     8  15  23   3
# 4 28.4  535   GL103 GenEarly Brot       1 post.mid       DAN         cor RU     8  15  23   3
# 5 34.8  545   GL103 GenEarly Brot       1 post.mid       DAN         cor RU     8  15  23   3
# 6 42.4  555   GL103 GenEarly Brot       1 post.mid       DAN         cor RU     8  15  23   3
#              AoAGroup start.event
# 1 Group 1: AoArr 7-12        TRUE
# 2 Group 1: AoArr 7-12       FALSE
# 3 Group 1: AoArr 7-12       FALSE
# 4 Group 1: AoArr 7-12       FALSE
# 5 Group 1: AoArr 7-12       FALSE
# 6 Group 1: AoArr 7-12       FALSE
dim(dat)
# [1] 442160     16

Much individual variation

(the signal has been downsampled to 100 Hz)


General patterns exist

(note the arbitrary age splits, however)


Investigating difference between correct and incorrect

(for all participants; to limit CPU time, the time window was restricted to [500, 1300] ms)

dat$SubjectCor = interaction(dat$Subject, dat$Correctness)
dat$WordCor = interaction(dat$Word, dat$Correctness)
# with discrete=F the fit takes about 9300 seconds, so discrete=T with nthreads is used (much faster)
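# NOTE (sketch, not part of the original analysis): rhoval, the AR(1) autocorrelation parameter
# used below, is assumed to have been computed beforehand; one way to obtain it is to fit the
# model without the AR(1) correction and take the lag-1 autocorrelation of its residuals via
# itsadug's start_value_rho():
m0.noAR = bam(uV ~ s(Time, by = Correctness) + Correctness + s(Time, SubjectCor, bs = "fs", m = 1) + 
    s(Time, WordCor, bs = "fs", m = 1), data = dat, discrete = T, nthreads = 2)
rhoval = start_value_rho(m0.noAR)  # lag-1 autocorrelation of the residuals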
system.time(m0 <- bam(uV ~ s(Time, by = Correctness) + Correctness + s(Time, SubjectCor, bs = "fs", m = 1) + 
    s(Time, WordCor, bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, 
    nthreads = 2))
#    user  system elapsed 
#  853.89    4.82  449.54
system.time(smry0 <- summary(m0))  # storing summary as it also takes some time to compute
#    user  system elapsed 
#    77.1     0.6    77.8

Global difference between correct and incorrect: results

smry0  # show summary
# 
# Family: gaussian 
# Link function: identity 
# 
# Formula:
# uV ~ s(Time, by = Correctness) + Correctness + s(Time, SubjectCor, 
#     bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", m = 1)
# 
# Parametric coefficients:
#                  Estimate Std. Error t value Pr(>|t|)
# (Intercept)        -0.554      0.473   -1.17     0.24
# Correctnessincor    0.775      0.670    1.16     0.25
# 
# Approximate significance of smooth terms:
#                             edf  Ref.df    F p-value    
# s(Time):Correctnesscor     1.04    1.08 0.39    0.57    
# s(Time):Correctnessincor   3.32    4.32 6.77 1.2e-05 ***
# s(Time,SubjectCor)       114.28 1205.00 0.69 < 2e-16 ***
# s(Time,WordCor)          134.33 1727.00 0.28 < 2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# R-sq.(adj) =  0.0513   Deviance explained = 5.18%
# fREML = 1.4922e+06  Scale est. = 281.02    n = 442160

Visualizing difference between correct and incorrect

par(mfrow = c(1, 2))
plot_smooth(m0, view = "Time", rug = F, plot_all = "Correctness", main = "", rm.ranef = T, eegAxis = T)
plot_diff(m0, view = "Time", comp = list(Correctness = c("incor", "cor")), rm.ranef = T, eegAxis = T)


Modeling the difference directly using a binary curve

dat$IsIncorrect = (dat$Correctness == "incor") * 1  # creating a binary variable (incor: 1, cor: 0)
m0b = bam(uV ~ s(Time) + s(Time, by = IsIncorrect) + s(Time, SubjectCor, bs = "fs", m = 1) + s(Time, 
    WordCor, bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 2)
# s(Time,by=IsIncorrect) equals 0 when IsIncorrect = 0, otherwise it is a NON-CENTERED non-linear
# difference trajectory (i.e. including intercept difference)
smry0b = summary(m0b)
smry0b$p.table
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   -0.541      0.473   -1.14    0.253
smry0b$s.table
#                        edf  Ref.df     F   p-value
# s(Time)               1.64    2.05 0.667  5.35e-01
# s(Time):IsIncorrect   4.08    5.00 3.833  1.80e-03
# s(Time,SubjectCor)  114.28 1204.00 0.688 1.46e-125
# s(Time,WordCor)     134.33 1726.00 0.275  7.19e-55
# the p-value of the binary curve indicates the significance of the full DIFFERENCE between cor and
# incor

Modeling the difference using an ordered factor

dat$CorrectnessO = as.ordered(dat$Correctness)  # creating an ordered factor ...
contrasts(dat$CorrectnessO) = "contr.treatment"  # ... with treatment contrasts
m0o = bam(uV ~ s(Time) + s(Time, by = CorrectnessO) + CorrectnessO + s(Time, SubjectCor, bs = "fs", m = 1) + 
    s(Time, WordCor, bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, 
    nthreads = 2)
# s(Time,by=CorrectnessO) equals 0 when CorrectnessO is at its reference level (cor), otherwise it is
# a CENTERED non-linear difference trajectory
smry0o = summary(m0o)
smry0o$p.table
#                   Estimate Std. Error t value Pr(>|t|)
# (Intercept)         -0.541      0.473   -1.14    0.253
# CorrectnessOincor    0.763      0.670    1.14    0.255
# the p-value for the intercept difference indicates the significance of the constant DIFFERENCE
# between cor and incor

smry0o$s.table
#                              edf  Ref.df     F   p-value
# s(Time)                     1.64    2.05 0.669  5.35e-01
# s(Time):CorrectnessOincor   3.08    4.00 4.574  1.09e-03
# s(Time,SubjectCor)        114.28 1205.00 0.688 1.47e-125
# s(Time,WordCor)           134.33 1727.00 0.275  7.19e-55
# the p-value for the binary curve indicates the significance of the non-linear DIFFERENCE between
# cor and incor
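Since the ordered-factor difference smooth is centered, the full difference (intercept plus non-linear part) is easiest to inspect with plot_diff(); a minimal sketch (not shown in the original slides):

plot_diff(m0o, view = "Time", comp = list(CorrectnessO = c("incor", "cor")), rm.ranef = T, eegAxis = T)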

Visualization of both difference curves

par(mfrow = c(1, 2))
plot(m0b, select = 2, shade = T, rug = F, main = "Binary difference curve", ylim = c(3, -3))  # reversed ylim: negative plotted upwards
plot(m0o, select = 2, shade = T, rug = F, main = "Ordered factor difference curve", ylim = c(3, -3))


Testing our research question: a non-linear interaction

(te() is used to model a non-linear interaction)

m1 = bam(uV ~ te(Time, AoArr, by = Correctness) + Correctness + s(Time, SubjectCor, bs = "fs", m = 1) + 
    s(Time, WordCor, bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, 
    nthreads = 2)
smry1 = summary(m1)
smry1$p.table
#                  Estimate Std. Error t value Pr(>|t|)
# (Intercept)        -0.408      0.474  -0.861    0.389
# Correctnessincor    0.444      0.672   0.661    0.509
smry1$s.table
#                                    edf  Ref.df     F   p-value
# te(Time,AoArr):Correctnesscor     3.10    3.20 1.764  1.52e-01
# te(Time,AoArr):Correctnessincor   5.88    6.95 4.917  1.57e-05
# s(Time,SubjectCor)              111.94 1203.00 0.658 2.24e-119
# s(Time,WordCor)                 134.38 1727.00 0.275  5.93e-55

Visualization of the 2-dimensional difference

Note the default maximum number of edf per smooth: 24 (\(5^2 - 1\))

plot_diff2(m1, view = c("Time", "AoArr"), rm.ranef = T, comp = list(Correctness = c("incor", "cor")))
fadeRug(dat$Time, dat$AoArr)  # fade out regions of the surface without data
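If a wigglier surface is needed, the marginal basis dimensions of the tensor product can be increased via k; a minimal sketch (the k values are illustrative and not taken from the original analysis):

m1b = bam(uV ~ te(Time, AoArr, by = Correctness, k = c(10, 8)) + Correctness + s(Time, SubjectCor, 
    bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", m = 1), data = dat, rho = rhoval, 
    AR.start = dat$start.event, discrete = T, nthreads = 2)  # larger marginal bases -> higher maximum edf per surface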


Interpreting the 2-dimensional interaction

(note that the \(y\)-axes are not inverted here)


Interpret interactions yourself!

Decomposition: the pure effect of age of arrival

m2 = bam(uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + ti(Time, AoArr, by = Correctness) + 
    Correctness + s(Time, SubjectCor, bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", m = 1), data = dat, 
    rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 2)
smry2 = summary(m2)
smry2$p.table
#                  Estimate Std. Error t value Pr(>|t|)
# (Intercept)        -0.418      0.475  -0.881    0.378
# Correctnessincor    0.480      0.673   0.714    0.475
smry2$s.table
#                                    edf  Ref.df      F   p-value
# s(Time):Correctnesscor            1.04    1.08 0.0682  8.37e-01
# s(Time):Correctnessincor          3.27    4.25 6.2711  3.54e-05
# s(AoArr):Correctnesscor           1.06    1.07 2.6019  1.12e-01
# s(AoArr):Correctnessincor         1.06    1.06 3.8290  4.32e-02
# ti(Time,AoArr):Correctnesscor     1.04    1.07 2.1989  1.27e-01
# ti(Time,AoArr):Correctnessincor   1.53    1.89 0.3922  6.17e-01
# s(Time,SubjectCor)              111.82 1204.00 0.6515 1.32e-118
# s(Time,WordCor)                 134.38 1727.00 0.2755  5.95e-55

A simpler model without the non-linear interaction

m3 = bam(uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + Correctness + s(Time, SubjectCor, 
    bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, 
    discrete = T, nthreads = 2)
smry3 = summary(m3)
smry3$p.table
#                  Estimate Std. Error t value Pr(>|t|)
# (Intercept)        -0.414      0.475  -0.872    0.383
# Correctnessincor    0.486      0.673   0.722    0.470
smry3$s.table
#                              edf  Ref.df     F   p-value
# s(Time):Correctnesscor      1.04    1.08 0.385  5.74e-01
# s(Time):Correctnessincor    3.32    4.32 6.765  1.25e-05
# s(AoArr):Correctnesscor     1.07    1.08 2.690  1.06e-01
# s(AoArr):Correctnessincor   1.10    1.11 3.386  5.21e-02
# s(Time,SubjectCor)        111.78 1204.00 0.650 2.32e-118
# s(Time,WordCor)           134.38 1727.00 0.275  5.97e-55

Model comparison

(workaround to allow an fREML comparison with discrete=T; with select=T the smooths receive an additional penalty and can be treated as random effects)

m2.alt = bam(uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + ti(Time, AoArr, by = Correctness) + 
    Correctness + s(Time, SubjectCor, bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", m = 1), data = dat, 
    rho = rhoval, AR.start = dat$start.event, method = "fREML", select = T, discrete = T, nthreads = 2)

m3.alt = bam(uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + Correctness + s(Time, SubjectCor, 
    bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, 
    method = "fREML", select = T, discrete = T, nthreads = 2)
compareML(m2.alt, m3.alt)
# m2.alt: uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + 
#      ti(Time, AoArr, by = Correctness) + Correctness + s(Time, 
#      SubjectCor, bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", 
#      m = 1)
# 
# m3.alt: uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + 
#      Correctness + s(Time, SubjectCor, bs = "fs", m = 1) + s(Time, 
#      WordCor, bs = "fs", m = 1)
# 
# Chi-square test of fREML scores
# -----
#        Model   Score Edf Chisq    Df p.value Sig.
#       m3.alt 1492233  14                         
# fREML m2.alt 1492233  20 0.465 6.000   0.988

Difference curves

par(mfrow = c(1, 2))
plot_diff(m3, view = "Time", comp = list(Correctness = c("incor", "cor")), rm.ranef = T, eegAxis = T)
plot_diff(m3, view = "AoArr", comp = list(Correctness = c("incor", "cor")), rm.ranef = T, eegAxis = T)


Finally: model criticism

library(car)
par(mfrow = c(1, 2))
qqp(resid_gam(m3))
hist(resid_gam(m3))
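Besides the residual distribution, it is worth checking how much autocorrelation remains after the rho correction; a sketch using itsadug's acf_resid() (this check is not shown in the original slides):

acf_resid(m3, split_pred = "AR.start")  # ACF of the AR(1)-corrected residuals, split per event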


Approach to normalize residuals

  • This type of residual distribution is common for EEG data
  • These deviations are problematic and may affect \(p\)-values
  • Solution: model fitting using a scaled-\(t\) distribution
    • Implemented in gam: family="scat"
    • Adding rho to gam is possible via a custom function (created by Natalya Pya)
    • Unfortunately, gam is too slow, so a bam approach is in development
    • Current state: implemented for fREML estimation with pre-specified scaled-\(t\) parameters (\(\theta\)) and no discrete=T (see Meulman, Wieling, Sprenger, Stowe and Schmid, PLOS ONE, 2015)
    • Thus gam is still necessary to determine \(\theta\) (but applied to a subset / a simpler model specification)

Estimating scaled-\(t\) parameters efficiently

  1. Fit an intercept-only scaled-\(t\) gam model (including rho, method="REML") to all data and extract \(\theta\)
  2. Fit the complete model without random effects as a scaled-\(t\) gam model (including rho, method="REML") to all data and extract \(\theta\)
  3. Compare the models' \(\theta\) values:
    • Relatively small differences: use \(\theta\) of the complex model in the bam estimation
    • Relatively large differences: determine \(\theta\) by fitting the complete model specification (including random effects) to a small subset of the data (randomly select trials and compare \(\theta\) across a few runs; see the sketch after this list)
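A minimal sketch of the subset strategy from step 3 (the seed and the 10% fraction are illustrative; the rho-aware scat() is the custom version sourced on the next slide, not the standard mgcv family):

set.seed(1)  # illustrative seed
trials = unique(interaction(dat$Subject, dat$TrialNr, drop = TRUE))
sel = sample(trials, round(0.1 * length(trials)))  # randomly select about 10% of the trials
datsub = droplevels(dat[interaction(dat$Subject, dat$TrialNr, drop = TRUE) %in% sel, ])
gsub = gam(uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + Correctness + s(Time, 
    SubjectCor, bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", m = 1), data = datsub, method = "REML", 
    family = scat(rho = rhoval, AR.start = datsub$start.event))
gsub$family$getTheta(TRUE)  # compare these theta values across a few random draws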

Fitting a scaled-\(t\) model

source("bam.art.fit-v10.R")  # custom functions by Natalya Pya
g0 = gam(uV ~ 1, data = dat, method = "REML", family = scat(rho = rhoval, AR.start = dat$start.event))  # takes 6 seconds
g1 = gam(uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + Correctness, data = dat, method = "REML", 
    family = scat(rho = rhoval, AR.start = dat$start.event))  # takes 14 minutes
theta0 = g0$family$getTheta(TRUE)
(theta1 = g1$family$getTheta(TRUE))
# [1] 4.57 5.27
theta0 - theta1
# [1] 0.00276 0.00210
# the Gaussian model (m3) takes 230 sec. using 32 procs
system.time(m4.scat <- bam.art(uV ~ s(Time, by = Correctness) + s(AoArr, by = Correctness) + Correctness + 
    s(Time, SubjectCor, bs = "fs", m = 1) + s(Time, WordCor, bs = "fs", m = 1), data = dat, method = "fREML", 
    family = art(theta = theta1, rho = rhoval), AR.start = dat$start.event, discrete = T, nthreads = 32))
#     user   system  elapsed 
# 105784.3     23.4   3600.7

Using the scaled-\(t\) distribution: \(p\)-values change

summary(m3)$s.table
#                              edf  Ref.df     F   p-value
# s(Time):Correctnesscor      1.04    1.08 0.385  5.74e-01
# s(Time):Correctnessincor    3.32    4.32 6.765  1.25e-05
# s(AoArr):Correctnesscor     1.07    1.08 2.690  1.06e-01
# s(AoArr):Correctnessincor   1.10    1.11 3.386  5.21e-02
# s(Time,SubjectCor)        111.78 1204.00 0.650 2.32e-118
# s(Time,WordCor)           134.38 1727.00 0.275  5.97e-55
summary(m4.scat)$s.table
#                              edf  Ref.df      F   p-value
# s(Time):Correctnesscor      1.01    1.02  0.662  4.22e-01
# s(Time):Correctnessincor    4.17    5.38 14.266  1.40e-14
# s(AoArr):Correctnesscor     1.47    1.49  1.811  2.67e-01
# s(AoArr):Correctnessincor   1.00    1.00  6.872  8.76e-03
# s(Time,SubjectCor)        120.69 1202.00  1.467 1.56e-313
# s(Time,WordCor)           137.64 1726.00  0.301  6.78e-62

Using the scaled-\(t\) distribution: similar patterns


Model criticism: much improved!

library(car)
par(mfrow = c(1, 2))
qqp(resid_gam(m3), main = "m3")
qqp(resid(m4.scat), main = "m4.scat")  # resid of scaled-t model takes rho into account


Discussion and conclusion

  • Still work to do: e.g., testing the significance of other possibly important variables (proficiency, age, etc.)
  • But don't make it too complex!
    • There is much variation present in EEG data and adding very complex surfaces will almost certainly improve your model significantly
    • Keep it simple, otherwise you won't be able to compute/interpret the results
    • Also note that for hypothesis testing, it is essential to avoid step-wise (data-driven) model selection
  • To conclude:
    • GAMs are very useful to analyze EEG and other time-series data
    • The method is very suitable to detect non-linear patterns, while taking into account individual variation and correcting for autocorrelation

Recap

  • We have applied GAMs to EEG data and learned how to:
    • model difference smooths directly using binary predictors and ordered factors
    • use te(Time,AoArr) to model a non-linear interaction
    • decompose te(Time,AoArr) using ti() and s()
    • use a scaled-\(t\) model to improve residuals
      • (fitting this type of model for a complete data set may take a few weeks of CPU time, however)
  • While we have analyzed a single region of interest, GAMs allow for spatial distribution analyses
  • Next lecture (after the break): generalized additive logistic modeling using dialect data

Questions?

Thank you for your attention!

http://www.martijnwieling.nl
wieling@gmail.com