Introduction to Generalized Additive Modeling using articulography data

Lecture 3 of advanced regression for linguists

Martijn Wieling
Department of Information Science

This lecture

Introduction
- Generalized additive modeling
- Articulography
- Using articulography to study L2 pronunciation differences
Design
Methods: R code
Results
Discussion

Generalized additive modeling (1)

Generalized additive model (GAM): relaxing assumption of linear relation between dependent variable and predictor
Relationship between individual predictors and (possibly transformed) dependent variable is estimated by a non-linear smooth function: \(g(y) = s(x_1) +s(x_2,x_3) + a_4x_4 + ...\)
- Multiple predictors can be combined in a (hyper)surface smooth (next lecture)

Generalized additive modeling (2)

Advantage of GAM over manual specification of non-linearities: the optimal shape of the non-linearity is determined automatically
Appropriate degree of smoothness is automatically determined on the basis of cross validation to prevent overfitting
Choosing a smoothing basis
- Single predictor or isotropic predictors: thin plate regression spline (this lecture)
  - Efficient approximation of the optimal (thin plate) spline
- Combining non-isotropic predictors: tensor product spline (next lecture)
Generalized Additive Mixed Modeling:
- Random effects can be treated as smooths as well (Wood, 2008)
- R: gam and bam (package mgcv)
For more (mathematical) details, see Wood (2006)

Articulography

Obtaining data

Recorded data

Study setup: differences between native and non-native English

19 native Dutch speakers from Groningen
22 native Southern Standard British English speakers from London
Material:
- 10 minimal pairs [t]:[θ] repeated twice:
  - 'fate'-'faith', 'forth'-'fort', 'kit'-'kith', 'mitt'-'myth', 'tent'-'tenth'
  - 'tank'-'thank', 'team'-'theme', 'tick'-'thick', 'ties'-'thighs', 'tongs'-'thongs'
  - Note that the sound [θ] does not exist in the Dutch language
    - The pronunciation of the words was preceded and followed by /ə/
Goal: compare distinction between this sound contrast for both groups
Preprocessing:
- Positions and time are normalized per speaker (between 0 and 1)

Data overview

load("art.rda")
head(art)

#          Word Axis Sensor Participant RecBlock    Time SeqNr Position Group Sound
# 112 @_faith_@    X     TB   VENI-EN_1        6 0.00278    21    0.842    EN    TH
# 113 @_faith_@    X     TB   VENI-EN_1        6 0.01065    21    0.844    EN    TH
# 114 @_faith_@    X     TB   VENI-EN_1        6 0.01852    21    0.843    EN    TH
# 115 @_faith_@    X     TB   VENI-EN_1        6 0.02639    21    0.842    EN    TH
# 116 @_faith_@    X     TB   VENI-EN_1        6 0.03426    21    0.842    EN    TH
# 117 @_faith_@    X     TB   VENI-EN_1        6 0.04213    21    0.839    EN    TH

dim(art)

# [1] 1083578      10

Much individual variation and noisy data

plot of chunk unnamed-chunk-3

First model: tongue frontness for "theme" and "team"

(R version 3.2.3 (2015-12-10), `mgcv` version 1.8.11, `itsadug` version 1.0.1)

library(mgcv)
library(itsadug)
tm = art[art$Word %in% c("@_theme_@", "@_team_@") & art$Sensor == "TT" & art$Axis == "X", ]
tm$TTfront = 1 - tm$Position  # Position has higher values for backness: invert to represent frontness 
tm = start_event(tm, event = c("Participant", "Word", "RecBlock", "SeqNr"))  # sort data per individual trajectory
summary(m0 <- bam(TTfront ~ s(Time), data = tm))

# 
# Family: gaussian 
# Link function: identity 
# 
# Formula:
# TTfront ~ s(Time)
# 
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) 0.766302   0.000621    1234   <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Approximate significance of smooth terms:
#          edf Ref.df   F p-value    
# s(Time) 8.73   8.98 650  <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# R-sq.(adj) =   0.24   Deviance explained = 24.1%
# fREML = -19420  Scale est. = 0.0071077  n = 18448

Visualizing the non-linear time pattern of the model

(Interpreting GAM results always involves visualization)

plot(m0, rug = F, shade = T, main = "Partial effect", ylab = "TTfront")
plot_smooth(m0, view = "Time", rug = F, main = "Full effect")

plot of chunk unnamed-chunk-5

Check for additional wigglyness

(if p-value is low and edf close to k')

gam.check(m0)

# 
# Method: fREML   Optimizer: perf newton
# full convergence after 10 iterations.
# Gradient range [-8.9e-06,8.16e-06]
# (score -19420 & scale 0.00711).
# Hessian positive definite, eigenvalue range [3.77,9223].
# Model rank =  10 / 10 
# 
# Basis dimension (k) checking results. Low p-value (k-index<1) may
# indicate that k is too low, especially if edf is close to k'.
# 
#           k'  edf k-index p-value
# s(Time) 9.00 8.73    1.01    0.73

Increasing the wigglyness of a smooth with \(k\)

(double \(k\) if higher \(k\) is needed, but do not set it too high, i.e. max \(\frac{1}{2}\) \(\times\) unique time points)

m0b = bam(TTfront ~ s(Time, k = 20), data = tm)
summary(m0b)$s.table  # new edf

#          edf Ref.df   F p-value
# s(Time) 13.2   15.7 373       0

summary(m0)$s.table  # original edf

#          edf Ref.df   F p-value
# s(Time) 8.73   8.98 650       0

Effect of increasing \(k\)

plot_smooth(m0, view = "Time", rug = F, main = "m0 (k=10)")
plot_smooth(m0b, view = "Time", rug = F, main = "m0b (k=20)")

plot of chunk unnamed-chunk-8

Including individual variation: random intercepts

(note the increased explained variance: was 24%)

m1 = bam(TTfront ~ s(Time, k = 20) + s(Participant, bs = "re"), data = tm)
summary(m1)

# 
# Family: gaussian 
# Link function: identity 
# 
# Formula:
# TTfront ~ s(Time, k = 20) + s(Participant, bs = "re")
# 
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.76911    0.00847    90.8   <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Approximate significance of smooth terms:
#                 edf Ref.df   F p-value    
# s(Time)        14.4   16.8 587  <2e-16 ***
# s(Participant) 39.9   40.0 314  <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# R-sq.(adj) =  0.548   Deviance explained =   55%
# fREML = -24100  Scale est. = 0.004224  n = 18448

Effect of including a random intercept

plot_smooth(m0b, view = "Time", rug = F, main = "m0b")
plot_smooth(m1, view = "Time", rug = F, main = "m1", rm.ranef = T)

plot of chunk unnamed-chunk-10

Including individual variation: random slopes

m2 = bam(TTfront ~ s(Time, k = 20) + s(Participant, bs = "re") + s(Participant, Time, bs = "re"), data = tm)
summary(m2)

# 
# Family: gaussian 
# Link function: identity 
# 
# Formula:
# TTfront ~ s(Time, k = 20) + s(Participant, bs = "re") + s(Participant, 
#     Time, bs = "re")
# 
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.76910    0.00948    81.1   <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Approximate significance of smooth terms:
#                      edf Ref.df     F p-value    
# s(Time)             14.6   16.9   582  <2e-16 ***
# s(Participant)      39.4   40.0 44319  <2e-16 ***
# s(Participant,Time) 39.2   40.0 37380  <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# R-sq.(adj) =   0.59   Deviance explained = 59.2%
# fREML = -24903  Scale est. = 0.0038396  n = 18448

Effect of including a random slope

plot_smooth(m1, view = "Time", rug = F, main = "m1", rm.ranef = T)
plot_smooth(m2, view = "Time", rug = F, main = "m2", rm.ranef = T)

plot of chunk unnamed-chunk-12

Including individual variation: factor smooths

(i.e. including a non-linear random effect instead of a random intercept and random slope)

m3 = bam(TTfront ~ s(Time, k = 20) + s(Time, Participant, bs = "fs", m = 1), data = tm)
summary(m3)

# 
# Family: gaussian 
# Link function: identity 
# 
# Formula:
# TTfront ~ s(Time, k = 20) + s(Time, Participant, bs = "fs", m = 1)
# 
# Parametric coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   0.7687     0.0116    66.4   <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Approximate significance of smooth terms:
#                       edf Ref.df    F p-value    
# s(Time)              14.4   16.6 22.9  <2e-16 ***
# s(Time,Participant) 318.5  368.0 68.2  <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# R-sq.(adj) =  0.678   Deviance explained = 68.4%
# fREML = -26799  Scale est. = 0.0030093  n = 18448

Visualization of individual variation

plot(m3, select = 2)

plot of chunk unnamed-chunk-14

Effect of including a random non-linear effect

plot_smooth(m2, view = "Time", rug = F, main = "m2", rm.ranef = T)
plot_smooth(m3, view = "Time", rug = F, main = "m3", rm.ranef = T)

plot of chunk unnamed-chunk-15

Explore random effects yourself!

Go to: http://eolomea.let.rug.nl/GAM/RandomEffects (login: p/s-number and RUG password, or: f112300/ShinyDem0)

Speeding up computation

(via `discrete` and `nthreads`, only for default `method="fREML"`)

system.time(bam(TTfront ~ s(Time, k = 20) + s(Time, Participant, bs = "fs", m = 1), data = tm))

#    user  system elapsed 
#    14.2     0.0    14.2

system.time(bam(TTfront ~ s(Time, k = 20) + s(Time, Participant, bs = "fs", m = 1), data = tm, discrete = T))

#    user  system elapsed 
#    2.66    0.00    2.65

system.time(bam(TTfront ~ s(Time, k = 20) + s(Time, Participant, bs = "fs", m = 1), data = tm, discrete = T, 
    nthreads = 2))

#    user  system elapsed 
#    2.73    0.00    2.15

Note that the speed improvement when using nthreads is larger for more complex models

Comparing "theme" and "team" in one model

(smooths are centered, so the factorial predictor also needs to be included in the fixed effects)

m4 = bam(TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, Participant, bs = "fs", m = 1), data = tm, 
    discrete = T, nthreads = 2)
summary(m4)$p.table  # extract fixed effects from summary

#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)     0.7480   0.011637    64.3        0
# Word@_theme_@   0.0416   0.000709    58.6        0

summary(m4)$s.table  # extract non-linearities and random effects from summary

#                         edf Ref.df    F   p-value
# s(Time):Word@_team_@   13.3   15.6 16.1  6.24e-44
# s(Time):Word@_theme_@  14.5   16.7 36.2 4.02e-115
# s(Time,Participant)   327.6  368.0 89.9  0.00e+00

Does the correct-incorrect distinction improve the model?

Assessing model improvement

(comparing fixed effects, so `method='ML'` and no `discrete=T`)

m3.ml = bam(TTfront ~ s(Time, k = 20) + s(Time, Participant, bs = "fs", m = 1), data = tm, method = "ML")
m4.ml = bam(TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, Participant, bs = "fs", m = 1), data = tm, 
    method = "ML")
compareML(m3.ml, m4.ml)

# m3.ml: TTfront ~ s(Time, k = 20) + s(Time, Participant, bs = "fs", m = 1)
# 
# m4.ml: TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, Participant, 
#      bs = "fs", m = 1)
# 
# Chi-square test of ML scores
# -----
#    Model  Score Edf    Chisq    Df  p.value Sig.
#    m3.ml -26805   5                             
# ML m4.ml -29232   8 2426.205 3.000  < 2e-16  ***
# 
# AIC difference: 4971.84, model m4.ml has lower AIC.

Model m4.ml is much better (AIC decrease > 2)!

Visualizing the two patterns

plot_smooth(m4, view = "Time", rug = F, plot_all = "Word", main = "m4", rm.ranef = T)

plot of chunk unnamed-chunk-19

Visualizing the difference

(no individual variation w.r.t. the difference is taken into account: confidence bands too thin)

plot_diff(m4, view = "Time", comp = list(Word = c("@_theme_@", "@_team_@")), ylim = c(-0.02, 0.12), rm.ranef = T)

plot of chunk unnamed-chunk-20

Including individual variation for "theme" vs. "team"

tm$ParticipantWord = interaction(tm$Participant, tm$Word)  # new factor
m5 = bam(TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, ParticipantWord, bs = "fs", m = 1), data = tm, 
    discrete = T, nthreads = 2)
summary(m5)$p.table

#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)     0.7486     0.0124   60.31   0.0000
# Word@_theme_@   0.0407     0.0176    2.32   0.0203

summary(m5)$s.table

#                           edf Ref.df    F  p-value
# s(Time):Word@_team_@     14.1   16.3 11.5 9.78e-31
# s(Time):Word@_theme_@    15.3   17.3 23.7 3.39e-75
# s(Time,ParticipantWord) 666.8  737.0 91.5 0.00e+00

More uncertainty in the difference

plot_diff(m4, view='Time', comp=list(Word=c('@_theme_@','@_team_@')), ylim=c(-0.05,0.15), 
    main='m4: difference', rm.ranef=T) 
plot_diff(m5, view='Time', comp=list(Word=c('@_theme_@','@_team_@')), ylim=c(-0.05,0.15), 
    main='m5: difference', rm.ranef=T)

plot of chunk unnamed-chunk-22

Autocorrelation in the data is a problem!

(residuals should be independent, otherwise the standard errors and p-values are wrong)

m5acf = acf_resid(m5)  # show autocorrelation

plot of chunk unnamed-chunk-23

Additional information: http://www.let.rug.nl/wieling/statscourse/ACF.pdf

Correcting for autocorrelation

(rhoval = m5acf[2])  # correlation of residuals at time t with those at time t-1

#     1 
# 0.956

m6 = bam(TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, ParticipantWord, bs = "fs", m = 1), data = tm, 
    rho = rhoval, AR.start = tm$start.event, discrete = T, nthreads = 2)
summary(m6)$p.table

#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)     0.7485     0.0116   64.44   0.0000
# Word@_theme_@   0.0406     0.0164    2.47   0.0136

summary(m6)$s.table

#                           edf Ref.df     F  p-value
# s(Time):Word@_team_@     15.5   17.6 11.57 3.48e-33
# s(Time):Word@_theme_@    16.6   18.2 24.56 6.76e-82
# s(Time,ParticipantWord) 550.3  737.0  5.09 0.00e+00

Autocorrelation has been removed

par(mfrow = c(1, 2))
acf_resid(m5, main = "ACF of m5")
acf_resid(m6, main = "ACF of m6")

plot of chunk unnamed-chunk-25

Clear model improvement

compareML(m5.ml, m6.ml)

# m5.ml: TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, ParticipantWord, 
#      bs = "fs", m = 1)
# 
# m6.ml: TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, ParticipantWord, 
#      bs = "fs", m = 1)
# 
# Model m6.ml preferred: lower ML score (24976.885), and equal df (0.000).
# -----
#   Model  Score Edf Difference    Df
# 1 m5.ml -33287   8                 
# 2 m6.ml -58264   8 -24976.885 0.000

# Warning in compareML(m5.ml, m6.ml): AIC is not reliable, because an AR1 model is included (rho1 = 0.000000, rho2 =
# 0.955537).

Explore autocorrelation yourself!

Go to: http://eolomea.let.rug.nl/GAM/AutoCorrelation
Note that autocorrelation can be reduced in two ways:
- Creating a model with a better fit of the data
- Setting rho to a value larger than 0
- See also: http://www.let.rug.nl/wieling/statscourse/ACF.pdf

Distinguishing the two speaker groups

tm$WordGroup = interaction(tm$Word, tm$Group)
m7 = bam(TTfront ~ s(Time, by = WordGroup, k = 20) + WordGroup + s(Time, ParticipantWord, bs = "fs", 
    m = 1), data = tm, rho = rhoval, AR.start = tm$start.event, discrete = T, nthreads = 2)
summary(m7)$p.table

#                       Estimate Std. Error t value Pr(>|t|)
# (Intercept)             0.7257     0.0145   50.20 0.000000
# WordGroup@_theme_@.EN   0.0490     0.0205    2.40 0.016622
# WordGroup@_team_@.NL    0.0480     0.0212    2.26 0.023651
# WordGroup@_theme_@.NL   0.0792     0.0212    3.73 0.000196

summary(m7)$s.table

#                                 edf Ref.df     F  p-value
# s(Time):WordGroup@_team_@.EN   15.4   17.5 10.12 4.14e-28
# s(Time):WordGroup@_theme_@.EN  16.2   18.0 20.55 8.27e-67
# s(Time):WordGroup@_team_@.NL   10.1   12.8  3.84 3.66e-06
# s(Time):WordGroup@_theme_@.NL  13.6   16.2  7.83 4.91e-19
# s(Time,ParticipantWord)       530.6  737.0  4.42 0.00e+00

The difference between EN and NL is necessary

compareML(m6.ml, m7.ml)

# m6.ml: TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, ParticipantWord, 
#      bs = "fs", m = 1)
# 
# m7.ml: TTfront ~ s(Time, by = WordGroup, k = 20) + WordGroup + s(Time, 
#      ParticipantWord, bs = "fs", m = 1)
# 
# Chi-square test of ML scores
# -----
#    Model  Score Edf  Chisq    Df p.value Sig.
#    m6.ml -58264   8                          
# ML m7.ml -58274  14 10.317 6.000   0.002  **

# Warning in compareML(m6.ml, m7.ml): AIC is not reliable, because an AR1 model is included (rho1 = 0.955537, rho2 =
# 0.955537).

Visualizing the patterns

plot_smooth(m7, view = "Time", rug = F, plot_all = "WordGroup", main = "m7", rm.ranef = T)
plot_diff(m7, view = "Time", rm.ranef = T, ylim = c(-0.05, 0.2), comp = list(WordGroup = c("@_theme_@.EN", 
    "@_team_@.EN")), main = "EN difference")
plot_diff(m7, view = "Time", rm.ranef = T, ylim = c(-0.05, 0.2), comp = list(WordGroup = c("@_theme_@.NL", 
    "@_team_@.NL")), main = "NL difference")

plot of chunk unnamed-chunk-31

The full tongue tip model: all words and both axes

(`Word` is now a random-effect factor, `Sound` distinguishes T from TH words)

t1 = art[art$Sensor == "TT", ]
t1$GroupSoundAxis = interaction(t1$Group, t1$Sound, t1$Axis)
t1$WordGroupAxis = interaction(t1$Word, t1$Group, t1$Axis)
t1$PartSoundAxis = interaction(t1$Participant, t1$Group, t1$Axis)
t1 = start_event(t1, event = c("Participant", "Word", "Axis", "RecBlock", "SeqNr"))

system.time(model <- bam(Position ~ s(Time, by = GroupSoundAxis, k = 20) + GroupSoundAxis + s(Time, PartSoundAxis, 
    bs = "fs", m = 1) + s(Time, WordGroupAxis, bs = "fs", m = 1), data = t1, rho = rhoval, AR.start = t1$start.event, 
    discrete = T, nthreads = 4))  # duration discrete=F: 2100 sec., discrete=T: 690 s. (1 thr.) / 365 s. (2 thr.)

#    user  system elapsed 
# 1002.37    2.75  290.07

The full tongue tip model: all words and both axes

smry = summary(model)
smry$p.table

#                       Estimate Std. Error t value Pr(>|t|)
# (Intercept)            0.31405     0.0163  19.282 8.33e-83
# GroupSoundAxisNL.T.X  -0.06376     0.0235  -2.716 6.62e-03
# GroupSoundAxisEN.TH.X -0.04885     0.0153  -3.197 1.39e-03
# GroupSoundAxisNL.TH.X -0.07173     0.0233  -3.077 2.09e-03
# GroupSoundAxisEN.T.Z   0.05375     0.0246   2.186 2.88e-02
# GroupSoundAxisNL.T.Z   0.07058     0.0252   2.803 5.07e-03
# GroupSoundAxisEN.TH.Z -0.00487     0.0245  -0.199 8.43e-01
# GroupSoundAxisNL.TH.Z  0.02969     0.0250   1.187 2.35e-01

smry$s.table

#                                  edf Ref.df     F   p-value
# s(Time):GroupSoundAxisEN.T.X   10.05   12.3  4.02  2.13e-06
# s(Time):GroupSoundAxisNL.T.X    9.35   11.7  3.27  3.03e-04
# s(Time):GroupSoundAxisEN.TH.X  13.78   15.3  5.28  5.55e-11
# s(Time):GroupSoundAxisNL.TH.X   8.62   11.0  1.59  9.04e-02
# s(Time):GroupSoundAxisEN.T.Z   18.45   18.6 55.04 3.52e-203
# s(Time):GroupSoundAxisNL.T.Z   18.01   18.3 30.06 2.32e-104
# s(Time):GroupSoundAxisEN.TH.Z  17.83   18.1 38.66 1.13e-136
# s(Time):GroupSoundAxisNL.TH.Z  16.60   17.3 12.89  1.83e-37
# s(Time,PartSoundAxis)         691.61  735.0 50.31  0.00e+00
# s(Time,WordGroupAxis)         630.97  715.0 46.39  0.00e+00

A clear L1-based pattern

plot_diff(model, view = "Time", rm.ranef = T, ylim = c(0.1, -0.15), comp = list(GroupSoundAxis = c("EN.TH.X", 
    "EN.T.X")))
plot_diff(model, view = "Time", rm.ranef = T, ylim = c(0.1, -0.15), comp = list(GroupSoundAxis = c("NL.TH.X", 
    "NL.T.X")))

plot of chunk unnamed-chunk-33

2D visualization

plot of chunk unnamed-chunk-34

Discussion

Native English speakers appear to make the /t/-/θ/ distinction, whereas Dutch L2 speakers of English generally do not
Obviously, the model could be improved by separating minimal pairs into two categories (which?)

Recap

We have applied GAMs to articulography data and learned how to:
- use s(Time) to model a non-linearity over time
- use the k parameter to control the "wigglyness" of the non-linearity
- use the plotting functions plot_smooth and plot_diff (rm.ranef=T!)
- use the parameter setting bs="re" to add random intercepts and slopes
- add non-linear random effects using s(Time,...,bs="fs",m=1)
- use the by-parameter to obtain separate non-linearities
  - Note that the factorial predictor used for the by-part needs to be included as fixed-effect factor as well!
- use compareML to compare models (method='ML' when comparing fixed effects)
After the break:
- http://www.let.rug.nl/wieling/statscourse/lecture3/lab
Next lecture: more about generalized additive modeling using EEG data

Questions?

Thank you for your attention!

http://www.martijnwieling.nl
wieling@gmail.com

Introduction to Generalized Additive Modeling using articulography data

Lecture 3 of advanced regression for linguists

This lecture

Generalized additive modeling (1)

Generalized additive modeling (2)

Articulography

Obtaining data

Recorded data

Study setup: differences between native and non-native English

Data overview

Much individual variation and noisy data

First model: tongue frontness for "theme" and "team"

(R version 3.2.3 (2015-12-10), mgcv version 1.8.11, itsadug version 1.0.1)

Visualizing the non-linear time pattern of the model

(Interpreting GAM results always involves visualization)

Check for additional wigglyness

(if p-value is low and edf close to k')

Increasing the wigglyness of a smooth with \(k\)

(double \(k\) if higher \(k\) is needed, but do not set it too high, i.e. max \(\frac{1}{2}\) \(\times\) unique time points)

Effect of increasing \(k\)

Including individual variation: random intercepts

(note the increased explained variance: was 24%)

Effect of including a random intercept

Including individual variation: random slopes

Effect of including a random slope

Including individual variation: factor smooths

(i.e. including a non-linear random effect instead of a random intercept and random slope)

Visualization of individual variation

Effect of including a random non-linear effect

Explore random effects yourself!

Speeding up computation

(via discrete and nthreads, only for default method="fREML")

Comparing "theme" and "team" in one model

(smooths are centered, so the factorial predictor also needs to be included in the fixed effects)

Assessing model improvement

(comparing fixed effects, so method='ML' and no discrete=T)

Visualizing the two patterns

Visualizing the difference

(no individual variation w.r.t. the difference is taken into account: confidence bands too thin)

Including individual variation for "theme" vs. "team"

More uncertainty in the difference

Autocorrelation in the data is a problem!

(residuals should be independent, otherwise the standard errors and p-values are wrong)

Correcting for autocorrelation

Autocorrelation has been removed

Clear model improvement

Explore autocorrelation yourself!

Distinguishing the two speaker groups

The difference between EN and NL is necessary

Visualizing the patterns

The full tongue tip model: all words and both axes

(Word is now a random-effect factor, Sound distinguishes T from TH words)

The full tongue tip model: all words and both axes

A clear L1-based pattern

2D visualization

Discussion

Recap

Questions?

(R version 3.2.3 (2015-12-10), `mgcv` version 1.8.11, `itsadug` version 1.0.1)

(via `discrete` and `nthreads`, only for default `method="fREML"`)

(comparing fixed effects, so `method='ML'` and no `discrete=T`)

(`Word` is now a random-effect factor, `Sound` distinguishes T from TH words)