Martijn Wieling

University of Groningen

- Introduction
- Generalized additive modeling
- Articulography
- Using articulography to study L2 pronunciation differences

- Design
- Methods:
`R`

code - Results
- Discussion

*Generalized additive model (GAM)*: relaxing assumption of linear relation between dependent variable and predictor- Relationship between individual predictors and (possibly transformed) dependent variable is estimated by a non-linear smooth function: \(g(y) = s(x_1) +s(x_2,x_3) + \beta_4x_4 + ...\)
- Multiple predictors can be combined in a (hyper)surface smooth (other lecture)

- Advantage of GAM over manual specification of non-linearities: the optimal shape of the non-linearity is determined automatically
- Appropriate degree of smoothness is automatically determined by minimizing combined error
*and*"wigglyness" (no overfitting) - Maximum number of basis functions limits the maximum amount of non-linearity

- Choosing a smoothing basis
- Single predictor or isotropic predictors: thin plate regression spline (this lecture)
- Efficient approximation of the optimal (thin plate) spline

- Combining non-isotropic predictors: tensor product spline

- Single predictor or isotropic predictors: thin plate regression spline (this lecture)
- Generalized Additive Mixed Modeling:
- Random effects can be treated as smooths as well (Wood, 2008)
`R`

:`gam`

and`bam`

(package`mgcv`

)

- For more (mathematical) details, see Wood (2006) and Wood (2017)

- 19 native Dutch speakers from Groningen
- 22 native Standard Southern British English speakers from London
- Material: 10 minimal pairs [t]:[θ] repeated twice:
- 'tent'-'tenth', 'fate'-'faith', 'forth'-'fort', 'kit'-'kith', 'mitt'-'myth'
- 'tank'-'thank', 'team'-'theme', 'tick'-'thick', 'ties'-'thighs', 'tongs'-'thongs'
- Note that the sound [θ] does not exist in the Dutch language

- Goal: compare distinction between this sound contrast for both groups
- Preprocessing:
- Articulatory segmentation: gestural onset to offset (within /ə/ context)
- Positions \(z\)-transformed per axis and time normalized (from 0 to 1) per speaker

```
load("full.rda") # only includes anterior-posterior position of tongue tip sensor
head(full)
```

```
# Participant Group Word Sound Loc Trial Time TTfront
# 1 VENI_EN_1 EN tick T Init 1 0.0000 -0.392
# 2 VENI_EN_1 EN tick T Init 1 0.0161 -0.440
# 3 VENI_EN_1 EN tick T Init 1 0.0323 -0.440
# 4 VENI_EN_1 EN tick T Init 1 0.0484 -0.503
# 5 VENI_EN_1 EN tick T Init 1 0.0645 -0.513
# 6 VENI_EN_1 EN tick T Init 1 0.0806 -0.677
```

```
dim(full)
```

```
# [1] 126177 8
```

`mgcv`

version 1.8.24, `itsadug`

version 2.3)```
library(mgcv)
library(itsadug)
tt <- full[full$Word %in% c("tenth", "tent"), ]
# sort data per individual trajectory (necessary to detect autocorrelation)
tt <- start_event(tt, event = c("Participant", "Trial"))
# fit first GAM
m0 <- bam(TTfront ~ s(Time), data = tt)
```

```
summary(m0)
```

```
#
# Family: gaussian
# Link function: identity
#
# Formula:
# TTfront ~ s(Time)
#
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.40281 0.00825 48.8 <2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time) 8.4 8.9 170 <2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# R-sq.(adj) = 0.106 Deviance explained = 10.6%
# fREML = 17375 Scale est. = 0.8739 n = 12839
```

```
plot(m0, rug = F, shade = T, main = "Partial effect", ylab = "TTfront", ylim = c(-1, 1))
plot_smooth(m0, view = "Time", rug = F, main = "Full effect", ylim = c(-1, 1))
```

```
gam.check(m0)
```

```
#
# Method: fREML Optimizer: perf newton
# full convergence after 9 iterations.
# Gradient range [-1.25e-07,1.02e-07]
# (score 17375 & scale 0.874).
# Hessian positive definite, eigenvalue range [3.46,6419].
# Model rank = 10 / 10
#
# Basis dimension (k) checking results. Low p-value (k-index<1) may
# indicate that k is too low, especially if edf is close to k'.
#
# k' edf k-index p-value
# s(Time) 9.0 8.4 1.02 0.84
```

```
m0b <- bam(TTfront ~ s(Time, k = 20), data = tt)
summary(m0b)$s.table # extract smooths from summary: new edf
```

```
# edf Ref.df F p-value
# s(Time) 11.2 13.5 112 1.91e-304
```

```
summary(m0)$s.table # original edf
```

```
# edf Ref.df F p-value
# s(Time) 8.4 8.9 170 1.24e-310
```

```
plot_smooth(m0, view = "Time", rug = F, main = "m0 (k=10)")
plot_smooth(m0b, view = "Time", rug = F, main = "m0b (k=20)")
```

```
m1 <- bam(TTfront ~ s(Time) + s(Participant, bs = "re"), data = tt)
summary(m1) # starting from here, slides only show the relevant part of the summary
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.432 0.0648 6.67 2.75e-11 ***
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time) 8.49 8.93 206.1 <2e-16 ***
# s(Participant) 40.41 41.00 65.6 <2e-16 ***
#
# Deviance explained = 26.3%
```

- Deviance explained of model without random intercept (\(r^2\)): 10.6%

```
plot_smooth(m0, view = "Time", rug = F, main = "m0", ylim = c(-0.5, 1.5))
plot_smooth(m1, view = "Time", rug = F, main = "m1", ylim = c(-0.5, 1.5), rm.ranef = T)
```

```
m2 <- bam(TTfront ~ s(Time) + s(Participant, bs = "re") + s(Participant, Time, bs = "re"),
data = tt)
summary(m2)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.432 0.106 4.08 4.5e-05 ***
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time) 8.53 8.94 163 <2e-16 ***
# s(Participant) 39.91 41.00 9294 <2e-16 ***
# s(Participant,Time) 39.12 41.00 10292 <2e-16 ***
#
# Deviance explained = 31.7%
```

```
plot_smooth(m1, view = "Time", rug = F, rm.ranef = T, main = "m1", ylim = c(-0.5, 1.5))
plot_smooth(m2, view = "Time", rug = F, rm.ranef = T, main = "m2", ylim = c(-0.5, 1.5))
```

`(1|Subject)`

:`s(Subject, bs="re")`

`(0 + WF|Subject)`

:`s(Subject, WF, bs="re")`

`(1 + WF|Subject)`

:**not**possible in`mgcv`

```
m3 <- bam(TTfront ~ s(Time) + s(Time, Participant, bs = "fs", m = 1), data = tt)
summary(m3)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.414 0.0802 5.16 2.54e-07 ***
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time) 8.15 8.52 24 <2e-16 ***
# s(Time,Participant) 287.25 377.00 16 <2e-16 ***
#
# Deviance explained = 40.6%
```

```
plot(m3, select = 2)
```

```
plot_smooth(m2, view = "Time", rug = F, main = "m2", ylim = c(-0.5, 1.2), rm.ranef = T)
plot_smooth(m3, view = "Time", rug = F, main = "m3", ylim = c(-0.5, 1.2), rm.ranef = T)
```

`f112300`

and `ShinyDem0`

)`discrete`

and `nthreads`

, only for default `method="fREML"`

)```
system.time(bam(TTfront ~ s(Time) + s(Time, Participant, bs = "fs", m = 1), data = tt))
```

```
# user system elapsed
# 20.04 32.52 4.44
```

```
system.time(bam(TTfront ~ s(Time) + s(Time, Participant, bs = "fs", m = 1), data = tt,
discrete = T))
```

```
# user system elapsed
# 7.06 12.21 2.18
```

```
system.time(bam(TTfront ~ s(Time) + s(Time, Participant, bs = "fs", m = 1), data = tt,
discrete = T, nthreads = 2))
```

```
# user system elapsed
# 5.33 6.90 2.33
```

- Speed improvement with
`nthreads`

is larger for more complex models

```
m4 <- bam(TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, bs = "fs", m = 1),
data = tt, discrete = T, nthreads = 2)
summary(m4)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.072 0.0833 0.86 0.388
# Wordtenth 0.690 0.0118 58.66 <2e-16 ***
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time):Wordtent 7.28 7.93 9.37 1.09e-12 ***
# s(Time):Wordtenth 8.51 8.79 43.37 <2e-16 ***
# s(Time,Participant) 306.18 377.00 22.54 <2e-16 ***
```

- Does the word distinction improve the model?

`method='ML'`

and no `discrete=T`

)```
m3.ml <- bam(TTfront ~ s(Time) + s(Time, Participant, bs = "fs", m = 1), data = tt, method = "ML")
m4.ml <- bam(TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, bs = "fs", m = 1),
data = tt, method = "ML")
compareML(m3.ml, m4.ml) # model m4.ml is much better!
```

```
# m3.ml: TTfront ~ s(Time) + s(Time, Participant, bs = "fs", m = 1)
#
# m4.ml: TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, bs = "fs",
# m = 1)
#
# Chi-square test of ML scores
# -----
# Model Score Edf Difference Df p.value Sig.
# 1 m3.ml 15292 5
# 2 m4.ml 13298 8 1993.465 3.000 < 2e-16 ***
#
# AIC difference: 4105.39, model m4.ml has lower AIC.
```

```
plot_smooth(m4, view = "Time", rug = F, plot_all = "Word", main = "m4", ylim = c(-0.5,
2), rm.ranef = T)
```

```
plot_diff(m4, view = "Time", comp = list(Word = c("tenth", "tent")), ylim = c(-0.5, 2),
rm.ranef = T)
```

```
m5 <- bam(TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, by = Word, bs = "fs",
m = 1), data = tt, discrete = T, nthreads = 2)
summary(m5)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.091 0.0973 0.94 0.349
# Wordtenth 0.598 0.1431 4.18 2.92e-05 ***
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time):Wordtent 7.56 7.97 9.62 4.01e-13 ***
# s(Time):Wordtenth 8.40 8.56 22.61 <2e-16 ***
# s(Time,Participant):Wordtent 315.77 377.00 38.05 <2e-16 ***
# s(Time,Participant):Wordtenth 327.33 368.00 43.14 <2e-16 ***
```

```
plot_diff(m4, view="Time", comp=list(Word=c("tenth","tent")), ylim=c(-0.5,2),
main="m4: difference", rm.ranef=T)
plot_diff(m5, view="Time", comp=list(Word=c("tenth","tent")), ylim=c(-0.5,2),
main="m5: difference", rm.ranef=T)
```

```
m5acf <- acf_resid(m5) # show autocorr.
```

- See also supplementary material

```
rhoval <- m5acf[2] # correlation of residuals at time t with those at time t-1 (about 0.93)
m6 <- bam(TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, by = Word, bs = "fs",
m = 1), data = tt, rho = rhoval, AR.start = tt$start.event, discrete = T, nthreads = 2)
summary(m6)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.0855 0.0878 0.97 0.330
# Wordtenth 0.5905 0.1294 4.56 5.05e-06 ***
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time):Wordtent 7.41 8.08 8.89 2.15e-12 ***
# s(Time):Wordtenth 8.32 8.60 22.08 <2e-16 ***
# s(Time,Participant):Wordtent 228.84 377.00 2.87 <2e-16 ***
# s(Time,Participant):Wordtenth 267.20 368.00 3.82 <2e-16 ***
```

```
par(mfrow = c(1, 2))
acf_resid(m5, main = "ACF of m5")
acf_resid(m6, main = "ACF of m6")
```

```
compareML(m5.ml, m6.ml)
```

```
# m5.ml: TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, by = Word,
# bs = "fs", m = 1)
#
# m6.ml: TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, by = Word,
# bs = "fs", m = 1)
#
# Model m6.ml preferred: lower ML score (12683.931), and equal df (0.000).
# -----
# Model Score Edf Difference Df
# 1 m5.ml 9393 10
# 2 m6.ml -3291 10 -12683.931 0.000
#
# AIC difference: 24357.55, model m6.ml has lower AIC.
```

`f112300`

and `ShinyDem0`

)```
tt$WordGroup <- interaction(tt$Word, tt$Group)
m7 <- bam(TTfront ~ s(Time, by = WordGroup) + WordGroup + s(Time, Participant, by = Word, bs = "fs",
m = 1), data = tt, rho = rhoval, AR.start = tt$start.event, discrete = T, nthreads = 2)
summary(m7)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.0955 0.119 -0.80 0.423
# WordGrouptenth.EN 0.7452 0.174 4.27 1.94e-05 ***
# WordGrouptent.NL 0.3731 0.174 2.14 0.032 *
# WordGrouptenth.NL 0.8047 0.182 4.42 1e-05 ***
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time):WordGrouptent.EN 3.86 4.56 2.30 0.043 *
# s(Time):WordGrouptenth.EN 7.94 8.36 16.08 <2e-16 ***
# s(Time):WordGrouptent.NL 7.46 8.13 10.86 9.74e-16 ***
# s(Time):WordGrouptenth.NL 7.74 8.21 12.04 <2e-16 ***
# s(Time,Participant):Wordtent 217.47 376.00 2.66 <2e-16 ***
# s(Time,Participant):Wordtenth 254.62 367.00 3.46 <2e-16 ***
```

```
plot_smooth(m7, view = "Time", rug = F, plot_all = "WordGroup", main = "m7", rm.ranef = T, ylim = c(-1, 2))
plot_diff(m7, view = "Time", rm.ranef = T, comp = list(WordGroup = c("tenth.EN", "tent.EN")), main = "EN difference",
ylim = c(-1, 2))
plot_diff(m7, view = "Time", rm.ranef = T, comp = list(WordGroup = c("tenth.NL", "tent.NL")), main = "NL difference",
ylim = c(-1, 2))
```

```
# m6.ml: TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, by = Word,
# bs = "fs", m = 1)
#
# m7.ml: TTfront ~ s(Time, by = WordGroup) + WordGroup + s(Time, Participant,
# by = Word, bs = "fs", m = 1)
#
# Chi-square test of ML scores
# -----
# Model Score Edf Difference Df p.value Sig.
# 1 m6.ml -3291 10
# 2 m7.ml -3306 16 14.985 6.000 3.982e-05 ***
#
# AIC difference: 7.61, model m7.ml has lower AIC.
```

`Word`

is now a random-effect factor, `Sound`

distinguishes T from TH words)```
full$SoundGroup <- interaction(full$Sound, full$Group)
full <- start_event(full, event = c("Participant", "Trial"))
# duration discrete=F: 105 sec., discrete=T: 54 s. (1 thr.) / 37 s. (2 thr.)
system.time(model <- bam(TTfront ~ s(Time, by = SoundGroup) + SoundGroup + s(Time, Participant,
by = Sound, bs = "fs", m = 1) + s(Time, Word, by = Group, bs = "fs", m = 1), data = full,
rho = rhoval, AR.start = full$start.event, discrete = T, nthreads = 4))
```

```
# user system elapsed
# 74.8 17.1 21.6
```

```
summary(model, re.test = FALSE) # exclude random effects from summary (much faster)
```

```
# Parametric coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.0602 0.0969 -0.62 0.535
# SoundGroupTH.EN 0.6397 0.1581 4.05 5.2e-05 ***
# SoundGroupT.NL 0.1052 0.1461 0.72 0.472
# SoundGroupTH.NL 0.4340 0.1642 2.64 0.008 **
#
# Approximate significance of smooth terms:
# edf Ref.df F p-value
# s(Time):SoundGroupT.EN 5.16 5.49 2.96 0.014 *
# s(Time):SoundGroupTH.EN 6.61 6.86 6.54 1.39e-07 ***
# s(Time):SoundGroupT.NL 7.25 7.44 4.38 2.68e-05 ***
# s(Time):SoundGroupTH.NL 6.90 7.12 5.88 1.9e-06 ***
#
# Deviance explained = 53.5%
```

```
plot_diff(model, view = 'Time', rm.ranef = T, ylim=c(-0.5, 1.5),
comp = list(SoundGroup = c('TH.EN', 'T.EN')))
plot_diff(model, view = 'Time', rm.ranef = T, ylim=c(-0.5, 1.5),
comp = list(SoundGroup = c('TH.NL', 'T.NL')))
```

- Native English speakers appear to make the /t/-/θ/ distinction, whereas Dutch L2 speakers of English generally do not
- But this does
**not**mean the difference between these distinctions is significant (see this tutorial)

- But this does

- For an example of how to report this type of analysis, see:
- Wieling (2018),
*Journal of Phonetics*: GAM tutorial (paper package: data and code) - Wieling et al. (2016),
*Journal of Phonetics*(paper package: data and code)

- Wieling (2018),

- We have applied GAMs to articulography data and learned how to:
- use
`s(Time)`

to model a non-linearity over time - use the
`k`

parameter to control the number of basis functions - use the plotting functions
`plot_smooth`

and`plot_diff`

(`rm.ranef=T`

!) - use the parameter setting
`bs="re"`

to add random intercepts and slopes - add non-linear random effects using
`s(Time,...,bs="fs",m=1)`

- use the
`by`

-parameter to obtain separate non-linearities- Note that this factorial predictor also needs to be included in fixed effects!

- use
`compareML`

to compare models (comparing fixed effects:`method='ML'`

)

- use
- Associated lab session:

Thank you for your attention!