1 Introduction

During the lecture we have mainly looked at the contrast between “tent” and “tenth”. Here we focus on “faith” versus “fate”.

You can download the R markdown file here: lab.Rmd, add your answers to this file, and generate the HTML report (see also the final section).

2 Initialization

Load the required packages and download the required files.

# test if the required packages are installed if not, install them
packages <- c("mgcv", "itsadug")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
    install.packages(setdiff(packages, rownames(installed.packages())),
        repos = "http://cran.us.r-project.org")
}

# load the packages
library(mgcv)
library(itsadug)

# download and load the following file containing the dataset:
download.file("http://www.let.rug.nl/wieling/Statistics/GAM-Intro/lab/th.rda",
    "th.rda")
load("th.rda")  # articulography data
# display version information
R.version.string
packageVersion("mgcv")
packageVersion("itsadug")

3 Data investigation

Investigate the data.

head(th)
str(th)
summary(th)
dim(th)

The data needs to be sorted to correct for autocorrelation. This is done as follows.

th <- th[order(th$Participant, th$Trial, th$Time), ]
th$start.event <- th$Time == 0
head(th)

4 Comparing “faith” and “fate”

We would like to compare “faith” and “fate” in a single model. It is important to allow individual (non-linear) variation in how speakers pronounce the two words. Note that you can ignore the warning model has repeated 1-d smooths of same variable, as this simply indicates that you have smooths over time both in the random effects as in the fixed effects. Also note that the difference curve is overly conservative (i.e. has confidence bands which are wider than they should be; Sóskuthy, 2021). This lecture discusses how this can be prevented.

m1 <- bam(TTfront ~ s(Time, by = Word) + Word + s(Time, Participant, by = Word,
    bs = "fs", m = 1), data = th, discrete = T, nthreads = 2)
summary(m1)
par(mfrow = c(1, 2))
plot_smooth(m1, view = "Time", plot_all = "Word", rug = F)
plot_diff(m1, view = "Time", comp = list(Word = c("faith", "fate")))  # overly conservative

4.1 Autocorrelation

It is likely there is autocorrelation present, so we assess the amount of autocorrelation and refit the model with rho included.

m1acf <- acf_resid(m1)
(rhoval <- m1acf[2])

4.2 Exercise 1

Refit the model to correct for autocorrelation. Assess if the autocorrelation has been reduced using the ACF plot. Hint: use the rho parameter and the AR.start parameter.

# the following line should be replaced with the correct model
# formula
m1.rho <- m1

4.3 Smooth complexity

As the k values are relatively close to the maximum, we assess if we need to increase k.

gam.check(m1.rho)

While these results suggest k is adequate, we double the value as the edf value is close to the maximum of 9.

m1.alt <- bam(TTfront ~ s(Time, by = Word, k = 20) + Word + s(Time, Participant,
    by = Word, bs = "fs", m = 1), rho = rhoval, AR.start = th$start.event,
    data = th, discrete = T, nthreads = 2)
par(mfrow = c(1, 2))
plot_smooth(m1.rho, view = "Time", plot_all = "Word", rug = F, main = "m1.rho")
plot_smooth(m1.alt, view = "Time", plot_all = "Word", rug = F, main = "m1.alt")

As the patterns are highly similar, we stick with the simpler model (m1.rho).

5 Comparing the two speaker groups

In the following, we will compare the two groups EN and NL in their tongue trajectories.

5.1 Exercise 2

For this exercise, please create a GAM (m2) in which you distinguish the two words for both groups (EN and NL) separately. Visualize the patterns and the difference between “faith” and “fate” for both groups.

# your code goes here

5.2 Exercise 3

For this exercise, please evaluate if the more complex model is necessary.

# your code goes here

Answer: …

6 Final remark

You will have noticed that fitting a model with method='ML' is much slower than the default method='fREML'. This is mostly caused by discrete=T not being available for method='ML'. In the lecture which is available at https://www.let.rug.nl/wieling/Statistics/GAM-EEG, you can see how to specify a model in such a way that model comparison is done implicitly, and consequently explicit model comparison (using method='ML') is hardly necessary anymore.

7 Replication

To generate the output of the analysis presented above, first install Pandoc. Then copy the following lines to the most recent version of R. (Or use the Knit HTML button when viewing the file lab.Rmd in RStudio.)

# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
    install.packages("rmarkdown", repos = "http://cran.us.r-project.org")
}
library(rmarkdown)  # load rmarkdown package

# download original file if not already exists (to prevent
# overwriting)
if (!file.exists("lab.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/Statistics/GAM-Intro/lab/lab.Rmd",
        "lab.Rmd")
}

# generate output
render("lab.Rmd")  # generates html file with results

# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab.html"), sep = ""))  # shows result