1 Introduction

During the lecture we have looked at the effect of Correctness and the effect of Age of Arrival. In this lab session we will look at another additional factor: Structure. Research question: Is there an effect of the distance between determiner and noun in the violation? We compare the DN (determiner-noun, e.g. der Garten/*das Garten) structure with the DAN (determiner-adjective-noun, e.g. das frische Gras/*der frische Gras) structure.

You can download the R markdown file here: lab4.Rmd, add your answers to this file, and generate the HTML report (see also the final section).

2 Initialization

Load the required packages and download the required files.

# test if the required packages are installed if not, install them
packages <- c("mgcv", "car", "itsadug")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
    install.packages(setdiff(packages, rownames(installed.packages())), repos="http://cran.us.r-project.org")
}

# load the packages
library(mgcv)
library(itsadug)
library(car)

# download and load the following file containing the dataset:
download.file("http://www.let.rug.nl/wieling/statscourse/lecture4/lab/dat.rda", "dat.rda")
load("dat.rda")  # This dataset is a subset of 15 subjects

# display version information
R.version.string
packageVersion("mgcv")
packageVersion("car")
packageVersion("itsadug")

3 Data investigation

Investigate the data.

head(dat)
str(dat)
summary(dat)
dim(dat)

Note that the data needs to be sorted to correct for autocorrelation. In this case, the data was already sorted via: dat = start_event(dat, event=c("Subject","TrialNr","Roi")).

4 Effect of structure

We would like to know whether the P600 effect is dependent on the structure of the sentence (Determiner-Noun: DN vs. Determiner-Adjective-Noun: DAN). Remember that the size of the P600 is determined by the difference between incorrect and correct. Consequently, we are interested in assessing if that difference varies between the levels of Structure, and we will make a model where we look at the interaction between Correctness and Structure. Note that to prevent lengthy computations during this lab session, we will not take into account factor smooths per word.

dat$CorStruct <- interaction(dat$Correctness, dat$Structure)  # create a new variable with 4 levels
levels(dat$CorStruct)  # show levels
dat$SubjectCorStruct <- interaction(dat$SubjectCor, dat$Structure)
rhoval <- 0.93  # rho set to fixed value for this lab session

system.time(m1 <- bam(uV ~ s(Time, by = CorStruct) + CorStruct + s(Time, SubjectCorStruct, 
    bs = "fs", m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, 
    nthreads = 2))
summary(m1)

We first visualize the smooths.

# plot the separate smooths
plot_smooth(m1, "Time", plot_all = "CorStruct", rm.ranef = T, eegAxis = T, rug = F)

And subsequently visualize the two difference curves.

# plot the differences between correct and incorrect for the two cases (DN and DAN)
par(mfrow = c(1, 2))
plot_diff(m1, "Time", comp = list(CorStruct = c("incor.DAN", "cor.DAN")), rm.ranef = T, eegAxis = T, 
    ylim = c(10, -5))
plot_diff(m1, "Time", comp = list(CorStruct = c("incor.DN", "cor.DN")), rm.ranef = T, eegAxis = T, 
    ylim = c(10, -5))

5 Modeling differences

To model the difference curves directly, we can follow two approaches. The first one uses binary curves, the second ordered factors.

5.1 Binary difference curves

For binary curves, we first create two binary predictors to mark the incorrect cases as 1.

dat$IsIncorrectDN <- (dat$Correctness == "incor" & dat$Structure == "DN") * 1
dat$IsIncorrectDAN <- (dat$Correctness == "incor" & dat$Structure == "DAN") * 1

Now we can create the binary curve model as follows:

m1b <- bam(uV ~ s(Time, by = Structure) + Structure + s(Time, by = IsIncorrectDN) + s(Time, 
    by = IsIncorrectDAN) + s(Time, SubjectCorStruct, bs = "fs", m = 1), data = dat, rho = rhoval, 
    AR.start = dat$start.event, discrete = T, nthreads = 2)
summary(m1b)

To interpret the results, visualization is necessary.

par(mfrow = c(1, 2))
plot(m1b, select = 4, shade = T, rug = F, main = "Binary difference curve DAN", ylim = c(10, 
    -5))
abline(h = 0)
plot(m1b, select = 3, shade = T, rug = F, main = "Binary difference curve DN", ylim = c(10, 
    -5))
abline(h = 0)

5.2 Ordered factors

To separate the intercept differences from the non-linear differences, we use ordered factors.

dat$IsIncorrectDNO <- as.ordered(dat$Correctness == "incor" & dat$Structure == "DN")
contrasts(dat$IsIncorrectDNO) <- "contr.treatment"
dat$IsIncorrectDANO <- as.ordered((dat$Correctness == "incor" & dat$Structure == "DAN"))
contrasts(dat$IsIncorrectDANO) <- "contr.treatment"

Subsequently, we create the ordered factor model as follows:

m1o <- bam(uV ~ s(Time, by = Structure) + Structure + s(Time, by = IsIncorrectDNO) + IsIncorrectDNO + 
    s(Time, by = IsIncorrectDANO) + IsIncorrectDANO + s(Time, SubjectCorStruct, bs = "fs", 
    m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 2)
summary(m1o)

To interpret the results, we visualize the difference curves. These difference smooths are centered, as the summary already contains the fixed-effect intercept differences.

par(mfrow = c(1, 2))
plot(m1o, select = 4, shade = T, rug = F, main = "Binary difference curve DAN", ylim = c(10, 
    -5))
abline(h = 0)
plot(m1o, select = 3, shade = T, rug = F, main = "Binary difference curve DN", ylim = c(10, 
    -5))
abline(h = 0)

5.3 Exercise 1

For this exercise, please assess if the distinction between DAN and DN is necessary. Rather than comparing two models, use a binary curve to assess if the difference between cor and incor is different for structure DAN than for structure DN. The reason, we are using this approach, rather than comparing two models is that fitting an ERP model with method=ML takes a very long time (due to the option discrete=T not being available), and the \(p\)-values of the model summary are more reliable than model comparison.

# your code goes here

Answer: …

6 Age of arrival effect

In the following, we will investigate how the DN vs. DAN patterns vary depending on the AoArr of the participants. Note that we only use a subset of the data here (N = 15), so there are only a few different values of AoArr.

m2 <- bam(uV ~ te(Time, AoArr, by = CorStruct) + CorStruct + s(Time, SubjectCorStruct, bs = "fs", 
    m = 1), data = dat, rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 2)
summary(m2)

And we plot the differences between correct and incorrect for the two cases:

par(mfrow = c(1, 2))
plot_diff2(m2, view = c("Time", "AoArr"), comp = list(CorStruct = c("incor.DAN", "cor.DAN")), 
    rm.ranef = T)
fadeRug(dat$Time, dat$AoArr)  # make areas lighter where there is no data
plot_diff2(m2, view = c("Time", "AoArr"), comp = list(CorStruct = c("incor.DN", "cor.DN")), 
    rm.ranef = T)
fadeRug(dat$Time, dat$AoArr)  # make areas lighter where there is no data

6.1 Binary curves

Using binary curves, we can directly assess if the difference between cor and incor is necessary for both structures.

m2b <- bam(uV ~ te(Time, AoArr, by = Structure) + Structure + te(Time, AoArr, by = IsIncorrectDN) + 
    te(Time, AoArr, by = IsIncorrectDAN) + s(Time, SubjectCorStruct, bs = "fs", m = 1), data = dat, 
    rho = rhoval, AR.start = dat$start.event, discrete = T, nthreads = 2)
summary(m2b)  # both difference surfaces are necessary

6.2 Exercise 2

For this exercise, please assess if the differences are constant (i.e. only an intercept difference) or non-linear. Hint: use ordered factors.

# your code goes here

Answer: …

6.3 Exercise 3

For this exercise, decompose the te() and assess which separate parts are necessary. As before, we do not use model comparison, but decompose the model using difference curves. Hint: as binary predictors may only occur in a model once (as otherwise the model does not know in which term to put the constant intercept difference) and the decomposition yields three terms, you need to use ordered factors.

# your code goes here

7 Replication

To generate the output of the analysis presented above, first install Pandoc. Then copy the following lines to the most recent version of R. (Or use the Knit HTML button when viewing the file lab4.Rmd in RStudio.)

# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
    install.packages("rmarkdown", repos="http://cran.us.r-project.org")
}
library(rmarkdown)  # load rmarkdown package

# download original file if not already exists (to prevent overwriting)
if (!file.exists("lab4.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/statscourse/lecture4/lab/lab4.Rmd", "lab4.Rmd")
}

# generate output
render("lab4.Rmd")  # generates html file with results

# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab4.html"), sep = ""))  # shows result

Lab 4 of Advanced Regression for Linguists

Martijn Wieling (http://www.martijnwieling.nl)

Generated on: Tue Dec 13 10:25:26 2016