1 Introduction

In this lab session, we will look at eye tracking data (Loerts et al., 2013). Make sure you have the most recent version of R 64 bits installed.

You can download the R markdown file here: lab2.Rmd, add your answers to this file, and generate the HTML report (see also the final section).

2 Initialization

Load the required packages and download the required files.

# test if the necessary packages are installed if not, install them
packages <- c("lme4", "car", "gplots")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
    install.packages(setdiff(packages, rownames(installed.packages())), repos="http://cran.us.r-project.org")
}

# load the packages
library(lme4)
library(car)
library(gplots)

# download and load the following files containing the dataset and some functions:
download.file("http://www.let.rug.nl/wieling/statscourse/lecture2/lab/myFunctions.R", "myFunctions.R")
download.file("http://www.let.rug.nl/wieling/statscourse/lecture2/lab/eye.rda", "eye.rda")
download.file("http://www.let.rug.nl/wieling/statscourse/lecture2/lab/eyeAll.rda", "eyeAll.rda")
source("myFunctions.R")  # custom plotting functions
load("eye.rda")  # data for mixed-effects regression, only SameColor == TRUE
load("eyeAll.rda")  # data for making focus plots

# display version information
R.version.string
packageVersion("lme4")
packageVersion("car")
packageVersion("gplots")

3 Data investigation

Look at the data to discover its structure.

head(eye)
str(eye)
summary(eye)
dim(eye)  # number of rows, columns
plot(density(eye$Age))  # e.g., to investigate distributions

3.1 Exercise 1

In this exercise, investigate eyeAll yourself.

# your code goes here

3.2 Investigating the focus plots

The following plots show the various focus plots split by gender competitor and target type (common vs. neuter).

myFocusPlot(eyeAll)

The following plots show the various focus plots split by definiteness as well.

myFocusPlot(eyeAll, splitDefiniteness = T)

With the following code we can focus on a certain time window.

myFocusPlot(eyeAll, startTime = "TimeOnsetNoun", endTime = "TimeOnsetNoun+200")
myFocusPlot(eyeAll, startTime = 500, endTime = 2000)

Finally, we can also focus on the competitor vs. distractors.

myFocusPlot(eyeAll, startTime = 500, endTime = 2000, plotTarget = F)

4 Mixed-effects regression

We create our first random effects model (no fixed-effects yet).

model <- glmer(cbind(TargetFocus, CompFocus) ~ (1 | Subject) + (1 | Item), data = eye, family = "binomial")
summary(model)

We visualize the random intercepts. In order to visualize the correct y-axis, we visualize the empirical logit (which is the same as the logit, but then a slight adaptation, so that the divisor is never equal to 0).

eye$elog <- log((eye$TargetFocus + 0.5)/(eye$CompFocus + 0.5))
myInterceptPlot(eye, "elog", "TrialID", "Subject", model)  # for subjects
myInterceptPlot(eye, "elog", "TrialID", "Item", model)  # for items

Now we test if a by-item analysis is necessary.

model1 <- glmer(cbind(TargetFocus, CompFocus) ~ (1 | Subject), data = eye, family = "binomial")
model2 <- glmer(cbind(TargetFocus, CompFocus) ~ (1 | Subject) + (1 | Item), data = eye, family = "binomial")
AIC(model1) - AIC(model2)  # > 2, so yes

4.1 Adding a fixed-effect factor

We try to add a fixed-effect factor for color, but we get a warning.

model3 <- glmer(cbind(TargetFocus, CompFocus) ~ SameColor + (1 | Subject) + (1 | Item), data = eye, family = "binomial")

This is due to the column being constant in this subset.

table(eye$SameColor)

We can, of course, add other predictors, such as the contrast between men and women IsMale (which is 1 for a man and 0 for a woman).

model4 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + (1 | Subject) + (1 | Item), data = eye, family = "binomial")
summary(model4)

As IsMale is significant, we retain it as a predictor.

4.2 Exercise 2

Why does it not make sense to vary the effect of IsMale per Subject as a random slope? Would it make sense to vary it per Item instead?

Answer: ….

If it does, assess if this random slope is necessary and store it as model4b. In the following we will continue without the random slope for simplicity.

4.3 Adding the predictors of interest

We now investigate the effect of gender competitor and target gender

model5 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter + SameGender + (1 | Subject) + 
    (1 | Item), data = eye, family = "binomial")
summary(model5)

Both predictors are not significant (as in the lecture), but there may be an interaction.

model6 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter * SameGender + (1 | Subject) + 
    (1 | Item), data = eye, family = "binomial")
# another way to write the formula would be: cbind(TargetFocus,CompFocus) ~ IsMale + TargetNeuter +
# SameGender + TargetNeuter:SameGender
summary(model6)

Now both predictors are significant as well as their interaction. Of course, we still need to test if the interaction is really necessary.

4.4 Exercise 3

In this exercise, please assess if the interaction is necessary by comparing model6 to model4.

# your code goes here

Why are we not comparing model6 to model5?

Answer: ….

4.5 Including random slopes

We need to test if the inclusion of the interacting predictors as random slopes is necessary, as it may influence the significance of the fixed effects (in the interaction or separately).

model7 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter * SameGender + (1 + TargetNeuter * 
    SameGender | Subject) + (1 | Item), data = eye, family = "binomial")
summary(model7)
AIC(model6) - AIC(model7)

The latter model (model7) is much better. It also illustrates that the pattern in your data may become non-significant when taking into account the adequate structure in your data. (Note that you’d normally add the random slopes one-by-one, but here for simplicity we added all three at the same time.)

4.6 Adding the color of the target

Now add the factor TargetColor to model6 (we ignore the random slopes in the rest of this lab session, to make computation quicker).

model8 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter * SameGender + TargetColor + (1 | 
    Subject) + (1 | Item), data = eye, family = "binomial")
summary(model8)

The summary suggests that we need a contrast for brown

eye$TargetBrown = (eye$TargetColor == "brown") * 1  # 1 if brown, 0 otherwise

4.7 Exercise 4

In this exercise, we’ll add a contrast to the model. Add the contrast TargetBrown to model (model6) and test if this model should be preferred over the more complex model with all color contrasts (model8). Store the model as model9.

# your code goes here

4.8 Exercise 5

In this exercise, we’ll assess if a random slope for TargetBrown is necessary. Assess if TargetBrown needs to be included as a by-subject and/or by-item random slope in model9. Store the model as model10.

# your code goes here

5 Model criticism

If we have found our best model, we conduct model criticism. Here we use model8 as an illustration.

qqp(resid(model8))  # not normal, but also not required

The next step is to remove the data points our model has problems with to fit correctly.

eye2 <- eye[abs(scale(resid(model8))) < 2, ]
(1 - (nrow(eye2))/nrow(eye)) * 100  # percent excluded

We then refit the model:

model8.v2 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter * SameGender + TargetColor + 
    (1 | Subject) + (1 | Item), data = eye2, family = "binomial")
qqp(resid(model8.v2))  # residual distribution similar

While the present analysis is better than using linear mixed-effects regression modeling (i.e. no logistic regression), the analysis could be improved by takuing into account the time course via generalized additive modeling. In that case our dependent variable would be binary, indicating if the subject focused on the target or not at the specific time point.

6 Replication

To generate the output of the analysis presented above, first install Pandoc. Then copy the following lines to the most recent version of R. Alternatively, you can download the file, open it in R studio, and then press the button Knit HTML. (Or use the Knit HTML button when viewing the file lab2.Rmd in RStudio.)

# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
    install.packages("rmarkdown", repos="http://cran.us.r-project.org")
}
library(rmarkdown)  # load rmarkdown package

# download original file if not already exists (to prevent overwriting)
if (!file.exists("lab2.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/statscourse/lecture2/lab/lab2.Rmd", "lab2.Rmd")
}

# generate output
render("lab2.Rmd")  # generates html file with results

# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab2.html"), sep = ""))  # shows result

Lab 2 of Advanced Regression for Linguists

Martijn Wieling (http://www.martijnwieling.nl)

Generated on: Tue Dec 13 10:28:18 2016