In this lab session, we will look at eye tracking data (Loerts et al., 2013). Make sure you have the most recent version of R
64 bits installed.
You can download the R markdown file here: lab2.Rmd, add your answers to this file, and generate the HTML report (see also the final section).
Load the required packages and download the required files.
# test if the necessary packages are installed if not, install them
packages <- c("lme4", "car", "gplots")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
install.packages(setdiff(packages, rownames(installed.packages())), repos="http://cran.us.r-project.org")
}
# load the packages
library(lme4)
library(car)
library(gplots)
# download and load the following files containing the dataset and some functions:
download.file("http://www.let.rug.nl/wieling/statscourse/lecture2/lab/myFunctions.R", "myFunctions.R")
download.file("http://www.let.rug.nl/wieling/statscourse/lecture2/lab/eye.rda", "eye.rda")
download.file("http://www.let.rug.nl/wieling/statscourse/lecture2/lab/eyeAll.rda", "eyeAll.rda")
source("myFunctions.R") # custom plotting functions
load("eye.rda") # data for mixed-effects regression, only SameColor == TRUE
load("eyeAll.rda") # data for making focus plots
# display version information
R.version.string
packageVersion("lme4")
packageVersion("car")
packageVersion("gplots")
Look at the data to discover its structure.
head(eye)
str(eye)
summary(eye)
dim(eye) # number of rows, columns
plot(density(eye$Age)) # e.g., to investigate distributions
In this exercise, investigate eyeAll
yourself.
# your code goes here
The following plots show the various focus plots split by gender competitor and target type (common vs. neuter).
myFocusPlot(eyeAll)
The following plots show the various focus plots split by definiteness as well.
myFocusPlot(eyeAll, splitDefiniteness = T)
With the following code we can focus on a certain time window.
myFocusPlot(eyeAll, startTime = "TimeOnsetNoun", endTime = "TimeOnsetNoun+200")
myFocusPlot(eyeAll, startTime = 500, endTime = 2000)
Finally, we can also focus on the competitor vs. distractors.
myFocusPlot(eyeAll, startTime = 500, endTime = 2000, plotTarget = F)
We create our first random effects model (no fixed-effects yet).
model <- glmer(cbind(TargetFocus, CompFocus) ~ (1 | Subject) + (1 | Item), data = eye, family = "binomial")
summary(model)
We visualize the random intercepts. In order to visualize the correct y-axis, we visualize the empirical logit (which is the same as the logit, but then a slight adaptation, so that the divisor is never equal to 0).
eye$elog <- log((eye$TargetFocus + 0.5)/(eye$CompFocus + 0.5))
myInterceptPlot(eye, "elog", "TrialID", "Subject", model) # for subjects
myInterceptPlot(eye, "elog", "TrialID", "Item", model) # for items
Now we test if a by-item analysis is necessary.
model1 <- glmer(cbind(TargetFocus, CompFocus) ~ (1 | Subject), data = eye, family = "binomial")
model2 <- glmer(cbind(TargetFocus, CompFocus) ~ (1 | Subject) + (1 | Item), data = eye, family = "binomial")
AIC(model1) - AIC(model2) # > 2, so yes
We try to add a fixed-effect factor for color, but we get a warning.
model3 <- glmer(cbind(TargetFocus, CompFocus) ~ SameColor + (1 | Subject) + (1 | Item), data = eye, family = "binomial")
This is due to the column being constant in this subset.
table(eye$SameColor)
We can, of course, add other predictors, such as the contrast between men and women IsMale
(which is 1 for a man and 0 for a woman).
model4 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + (1 | Subject) + (1 | Item), data = eye, family = "binomial")
summary(model4)
As IsMale
is significant, we retain it as a predictor.
Why does it not make sense to vary the effect of IsMale
per Subject
as a random slope? Would it make sense to vary it per Item
instead?
Answer: ….
If it does, assess if this random slope is necessary and store it as model4b
. In the following we will continue without the random slope for simplicity.
We now investigate the effect of gender competitor and target gender
model5 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter + SameGender + (1 | Subject) +
(1 | Item), data = eye, family = "binomial")
summary(model5)
Both predictors are not significant (as in the lecture), but there may be an interaction.
model6 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter * SameGender + (1 | Subject) +
(1 | Item), data = eye, family = "binomial")
# another way to write the formula would be: cbind(TargetFocus,CompFocus) ~ IsMale + TargetNeuter +
# SameGender + TargetNeuter:SameGender
summary(model6)
Now both predictors are significant as well as their interaction. Of course, we still need to test if the interaction is really necessary.
In this exercise, please assess if the interaction is necessary by comparing model6
to model4
.
# your code goes here
Why are we not comparing model6
to model5
?
Answer: ….
We need to test if the inclusion of the interacting predictors as random slopes is necessary, as it may influence the significance of the fixed effects (in the interaction or separately).
model7 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter * SameGender + (1 + TargetNeuter *
SameGender | Subject) + (1 | Item), data = eye, family = "binomial")
summary(model7)
AIC(model6) - AIC(model7)
The latter model (model7
) is much better. It also illustrates that the pattern in your data may become non-significant when taking into account the adequate structure in your data. (Note that you’d normally add the random slopes one-by-one, but here for simplicity we added all three at the same time.)
Now add the factor TargetColor
to model6
(we ignore the random slopes in the rest of this lab session, to make computation quicker).
model8 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter * SameGender + TargetColor + (1 |
Subject) + (1 | Item), data = eye, family = "binomial")
summary(model8)
The summary suggests that we need a contrast for brown
eye$TargetBrown = (eye$TargetColor == "brown") * 1 # 1 if brown, 0 otherwise
In this exercise, we’ll add a contrast to the model. Add the contrast TargetBrown
to model (model6
) and test if this model should be preferred over the more complex model with all color contrasts (model8
). Store the model as model9
.
# your code goes here
In this exercise, we’ll assess if a random slope for TargetBrown is necessary. Assess if TargetBrown
needs to be included as a by-subject and/or by-item random slope in model9
. Store the model as model10
.
# your code goes here
If we have found our best model, we conduct model criticism. Here we use model8
as an illustration.
qqp(resid(model8)) # not normal, but also not required
The next step is to remove the data points our model has problems with to fit correctly.
eye2 <- eye[abs(scale(resid(model8))) < 2, ]
(1 - (nrow(eye2))/nrow(eye)) * 100 # percent excluded
We then refit the model:
model8.v2 <- glmer(cbind(TargetFocus, CompFocus) ~ IsMale + TargetNeuter * SameGender + TargetColor +
(1 | Subject) + (1 | Item), data = eye2, family = "binomial")
qqp(resid(model8.v2)) # residual distribution similar
While the present analysis is better than using linear mixed-effects regression modeling (i.e. no logistic regression), the analysis could be improved by takuing into account the time course via generalized additive modeling. In that case our dependent variable would be binary, indicating if the subject focused on the target or not at the specific time point.
To generate the output of the analysis presented above, first install Pandoc. Then copy the following lines to the most recent version of R. Alternatively, you can download the file, open it in R studio, and then press the button Knit HTML. (Or use the Knit HTML
button when viewing the file lab2.Rmd
in RStudio.)
# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
install.packages("rmarkdown", repos="http://cran.us.r-project.org")
}
library(rmarkdown) # load rmarkdown package
# download original file if not already exists (to prevent overwriting)
if (!file.exists("lab2.Rmd")) {
download.file("http://www.let.rug.nl/wieling/statscourse/lecture2/lab/lab2.Rmd", "lab2.Rmd")
}
# generate output
render("lab2.Rmd") # generates html file with results
# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab2.html"), sep = "")) # shows result