1 Introduction

In this lab session, we will look at the lexical decision data discussed by Baayen (2008, Ch. 7.1). Make sure you have the most recent 64-bit version of R installed.

You can download the R markdown file here: lab1.Rmd (make sure to download it as a .Rmd file rather than as a .txt file), add your answers to this file, and generate the HTML report (see the final section for details).

2 Initialization

Load the required packages and download the required files:

# test if the necessary packages are installed; if not, install them
packages <- c("lme4", "car", "boot", "multcomp", "lsmeans")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
    install.packages(setdiff(packages, rownames(installed.packages())), repos = "http://cran.us.r-project.org")
}

# load the packages
library(lme4)
library(car)
library(boot)
library(multcomp)
library(lsmeans)

# download and load the following files containing the dataset and some functions:
download.file("http://www.let.rug.nl/wieling/statscourse/lecture1/lab/myFunctions.R", "myFunctions.R")
download.file("http://www.let.rug.nl/wieling/statscourse/lecture1/lab/lexdec.rda", "lexdec.rda")
source("myFunctions.R")
load("lexdec.rda")
# display version information
R.version.string
packageVersion("lme4")
packageVersion("car")
packageVersion("boot")
packageVersion("multcomp")
packageVersion("lsmeans")

3 Data investigation

Look at the data to discover its structure.

head(lexdec)
str(lexdec)
summary(lexdec)
nrow(lexdec)
plot(density(lexdec$Frequency))

# frequency typically shows a skewed distribution and was therefore log-transformed;
# this was the original distribution:
plot(density(exp(lexdec$Frequency)))

3.1 Data cleaning

Here we remove impossible RTs (< 200 ms or > 1100 ms) and focus only on correct responses. Note that RT is log-transformed in lexdec, so we apply exp() to compare against the millisecond thresholds.

lexdec2 <- lexdec[exp(lexdec$RT) > 200 & exp(lexdec$RT) < 1100, ]
nrow(lexdec2)/nrow(lexdec)  # proportion of data retained
lexdec3 <- lexdec2[lexdec2$Correct == "correct", ]

4 Linear regression

We create our first linear regression model (no mixed-effects regression yet) and we inspect the summary. Note that Trial is not significant in this model.

linearModel <- lm(RT ~ Trial, data = lexdec3)
summary(linearModel)

5 Random intercepts

Since we want our results to be generalizable across words and subjects, we add random intercepts for these factors.

5.1 Random intercepts for subject

We start with a random intercept for Subject: (1|Subject).

mixedModel1 <- lmer(RT ~ Trial + (1 | Subject), data = lexdec3)
summary(mixedModel1, cor = F)

Looking at the results, we see that p-values are no longer shown (in contrast to the linear regression), but for a large data set we can use \(|t| \geq 1.65 \implies p < 0.05\) (one-tailed) and \(|t| \geq 2 \implies p < 0.05\) (two-tailed).
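
For example, approximate two-tailed p-values can be computed from the t-values by hand. This is a minimal sketch (not part of the original lab code), assuming the normal approximation is adequate for a data set of this size:

# extract the t-values from the fixed-effects table of the model summary
tvals <- coef(summary(mixedModel1))[, "t value"]
# approximate two-tailed p-values via the normal distribution
2 * (1 - pnorm(abs(tvals)))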

The explained variance of the model can be found as follows:

cor(fitted(mixedModel1), lexdec3$RT)^2

# explained variance of fixed effects only:
cor(fitted(linearModel), lexdec3$RT)^2

Clearly most of the variance is due to the random intercepts. For this reason the explained variance of a mixed-effects regression model is not a useful measure of how well the model performs.
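
To see directly how the variance is distributed over the random effects and the residual error, you can also inspect the variance components of the model:

# display the variance components (rather than the standard deviations)
print(VarCorr(mixedModel1), comp = "Variance")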

To see how the intercepts for subjects vary, we can plot them. This custom function (defined in myFunctions.R) extracts the random effects via ranef.

myInterceptPlot(lexdec, "RT", "Trial", "Subject", mixedModel1)

5.2 Exercise 1

In this exercise, we’ll include random intercepts for word. Create a new model mixedModel2 which additionally includes random intercepts per word. Show the summary of the model, plot the random intercepts and evaluate if the by-word random intercepts are necessary by comparing the two models via AIC comparisons and anova.

# your code goes here
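
One possible solution could look as follows (a sketch, not necessarily the official answer; it assumes myInterceptPlot accepts "Word" as the grouping factor analogously to "Subject"):

# add random intercepts per word
mixedModel2 <- lmer(RT ~ Trial + (1 | Subject) + (1 | Word), data = lexdec3)
summary(mixedModel2, cor = F)
# plot the by-word random intercepts
myInterceptPlot(lexdec, "RT", "Trial", "Word", mixedModel2)
# evaluate the by-word intercepts: a decrease of at least 2 AIC units supports the richer model
AIC(mixedModel1) - AIC(mixedModel2)
anova(mixedModel1, mixedModel2, refit = F)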

Note that the p-value of anova is too conservative when testing for the inclusion of random effects; in the worst case, the p-value from anova is twice as high as it should be. See http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf (p. 46).
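
In other words, the true p-value lies between the reported value and half of it. A minimal sketch (assuming you created mixedModel2 in Exercise 1):

# halve the anova p-value to obtain its lower bound
anovaRes <- anova(mixedModel1, mixedModel2, refit = F)
anovaRes[2, "Pr(>Chisq)"] / 2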

5.3 Adding a fixed effect

mixedModel3 <- lmer(RT ~ Trial + NativeLanguage + (1 | Subject) + (1 | Word), data = lexdec3)
summary(mixedModel3, cor = F)  # NativeLanguage is significant: subjects whose native language is 'Other' are slower

To assess if the fixed-effect factor improves the model, we can use AIC comparisons or anova. Note that the REML parameter needs to be set to FALSE when comparing fixed effects, whereas it needs to remain at its default value of TRUE when comparing random effects and for the final model fit. While significant parameters are generally indicative of a significant improvement, an explicit model comparison is certainly necessary when including multi-level factors or interactions.

mixedModel2b <- lmer(RT ~ Trial + (1 | Subject) + (1 | Word), data = lexdec3, REML = F)
mixedModel3b <- lmer(RT ~ Trial + NativeLanguage + (1 | Subject) + (1 | Word), data = lexdec3, REML = F)
AIC(mixedModel2b) - AIC(mixedModel3b)
anova(mixedModel2b, mixedModel3b, refit = F)

5.4 Exercise 2

In this exercise, we’ll add another fixed-effect predictor. Add Frequency as a fixed-effect predictor to the model and assess if the model improves using model comparison (hint: REML=F for model comparison). Store the model as mixedModel4b.

# your code goes here
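
One possible approach (a sketch; it reuses mixedModel3b from above, which was already fitted with REML = F):

# add Frequency as a fixed-effect predictor; REML = F for comparing fixed effects
mixedModel4b <- lmer(RT ~ Trial + NativeLanguage + Frequency + (1 | Subject) + (1 | Word), data = lexdec3, REML = F)
summary(mixedModel4b, cor = F)
# compare against the model without Frequency
AIC(mixedModel3b) - AIC(mixedModel4b)
anova(mixedModel3b, mixedModel4b)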

6 Random slopes

We now want to see if subjects become slower or faster during the course of the experiment, so we include a random slope for Trial. Note that it is important to center Trial, as centering reduces the correlation between the random intercepts and random slopes! Also note that if you see warning messages, you should read them carefully.

lexdec3$cTrial <- scale(lexdec3$Trial, scale = F)  # centering
mixedModel4c <- lmer(RT ~ cTrial + NativeLanguage + Frequency + (1 | Subject) + (0 + cTrial | Subject) + (1 | Word), 
    data = lexdec3)

The model gives a warning that the model is nearly unidentifiable and recommends rescaling the predictors. We follow its recommendation for Trial, as this variable caused the problem:

lexdec3$zTrial <- scale(lexdec3$Trial)  # z-transforming (centering, then dividing by standard deviation)
mixedModel4c <- lmer(RT ~ zTrial + NativeLanguage + Frequency + (1 | Subject) + (0 + zTrial | Subject) + (1 | Word), 
    data = lexdec3)

We now test if subjects show a different response depending on word frequency by including a random slope for Frequency per subject. For simplicity we continue building from mixedModel4.

lexdec3$cFrequency <- scale(lexdec3$Frequency, scale = F)  # centering
mixedModel5 <- lmer(RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (0 + cFrequency | Subject) + (1 | 
    Word), data = lexdec3)
summary(mixedModel5, cor = F)

Now we compare mixedModel5 to mixedModel4 (the same model without the random slope, refitted below to include the centered frequency measure):

mixedModel4 <- lmer(RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (1 | Word), data = lexdec3)
AIC(mixedModel4) - AIC(mixedModel5)

6.1 Exercise 3

In this exercise, we’ll test for random-effect correlation parameters. Test for a correlation parameter between the random slope for frequency and the random intercept (i.e. test whether slower subjects respond differently to increasing frequency than faster subjects).

# your code goes here
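
One possible solution (a sketch; the model name mixedModel5c is hypothetical):

# correlated random intercept and slope for frequency: (1 + cFrequency | Subject)
mixedModel5c <- lmer(RT ~ Trial + NativeLanguage + cFrequency + (1 + cFrequency | Subject) + (1 | Word), data = lexdec3)
summary(mixedModel5c, cor = F)
# compare against mixedModel5, which has the uncorrelated intercept and slope
AIC(mixedModel5) - AIC(mixedModel5c)
anova(mixedModel5, mixedModel5c, refit = F)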

Important remark: factorial/binary predictors used as a random slope ALWAYS need to be put in the same block as (i.e. correlated with) the random intercept, as they are contrasted with respect to the intercept.

6.2 Plotting random slopes

We can plot the random slopes (based on p. 249 of Baayen, 2008) as follows:

myXYPlot(lexdec3, "RT", "cFrequency", "Subject", mixedModel5)

Similarly, we can plot the sorted coefficients, ranging from highly negative (subjects who are much faster for higher-frequency words) to less negative (subjects who are only somewhat faster for higher-frequency words). Compare both plots: the first shows the fitted lines, the second only the slopes of those lines.

myCoefPlot(mixedModel5, "Subject", "cFrequency")

6.3 Convergence issues

Now we test if the effect of native language varies per word. Since NativeLanguage is a factorial predictor, it always needs to be put in the model together with the random intercept in one block (note that this model will not converge).

mixedModel6 <- lmer(RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (0 + cFrequency | Subject) + (1 + 
    NativeLanguage | Word), data = lexdec3)
summary(mixedModel6, cor = F)  # the summary also shows evidence it has not converged

If your model does not converge, you can try different optimizers to see if one of them finds a solution. The default optimizer is bobyqa. An alternative is Nelder_Mead, which we try now and which does converge.

mixedModel6a1 <- lmer(RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (0 + cFrequency | Subject) + (1 + 
    NativeLanguage | Word), data = lexdec3, control = lmerControl(optimizer = "Nelder_Mead"))

A further option is the optimizer of the old lme4 package (version 0.9), which also converges.

mixedModel6a2 <- lmer(RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (0 + cFrequency | Subject) + (1 + 
    NativeLanguage | Word), data = lexdec3, control = lmerControl(optimizer = "nloptwrap"))

As the two models converged, their summaries and AIC values are effectively identical:

summary(mixedModel6a1, cor = F)
summary(mixedModel6a2, cor = F)
AIC(mixedModel6a1) - AIC(mixedModel6a2)

If the model had not converged with any of the optimizers, the next step would have been to test the correlated random slope and intercept; if that had not converged either, the final step would have been to exclude the random slope completely (i.e. stick with mixedModel5).

Now we can assess if the random slope is necessary.

AIC(mixedModel5) - AIC(mixedModel6a1)

And we can visualize the random slopes. Do you also see an interesting pattern?

myCoefPlot(mixedModel6a1, "Word", "NativeLanguageOther")

7 Testing interactions

We saw that non-native speakers showed a word-specific pattern in their responses. Perhaps this is related to word class, so we add this interaction and check whether it reaches significance and also influences the random slopes.

mixedModel6b <- lmer(RT ~ Trial + NativeLanguage + cFrequency + (1 | Subject) + (0 + cFrequency | Subject) + (1 + 
    NativeLanguage | Word), data = lexdec3, control = lmerControl(optimizer = "Nelder_Mead"), REML = F)
mixedModel7b <- lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1 | Subject) + (0 + cFrequency | Subject) + 
    (1 + NativeLanguage | Word), data = lexdec3, control = lmerControl(optimizer = "Nelder_Mead"), REML = F)
AIC(mixedModel6b) - AIC(mixedModel7b)  # mixedModel7b is an improvement
summary(mixedModel7b)  # interaction clearly significant

Perhaps the random slope of NativeLanguage per word is no longer necessary? As we are now comparing random effects, we drop REML = F.

mixedModel7 <- lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1 | Subject) + (0 + cFrequency | Subject) + 
    (1 + NativeLanguage | Word), data = lexdec3, control = lmerControl(optimizer = "Nelder_Mead"))
mixedModel7simpler <- lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1 | Subject) + (0 + cFrequency | 
    Subject) + (1 | Word), data = lexdec3, control = lmerControl(optimizer = "Nelder_Mead"))
AIC(mixedModel7simpler) - AIC(mixedModel7)
# difference < 2, so the simpler model is preferred (i.e. the more complex model is not better)

8 Model criticism

The quantile-quantile plot shows some problems at the tails, so model criticism is needed.

qqp(resid(mixedModel7simpler))

We should also investigate heteroscedasticity. In the following plot, no clear increase or decrease in variance should be visible (at least within the two solid lines).

plot(scale(resid(mixedModel7simpler)) ~ fitted(mixedModel7simpler), pch = ".", cex = 2, ylim = c(-5, 5))
abline(h = c(-2.5, 2.5))

Now we employ model criticism by removing the data points that are problematic for the model.

lexdec3b <- lexdec3[abs(scale(resid(mixedModel7simpler))) < 2.5, ]
(1 - (nrow(lexdec3b))/nrow(lexdec3)) * 100  # 2.37% removed
mixedModel7simpler.v2 <- lmer(RT ~ Trial + NativeLanguage * Class + cFrequency + (1 | Subject) + (0 + cFrequency | 
    Subject) + (1 | Word), data = lexdec3b)

The quantile-quantile plot and the heteroscedasticity plot both look much better.

qqp(resid(mixedModel7simpler.v2))
plot(scale(resid(mixedModel7simpler.v2)) ~ fitted(mixedModel7simpler.v2), pch = ".", cex = 2, ylim = c(-5, 5))
abline(h = c(-2.5, 2.5))

9 Bootstrapping

To get confidence bands for the estimates, we can conduct a bootstrapping analysis as follows.

library(boot)
(bs <- confint(mixedModel7simpler.v2, method = "boot", nsim = 100, level = 0.95, .progress = "none", PBargs = list(style = 3)))

10 Multiple comparison

To conduct multiple comparisons (i.e. comparing multiple factor levels) the library multcomp can be used as follows:

library(multcomp)
lexdec3$Factor <- interaction(lexdec3$Complex, lexdec3$Class)
model <- lmer(RT ~ Trial + Factor + (1 | Subject) + (1 | Word), data = lexdec3)
summary(model)
summary(glht(model, linfct = mcp(Factor = "Tukey")))  # no significant differences among the levels

Alternatively, we could ask if complex differs from simplex for the two different classes (animals vs. plants). We can use the function lsmeans for this:

library(lsmeans)
model <- lmer(RT ~ Trial + Complex * Class + (1 | Subject) + (1 | Word), data = lexdec3)
lsmeans(model, list(pairwise ~ Complex | Class))

11 Concluding remarks

  • AIC comparisons are only valid when the models have been fitted on the same dataset. If you add a variable with missing values, this is not the case: to see if the addition of this variable is necessary, you need to fit the simpler model on the dataset from which the observations with missing values for the new variable have been excluded. A simple test to check that no data points have been dropped when adding a new variable is: (length(fitted(mixedModel1)) - length(fitted(mixedModel2))) == 0. If the result is TRUE, the models have been fitted on the same number of data points.
  • The explained variance of your model will be very high due to the random-effects structure, so optimizing your explained variance is not useful. Instead you should compare AIC values. A decrease of at least 2 AIC units indicates an improved model.
  • Look at the graphical results of your model. They may reveal patterns suggesting the inclusion of additional control variables.
  • Pay close attention to Baayen (2008: Ch. 6) on how to interpret interactions.
  • If you are interested, take a look at http://www.let.rug.nl/wieling/statscourse/lecture1/lab/myFunctions.R
  • If you have a dependent variable Y which is binary (1: success, 0: failure), you can follow the same approach as before, but now using logistic regression. The commands are very similar: you use glmer instead of lmer and specify family='binomial': glmer(Y ~ ..., family='binomial'). A minimal sketch follows after this list.
    • Note that if you need to change the optimizer, use control = glmerControl(...)
    • There is no REML option for binomial models, so you always fit the same type of model.
    • The residuals of a logistic regression model do not need to be normally distributed, so this type of model criticism is not necessary.
    • Alternatively, for logistic regression, you may also specify your dependent variable as the number of successes and the number of failures (e.g., over a certain time span and stored in columns NrSuccesses and NrFailures) in the following way: glmer(cbind(NrSuccesses,NrFailures) ~ ..., family='binomial')
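
For illustration, here is a minimal logistic-regression sketch on the lexdec data; the binary dependent variable Incorrect and the model name logisticModel are hypothetical, created here purely for demonstration:

# create a hypothetical binary dependent variable: 1 for an incorrect response, 0 otherwise
lexdec$Incorrect <- as.numeric(lexdec$Correct != "correct")
# fit a mixed-effects logistic regression with glmer
logisticModel <- glmer(Incorrect ~ Trial + (1 | Subject) + (1 | Word), data = lexdec, family = "binomial")
summary(logisticModel)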

12 Replication

To generate the output of the analysis presented above, first install Pandoc. Then copy the following lines into the most recent version of R (or use the Knit HTML button when viewing the file lab1.Rmd in RStudio). Note that if you get a Pandoc conversion error in Mac OS X, you can try adding self_contained: no just below the line html_document: at the top of this file.

# install rmarkdown package if not installed
if (!"rmarkdown" %in% rownames(installed.packages())) {
    install.packages("rmarkdown", repos = "http://cran.us.r-project.org")
}
library(rmarkdown)  # load rmarkdown package

# download the original file if it does not already exist (to prevent overwriting your answers)
if (!file.exists("lab1.Rmd")) {
    download.file("http://www.let.rug.nl/wieling/statscourse/lecture1/lab/lab1.Rmd", "lab1.Rmd")
}

# generate output
render("lab1.Rmd")  # generates html file with results

# view output in browser
browseURL(paste("file://", file.path(getwd(), "lab1.html"), sep = ""))  # shows result