Logistic mixed-effects regression

Martijn Wieling
University of Groningen

This lecture

  • Introduction
    • Gender processing in Dutch
    • Eye-tracking to reveal gender processing
  • Design
  • Analysis: logistic mixed-effects regression
  • Conclusion

Gender processing in Dutch

  • Study's goal: assess whether speakers of Dutch use grammatical gender to anticipate upcoming words
  • This study was conducted together with Hanneke Loerts and is published in the Journal of Psycholinguistic Research (Loerts, Wieling and Schmid, 2012)
  • What is grammatical gender?
    • Gender is a property of a noun
    • Nouns are divided into classes: masculine, feminine, neuter, ...
    • E.g., hond ('dog') = common (masculine/feminine), paard ('horse') = neuter
  • The gender of a noun can be determined from the forms of elements syntactically related to it

Gender in Dutch

  • Gender in Dutch: 70% common, 30% neuter
  • When a noun is diminutive it is always neuter (the Dutch often use diminutives!)
  • Gender is unpredictable from the root noun and hard to learn

Why use eye tracking?

  • Eye tracking reveals the listener's incremental processing over the time course of the speech signal
  • As people tend to look at what they hear (Cooper, 1974), lexical competition can be tested

Testing lexical competition using eye tracking

  • This can be tested using the visual world paradigm (VWP): tracking eye movements while participants hear spoken instructions to click on one of several objects on a screen

Support for the Cohort Model

  • Subjects hear: "Pick up the candy" (Tanenhaus et al., 1995)
  • Fixations towards target (Candy) and competitor (Candle): support for the Cohort Model

Lexical competition based on syntactic gender

  • Other models of lexical processing state that lexical competition occurs based on all acoustic input (e.g., TRACE, Shortlist, NAM)
  • Does syntactic gender information restrict the possible set of lexical candidates?
    • If you hear de, do you focus more on de hond (dog) than on het paard (horse)?
    • Previous studies (e.g., Dahan et al., 2000 for French) have indicated gender information restricts the possible set of lexical candidates
  • We will investigate if this also holds for Dutch (which has a different gender system) via the VWP
  • We analyze the data using (generalized) linear mixed-effects regression in R

Experimental design

  • 28 Dutch participants heard sentences like:
  • Klik op de rode appel ('click on the red apple')
  • Klik op het plaatje met een blauw boek ('click on the image of a blue book')
  • They were shown 4 nouns varying in color and gender
  • Eye movements were tracked with a Tobii eye-tracker (E-Prime extensions)

Experimental design: conditions

  • Subjects were shown 96 different screens
  • 48 screens for indefinite sentences ("Klik op het plaatje met een rode appel.")
  • 48 screens for definite sentences ("Klik op de rode appel.")

Visualizing fixation proportions: different color

Visualizing fixation proportions: same color

Which dependent variable? (1)

  • Difficulty 1: choosing the dependent variable
    • Fixation difference between target and competitor
    • Fixation proportion on target: requires transformation to empirical logit, to ensure the dependent variable is unbounded: \(\log( \frac{(y + 0.5)}{(N - y + 0.5)} )\)
    • Logistic regression comparing fixations on target versus competitor
  • Difficulty 2: selecting a time span to average over
    • Note that about 200 ms is needed to plan and launch an eye movement
    • It is possible (and better) to take every individual sampling point into account, but we will opt for the simpler approach here (in contrast to the GAM approach)
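
The empirical logit transformation mentioned above can be sketched in R as follows. This is a minimal illustration, not the analysis used in the study: it assumes y is the number of samples on the target within a time window and N the total number of samples in that window, and the counts are made up.

```r
# Empirical logit of a fixation proportion (sketch under stated assumptions)
# y: number of samples on the target (assumed meaning)
# N: total number of samples in the time window (assumed meaning)
empirical_logit <- function(y, N) {
  log((y + 0.5) / (N - y + 0.5))
}

y <- c(0, 25, 50, 100)  # hypothetical counts, not taken from the study
N <- 100                # hypothetical window size

elog <- empirical_logit(y, N)
print(round(elog, 2))
```

Adding 0.5 to the numerator and the denominator keeps the result finite when y equals 0 or N, where the plain logit would be infinite.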

Question 1

Which dependent variable? (2)

  • Here we use logistic mixed-effects regression comparing fixations on the target versus the competitor
  • Averaged over the time span starting 200 ms after the onset of the determiner and ending 200 ms after the onset of the noun (about 800 ms)
  • This ensures that gender information has been heard and processed, both for the definite and indefinite sentences

Generalized linear mixed-effects regression

  • A generalized linear (mixed-effects) regression model (GLM) is a generalization of the linear (mixed-effects) regression model
    • Response variables may have an error distribution other than the normal distribution
    • Linear model is related to response variable via link function
    • Variance of measurements may depend on the predicted value
  • Examples of GLMs are Poisson regression, logistic regression, etc.

Logistic (mixed-effects) regression

  • Dependent variable is binary (1: success, 0: failure): modeled as probabilities
  • Transform to continuous variable via log odds link function: \(\log(\frac{p}{1-p}) = \textrm{logit}(p)\)
    • In R: logit(p) (from library car)
  • Coefficients are on the logit scale (w.r.t. success); convert a logit x back to a probability with the inverse logit (in R: plogis(x))
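
A small sketch of the two directions of the transformation, using base R only. The probabilities are arbitrary illustration values; qlogis() is used here as the base R counterpart of the log odds for probabilities strictly between 0 and 1.

```r
# From probabilities to logits and back (illustrative values only)
p <- c(0.1, 0.5, 0.7, 0.9)

logits <- qlogis(p)     # log(p / (1 - p))
back   <- plogis(logits)  # 1 / (1 + exp(-x)), the inverse logit

print(round(logits, 3))
print(back)
```

A logit of 0 corresponds to a probability of 0.5; positive logits mean success is more likely than failure, negative logits the reverse.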

Logistic mixed-effects regression: assumptions

  • Independent observations within each level of the random-effect factor
  • Relation between logit-transformed DV and independent variables linear
  • No strong multicollinearity
  • No highly influential outliers (assessed using model criticism)
  • Important: No normality or homoscedasticity assumptions about the residuals

Some remarks about data preparation

  • Check pairwise correlations of your predictor variables
    • If high: exclude variable / combine variables (residualization is not OK)
    • See also: Chapter 6.2.2 of Baayen (2008)
  • Check distribution of numerical predictors
    • If skewed, it may help to transform them
  • Center your numerical predictors when doing mixed-effects regression
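
The preparation steps above could look roughly like this. The data frame d and its column Freq are hypothetical and simulated purely for illustration; only the column names Age and TrialID mirror the dataset shown later.

```r
# Sketch of data preparation on simulated data (not the study's data)
set.seed(1)
d <- data.frame(
  Age     = round(runif(50, 18, 60)),
  TrialID = rep(1:10, times = 5),
  Freq    = rlnorm(50, meanlog = 3, sdlog = 1)  # hypothetical skewed predictor
)

# 1. Pairwise correlations of the numerical predictors
print(round(cor(d), 2))

# 2. A skewed predictor may benefit from a transformation, e.g. the log
d$LogFreq <- log(d$Freq)

# 3. Center numerical predictors (subtract the mean)
d$Age.c     <- d$Age - mean(d$Age)
d$TrialID.c <- d$TrialID - mean(d$TrialID)
print(round(colMeans(d[, c("Age.c", "TrialID.c")]), 10))
```

After centering, the intercept of a model refers to a participant of average age at the average trial, rather than to the value 0 of those predictors.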

Our study: independent variables (1)

  • Variable of interest:
    • Competitor gender vs. target gender
  • Variables which are/could be important:
    • Competitor vs. target color
    • Gender of target (common or neuter)
    • Definiteness of target

Our study: independent variables (2)

  • Participant-related variables:
    • Gender (male/female), age, education level
    • Trial number
  • Design control variables:
    • Competitor position vs. target position (up-down or down-up)
    • Color of target
    • (anything else you are not interested in, but potentially problematic)

Question 2

Dataset

head(eye)
#   Subject   Item TargetDefinite TargetNeuter TargetColor TargetPlace CompColor
# 1    S300   boom              1            0       green           3     brown
# 2    S300  bloem              1            0         red           4     green
# 3    S300  anker              1            1      yellow           3    yellow
# 4    S300   auto              1            0       green           3     brown
# 5    S300   boek              1            1        blue           4      blue
# 6    S300 varken              1            1       brown           1     green
#   CompPlace TrialID Age IsMale Edulevel SameColor SameGender TargetFocus CompFocus
# 1         2       1  52      0        1         0          1          43        41
# 2         2       2  52      0        1         0          0         100         0
# 3         2       3  52      0        1         1          1          73        27
# 4         2       4  52      0        1         0          0         100         0
# 5         3       5  52      0        1         1          0          12        21
# 6         3       6  52      0        1         0          0           0        51

Our first generalized mixed-effects regression model

(R version 4.0.5 (2021-03-31), lme4 version 1.1.27)

library(lme4)
model1 <- glmer(cbind(TargetFocus, CompFocus) ~ (1 | Subject) + (1 | Item), data = eye, 
    family = "binomial")  # intercept-only model
summary(model1)  # slides only show relevant part of the summary
# Random effects:
#  Groups  Name        Std.Dev.
#  Item    (Intercept) 0.326   
#  Subject (Intercept) 0.588   
# 
# Fixed effects:
#             Estimate Std. Error z value Pr(>|z|)    
# (Intercept)    0.848      0.121    7.01 2.31e-12 ***
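
To see what the cbind(successes, failures) response does, here is a simplified sketch using only the six rows printed by head(eye) and an ordinary glm() without random effects. It is not the model above, and the name eye6 is made up for this illustration.

```r
# Six rows copied from the head(eye) output shown earlier
eye6 <- data.frame(
  TargetFocus = c(43, 100, 73, 100, 12, 0),
  CompFocus   = c(41, 0, 27, 0, 21, 51)
)

# Intercept-only binomial model: successes = TargetFocus, failures = CompFocus
m0 <- glm(cbind(TargetFocus, CompFocus) ~ 1, data = eye6, family = binomial)

# Without random effects, the intercept is the logit of the pooled proportion
pooled <- sum(eye6$TargetFocus) / sum(eye6$TargetFocus + eye6$CompFocus)
print(coef(m0)[["(Intercept)"]])
print(qlogis(pooled))
```

The mixed-effects model differs from this sketch in that it also estimates by-subject and by-item adjustments to the intercept, so its intercept need not equal the logit of the pooled proportion.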

Interpreting logit coefficients I

fixef(model1)  # show fixed effects
# (Intercept) 
#       0.848
plogis(fixef(model1)["(Intercept)"])
# (Intercept) 
#         0.7
  • On average 70% chance to focus on target
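
The random-effect standard deviations can be read on the same scale. As a rough sketch, assuming by-subject intercepts are normally distributed on the logit scale (as the model does) and considering an average item, about 95% of subjects would fall between the two probabilities computed below. The numbers are copied from the summary output above.

```r
# Values copied from the model summary shown earlier
intercept  <- 0.848  # fixed-effect intercept (logit scale)
sd_subject <- 0.588  # by-subject random-intercept standard deviation

# Approximate 95% range of subject-specific intercepts, on the logit scale
lower_logit <- intercept - 1.96 * sd_subject
upper_logit <- intercept + 1.96 * sd_subject

# Convert to probabilities of focusing on the target
lower <- plogis(lower_logit)
upper <- plogis(upper_logit)
print(round(c(lower = lower, upper = upper), 2))  # roughly 0.42 and 0.88
```

The range is asymmetric around 0.7 because the inverse logit is nonlinear: equal steps on the logit scale correspond to smaller probability steps near 0 and 1.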


By-item random intercepts