Martijn Wieling

University of Groningen

- Introduction
- Gender processing in Dutch
- Eye-tracking to reveal gender processing

- Design
- Analysis: logistic mixed-effects regression
- Conclusion

- Study's goal: assess if Dutch people use grammatical gender to anticipate upcoming words
- This study was conducted together with Hanneke Loerts and is published in the *Journal of Psycholinguistic Research* (Loerts, Wieling and Schmid, 2012)

- What is grammatical gender?
- Gender is a property of a noun
- Nouns are divided into classes: masculine, feminine, neuter, ...
- E.g., *hond* ('dog') = common (masculine/feminine), *paard* ('horse') = neuter

- The gender of a noun can be determined from the forms of elements syntactically related to it

- Gender in Dutch: 70% common, 30% neuter
- When a noun is diminutive it is always neuter (the Dutch often use diminutives!)
- Gender is unpredictable from the root noun and hard to learn

- Eye tracking reveals the listener's incremental processing over the time course of the speech signal
- As people tend to look at what they hear (Cooper, 1974), lexical competition can be tested

- Cohort Model (Marslen-Wilson & Welsh, 1978): competition between words is based on word-initial activation

- This can be tested using the visual world paradigm: following eye movements while participants receive auditory input to click on one of several objects on a screen

- Subjects hear: "Pick up the candy" (Tanenhaus et al., 1995)
- Fixations towards target (Candy) *and* competitor (Candle): support for the Cohort Model

- Other models of lexical processing state that lexical competition occurs based on all acoustic input (e.g., TRACE, Shortlist, NAM)
- Does syntactic gender information restrict the possible set of lexical candidates?
- If you hear *de*, do you focus more on *de hond* (dog) than on *het paard* (horse)?
- Previous studies (e.g., Dahan et al., 2000 for French) have indicated gender information restricts the possible set of lexical candidates

- We will investigate if this also holds for Dutch (other gender system) via the VWP
- We analyze the data using (generalized) linear mixed-effects regression in `R`

- 28 Dutch participants heard sentences like:
- *Klik op de rode appel* ('click on the red apple')
- *Klik op het plaatje met een blauw boek* ('click on the image of a blue book')
- They were shown 4 nouns varying in color and gender
- Eye movements were tracked with a Tobii eye-tracker (E-Prime extensions)

- Subjects were shown 96 different screens
- 48 screens for indefinite sentences ("*Klik op het plaatje met een rode appel.*")
- 48 screens for definite sentences ("*Klik op de rode appel.*")

- Difficulty 1: choosing the dependent variable
- Fixation difference between target and competitor
- Fixation proportion on target: requires transformation to empirical logit, to ensure the dependent variable is unbounded: \(\log( \frac{(y + 0.5)}{(N - y + 0.5)} )\)
- Logistic regression comparing fixations on target versus competitor
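
The empirical logit transformation listed above can be sketched in a few lines of `R`. The function name `emp_logit` and the example counts are hypothetical and only serve to show that the transformed value stays finite even when the target receives no fixations or all of them.

```r
# Empirical logit: log((y + 0.5) / (N - y + 0.5))
# y = number of fixations on the target, N = total number of fixations
emp_logit <- function(y, N) {
  log((y + 0.5) / (N - y + 0.5))
}

# Hypothetical counts: no, half and all fixations on the target (N = 10)
y <- c(0, 5, 10)
N <- 10
el <- emp_logit(y, N)
el  # finite at the boundaries, 0 when target and non-target are balanced
```

A plain logit of the proportion `y / N` would be infinite for the first and last value, which is why 0.5 is added to the numerator and the denominator.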

- Difficulty 2: selecting a time span to average over
- Note that about 200 ms is needed to plan and launch an eye movement
- It is possible (and better) to take every individual sampling point into account, but we will opt for the simpler approach here (in contrast to the GAM approach)

- Here we use logistic mixed-effects regression comparing fixations on the target versus the competitor
- Averaged over the time span starting 200 ms after the onset of the determiner and ending 200 ms after the onset of the noun (about 800 ms)
- This ensures that gender information has been heard and processed, both for the definite and indefinite sentences
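
As a rough illustration of this time window, the sketch below counts gaze samples on the target and on the competitor for a single trial. Everything in it is assumed for the example: the data frame `samples`, its columns `time` and `aoi`, the 20 ms sampling interval and the two onset times are hypothetical and do not come from the study's data.

```r
# Hypothetical gaze samples for one trial, one sample every 20 ms
samples <- data.frame(
  time = seq(0, 1180, by = 20),  # ms from sentence onset (assumed)
  aoi  = rep(c("other", "competitor", "target"), times = c(20, 15, 25))
)

det_onset  <- 200  # assumed onset of the determiner (ms)
noun_onset <- 800  # assumed onset of the noun (ms)

# Window: 200 ms after determiner onset up to 200 ms after noun onset
in_window <- samples$time >= det_onset + 200 & samples$time <= noun_onset + 200

target_n <- sum(samples$aoi[in_window] == "target")
comp_n   <- sum(samples$aoi[in_window] == "competitor")
c(target = target_n, competitor = comp_n)
```

Counts of this kind, one pair per trial, are the sort of input a logistic regression on target versus competitor fixations needs.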

- A generalized linear (mixed-effects) regression model (GLM) is a generalization of a linear (mixed-effects) regression model
- Response variables may have an error distribution other than the normal distribution
- Linear model is related to response variable via link function
- Variance of measurements may depend on the predicted value

- Examples of GLMs are Poisson regression, **logistic regression**, etc.

- Dependent variable is binary (1: success, 0: failure): modeled as probabilities
- Transform to continuous variable via log odds link function: \(\log(\frac{p}{1-p}) = \textrm{logit}(p)\)
- In `R`: `logit(p)` (from library `car`)
- Interpret coefficients w.r.t. success as logits (in `R`, `plogis(x)` converts logits back to probabilities)
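
The link function and its inverse can be tried out directly. This sketch uses only base `R`: `qlogis()` computes the same log odds as the formula above (so loading `car` for `logit()` is not strictly needed), and `plogis()` maps logits back to probabilities. The probabilities in `p` are arbitrary example values.

```r
# Arbitrary example probabilities
p <- c(0.1, 0.5, 0.7, 0.9)

# Log odds (logit) two ways: by hand and with base R's qlogis()
log_odds_manual <- log(p / (1 - p))
log_odds        <- qlogis(p)

# Back from logits to probabilities with the inverse link
p_back <- plogis(log_odds)

data.frame(p, log_odds, p_back)
```

A probability of 0.5 corresponds to a logit of 0, so positive coefficients indicate a probability of success above chance and negative ones a probability below it.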

- Independent observations within each level of the random-effect factor
- Relation between logit-transformed DV and independent variables linear
- No strong multicollinearity
- No highly influential outliers (assessed using model criticism)

**Important**: No normality or homoscedasticity assumptions about the residuals

- Check pairwise correlations of your predictor variables
- If high: exclude variable / combine variables (residualization is not OK)
- See also: Chapter 6.2.2 of Baayen (2008)

- Check distribution of numerical predictors
- If skewed, it may help to transform them

- Center your numerical predictors when doing mixed-effects regression
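
The checks above could look roughly as follows. The data frame `d` is simulated purely for illustration (its values are not the study's data), and the `.c` suffix for centered variables is just a naming choice made here.

```r
set.seed(1)

# Simulated stand-in for two numerical predictors (illustration only)
d <- data.frame(
  Age     = round(runif(100, min = 18, max = 60)),
  TrialID = sample(1:96, 100, replace = TRUE)
)

# Pairwise correlations between the numerical predictors
cor_mat <- cor(d[, c("Age", "TrialID")])
cor_mat

# Distribution check: a histogram shows whether a predictor is skewed
# hist(d$Age)

# Center the predictors: subtract the mean so that 0 means "average"
d$Age.c     <- d$Age - mean(d$Age)
d$TrialID.c <- d$TrialID - mean(d$TrialID)
```

After centering, the intercept of a model refers to an observation with average values for the numerical predictors rather than to an age or trial number of 0.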

- Variable of interest:
- Competitor gender vs. target gender

- Variables which are/could be important:
- **Competitor vs. target color**
- Gender of target (common or neuter)
- Definiteness of target

- Participant-related variables:
- Gender (male/female), age, education level
- Trial number

- Design control variables:
- Competitor position vs. target position (up-down or down-up)
- Color of target
- (anything else you are not interested in, but potentially problematic)

```
head(eye)
```

```
# Subject Item TargetDefinite TargetNeuter TargetColor TargetPlace CompColor
# 1 S300 boom 1 0 green 3 brown
# 2 S300 bloem 1 0 red 4 green
# 3 S300 anker 1 1 yellow 3 yellow
# 4 S300 auto 1 0 green 3 brown
# 5 S300 boek 1 1 blue 4 blue
# 6 S300 varken 1 1 brown 1 green
# CompPlace TrialID Age IsMale Edulevel SameColor SameGender TargetFocus CompFocus
# 1 2 1 52 0 1 0 1 43 41
# 2 2 2 52 0 1 0 0 100 0
# 3 2 3 52 0 1 1 1 73 27
# 4 2 4 52 0 1 0 0 100 0
# 5 3 5 52 0 1 1 0 12 21
# 6 3 6 52 0 1 0 0 0 51
```

(`lme4` version 1.1.27)

```
library(lme4)
model1 <- glmer(cbind(TargetFocus, CompFocus) ~ (1 | Subject) + (1 | Item), data = eye,
                family = "binomial") # intercept-only model
summary(model1) # slides only show relevant part of the summary
```

```
# Random effects:
# Groups Name Std.Dev.
# Item (Intercept) 0.326
# Subject (Intercept) 0.588
#
# Fixed effects:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 0.848 0.121 7.01 2.31e-12 ***
```
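
To make the two-column response `cbind(TargetFocus, CompFocus)` more concrete, the sketch below fits an intercept-only binomial model with base `R`'s `glm()` to only the six rows printed by `head(eye)`. This is a deliberate simplification and not the model above: it ignores the random intercepts for `Subject` and `Item` and uses a tiny subset of the data. Without random effects, the fitted intercept is simply the logit of the pooled proportion of target fixations.

```r
# The six rows shown by head(eye); the name eye6 is used only in this example
eye6 <- data.frame(
  TargetFocus = c(43, 100, 73, 100, 12, 0),
  CompFocus   = c(41, 0, 27, 0, 21, 51)
)

# Successes = fixations on target, failures = fixations on competitor
fit <- glm(cbind(TargetFocus, CompFocus) ~ 1, data = eye6, family = "binomial")

# Pooled proportion of target fixations, computed by hand
pooled <- sum(eye6$TargetFocus) / sum(eye6$TargetFocus + eye6$CompFocus)

# Both numbers agree: the intercept is the logit of the pooled proportion
c(model = unname(plogis(coef(fit))), pooled = pooled)
```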

```
fixef(model1) # show fixed effects
```

```
# (Intercept)
# 0.848
```

```
plogis(fixef(model1)["(Intercept)"])
```

```
# (Intercept)
# 0.7
```

- On average 70% chance to focus on target
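
The random-effects part of the summary can be read on the probability scale in the same way. As a small sketch, the code below takes the intercept (0.848) and the by-subject standard deviation (0.588) from the output above and converts the intercept for a subject one standard deviation below or above the average. The choice of plus and minus one standard deviation is only an illustration, and the variable names are made up for this example.

```r
b0      <- 0.848  # fixed-effect intercept (logit scale), from the summary
sd_subj <- 0.588  # standard deviation of the by-subject random intercepts

# Logits for a subject 1 SD below average, an average subject, 1 SD above
logits <- b0 + c(low = -1, average = 0, high = 1) * sd_subj

# Corresponding probabilities of focusing on the target
probs <- plogis(logits)
round(probs, 2)
```

Because `plogis()` is non-linear, equal steps on the logit scale do not translate into equal steps in probability.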