1 Abstract

In this study, we investigate differences between the English pronunciation of native English speakers and that of Dutch and German speakers. We focus on the articulatory trajectories obtained using electromagnetic articulography and investigate two sound contrasts in particular: /t/-/θ/ and /s/-/ʃ/. Our results show that while the German speakers make both sound contrasts adequately, the Dutch speakers do not distinguish them clearly. To further evaluate these results, both a human Dutch listener and an automatic speech recognition (ASR) system classified the pronounced words on the basis of the acoustic recordings. Both classifications lined up with the articulatory results: for the Dutch speakers, /θ/-words were more frequently recognized as /t/-words, and /s/-words as /ʃ/-words. However, the intended utterance was still recognized in the majority of cases for the Dutch speakers. The perceptual results therefore do not support a complete merger of the sounds in Dutch.

Journal: Submitted to Proceedings of ISSP 2017

Preprint: http://www.martijnwieling.nl/files/ISSP-Wieling.pdf

Keywords: Generalized additive modeling; Tutorial; Articulography; Second language acquisition

## Generated on: August 30, 2017 - 23:39:25

2 Libraries and functions

The following commands load the necessary functions and libraries and show the version information.

# install packages if not yet installed
packages <- c("mgcv","itsadug","lme4")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

# load required packages
library(mgcv)
library(itsadug)
library(lme4)

# version information
R.version.string

## [1] "R version 3.4.1 (2017-06-30)"

cat(paste('mgcv version:',packageVersion('mgcv')))

## mgcv version: 1.8.18

cat(paste('itsadug version:',packageVersion('itsadug')))

## itsadug version: 2.2.4

cat(paste('lme4 version:',packageVersion('lme4')))

## lme4 version: 1.1.12

3 Production datasets

The following commands download the two production datasets (if they are not yet present locally) and load them. Their columns are explained below.

if (!file.exists('datth.rda')) { 
    download.file('http://www.let.rug.nl/wieling/ISSP2017/datth.rda', 'datth.rda')
}
if (!file.exists('datsh.rda')) { 
    download.file('http://www.let.rug.nl/wieling/ISSP2017/datsh.rda', 'datsh.rda')
}
load('datth.rda')
load('datsh.rda')
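Because `load()` restores objects under the names they were saved with, it can be useful to check what an .rda file contains before loading it into the workspace. The following is a minimal sketch using a temporary file and a made-up data frame (`toy` is hypothetical and not one of the study's datasets). It relies on `load()` returning the names of the restored objects and on its `envir` argument.

```r
# Hypothetical toy object; stands in for a dataset such as datth
toy <- data.frame(Speaker = c("s1", "s2"), Pos = c(0.1, -0.3))
rda_file <- tempfile(fileext = ".rda")
save(toy, file = rda_file)

# Load into a fresh environment so the global workspace is left untouched
env <- new.env()
loaded_names <- load(rda_file, envir = env)

loaded_names                 # names of the objects stored in the file
dim(env[[loaded_names[1]]])  # rows and columns of the restored data frame
```

The same pattern could be applied to 'datth.rda' and 'datsh.rda' to confirm the object names and dimensions reported below.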

3.1 Column names

The dataset datsh consists of 265599 rows and 10 columns, whereas the dataset datth consists of 223954 rows and 10 columns. Both datasets have the following column names:

colnames(datth)

##  [1] "Speaker" "Lang"    "Sensor"  "Axis"    "Trial"   "Word"    "Sound"  
##  [8] "Loc"     "Time"    "Pos"

3.2 Data description

Speaker – ID of the speaker
Lang – Native language of the speaker ("NL" for Dutch, "DE" for German, or "EN" for English)
Sensor – The sensor (in this case only "TT", the tongue tip sensor)
Axis – The Axis (in this case only "X", the anterior-posterior position)
Trial – The trial number of the word
Word – The label of the word
Sound – The sound contrast ("TH" for words with the dental fricative and "T" for words with the stop in dataset datth; "SH" for words with the post-alveolar fricative and "S" for words with the alveolar fricative in dataset datsh)
Loc – The location in the word where the sound contrast occurs ("Start" when it occurs at the beginning of the word or "End" when it occurs at the end of the word)
Time – The normalized time (from 0, the beginning of the word, to 1, the end of the word)
Pos – The position of the TT sensor in the anterior-posterior direction, standardized per speaker (mean 0, standard deviation 1); higher values are more anterior
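The description above states that Pos is standardized per speaker and that Time is normalized per word. How the authors computed these columns is not shown here, so the following is only a sketch of one way such columns could be derived in base R. The raw columns `TimeMs` and `PosMm` and all values are hypothetical.

```r
# Hypothetical raw measurements: two speakers, two trials each, three samples per trial
raw <- data.frame(
  Speaker = rep(c("s1", "s2"), each = 6),
  Trial   = rep(rep(1:2, each = 3), times = 2),
  TimeMs  = c(100, 110, 120, 300, 320, 340, 50, 60, 70, 200, 210, 220),
  PosMm   = c(10, 12, 15, 13, 11, 9, 30, 34, 40, 36, 31, 33)
)

# z-transform the position within each speaker (mean 0, standard deviation 1)
raw$Pos <- ave(raw$PosMm, raw$Speaker,
               FUN = function(x) (x - mean(x)) / sd(x))

# Rescale time within each speaker-trial combination to the interval [0, 1]
raw$Time <- ave(raw$TimeMs, raw$Speaker, raw$Trial,
                FUN = function(x) (x - min(x)) / (max(x) - min(x)))

tapply(raw$Pos, raw$Speaker, mean)  # close to 0 for every speaker
range(raw$Time)
```

Standardizing within speakers removes differences in overall sensor placement and vocal tract size, so that only the relative movement is compared across speakers.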

4 Contrasting /θ/ and /t/ in production

datth <- start_event(datth,event=c("Speaker","Trial"))

datth$LangLoc <- interaction(datth$Lang, datth$Loc)

datth$IsENTHStart <- (datth$Lang == "EN" & datth$Sound == "TH" & datth$Loc == "Start")*1

datth$IsNLTHStart <- (datth$Lang == "NL" & datth$Sound == "TH" & datth$Loc == "Start")*1

datth$IsDETHStart <- (datth$Lang == "DE" & datth$Sound == "TH" & datth$Loc == "Start")*1

datth$IsENTHEnd <- (datth$Lang == "EN" & datth$Sound == "TH" & datth$Loc == "End")*1

datth$IsNLTHEnd <- (datth$Lang == "NL" & datth$Sound == "TH" & datth$Loc == "End")*1

datth$IsDETHEnd <- (datth$Lang == "DE" & datth$Sound == "TH" & datth$Loc == "End")*1

datth$SpeakerSoundLoc <- interaction(datth$Speaker, datth$Sound, datth$Loc)

system.time(th1 <- bam(Pos ~ LangLoc + s(Time,by=LangLoc) + s(Time,by=IsENTHStart) + s(Time,by=IsENTHEnd) + s(Time,by=IsNLTHStart) + s(Time,by=IsNLTHEnd) + s(Time,by=IsDETHStart) + s(Time,by=IsDETHEnd) + s(Time,SpeakerSoundLoc,bs="fs",m=1) + s(Time,Word,bs="fs",m=1), data=datth, discrete=TRUE, rho=0.999, nthreads=8, AR.start=datth$start.event))

## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.

##    user  system elapsed 
## 872.532   6.548 150.889
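The model above relies on binary difference smooths: each indicator such as IsENTHStart equals 1 only for the /θ/ words of one group, so `s(Time, by=IsENTHStart)` captures how the /θ/ trajectory deviates from the /t/ trajectory for that group. The following is a small sketch of the same idea on simulated data with a single group. The data, the variable `IsTH`, and the simulated effect are all made up for illustration. The warning about repeated 1-d smooths of the same variable is to be expected with this model structure, since Time occurs in several smooths.

```r
library(mgcv)

set.seed(1)
n <- 400
time  <- runif(n)            # normalized time, 0 to 1
is_th <- rbinom(n, 1, 0.5)   # 1 for the "TH" condition, 0 for the reference "T"

# Simulated truth: a shared trajectory plus an extra component only for "TH"
pos <- sin(2 * pi * time) + is_th * (0.5 + cos(2 * pi * time)) + rnorm(n, sd = 0.3)
toy <- data.frame(Pos = pos, Time = time, IsTH = is_th)

# s(Time) models the reference trajectory; s(Time, by = IsTH) models the
# TH-minus-T difference, including its constant offset (the smooth is not centered)
m <- gam(Pos ~ s(Time) + s(Time, by = IsTH), data = toy, method = "REML")
summary(m)$s.table

# Estimated TH-minus-T difference at Time = 0.25 (simulated truth: 0.5)
nd <- data.frame(Time = c(0.25, 0.25), IsTH = c(0, 1))
th_minus_t <- as.numeric(diff(predict(m, nd)))
th_minus_t
```

A significant difference smooth indicates that the two trajectories differ somewhere along the time course, which is how the rows for the Is... terms in the summary below can be read.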

acf_resid(th1)
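The `rho` and `AR.start` arguments in the model call add an AR(1) error model, with start.event marking the first sample of every time series. The sketch below illustrates, on simulated data and in base R only, what such a start flag represents and how a lag-1 autocorrelation of residuals could be computed within events. It is not a reconstruction of how the value 0.999 was chosen, which is not stated here, and all names and values are hypothetical.

```r
set.seed(2)
# Hypothetical data: 20 time series (events) of 50 samples each, with AR(1) noise
n_events <- 20
n_samples <- 50
toy <- do.call(rbind, lapply(seq_len(n_events), function(i) {
  noise <- as.numeric(arima.sim(list(ar = 0.9), n = n_samples))
  data.frame(Event = i, Time = seq(0, 1, length.out = n_samples), Pos = noise)
}))

# Sort by event and time, then flag the first sample of every event
toy <- toy[order(toy$Event, toy$Time), ]
toy$start.event <- !duplicated(toy$Event)

# Residuals of a simple model without an AR(1) component
res <- resid(lm(Pos ~ Time, data = toy))

# Lag-1 autocorrelation within events: pair each residual with its predecessor,
# skipping pairs that would cross an event boundary
keep <- !toy$start.event
rho_hat <- cor(res[keep], res[which(keep) - 1])
rho_hat
```

If the autocorrelation plot of a fitted model still shows a high value at lag 1, the chosen rho may be too low; if it turns strongly negative, the rho may be too high.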

(smryth1 <- summary(th1))

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ LangLoc + s(Time, by = LangLoc) + s(Time, by = IsENTHStart) + 
##     s(Time, by = IsENTHEnd) + s(Time, by = IsNLTHStart) + s(Time, 
##     by = IsNLTHEnd) + s(Time, by = IsDETHStart) + s(Time, by = IsDETHEnd) + 
##     s(Time, SpeakerSoundLoc, bs = "fs", m = 1) + s(Time, Word, 
##     bs = "fs", m = 1)
## 
## Parametric coefficients:
##                 Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      -0.2872     0.1038  -2.768  0.00564 **
## LangLocEN.End     0.1297     0.1701   0.762  0.44601   
## LangLocNL.End     0.3559     0.1563   2.277  0.02280 * 
## LangLocDE.Start   0.2498     0.1676   1.491  0.13597   
## LangLocEN.Start   0.1647     0.1790   0.920  0.35751   
## LangLocNL.Start   0.2462     0.1773   1.389  0.16493   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                              edf   Ref.df       F  p-value    
## s(Time):LangLocDE.End      1.002    1.003   7.464  0.00623 ** 
## s(Time):LangLocEN.End      3.017    3.368   3.357  0.03084 *  
## s(Time):LangLocNL.End      4.531    4.872   2.160  0.10285    
## s(Time):LangLocDE.Start    7.318    7.474   4.999 7.34e-06 ***
## s(Time):LangLocEN.Start    6.906    7.096   2.609  0.00530 ** 
## s(Time):LangLocNL.Start    6.835    7.019   1.431  0.23196    
## s(Time):IsENTHStart        7.338    7.592   8.109 1.04e-10 ***
## s(Time):IsENTHEnd          6.053    6.354   2.200  0.05283 .  
## s(Time):IsNLTHStart        4.724    5.054   1.249  0.23463    
## s(Time):IsNLTHEnd          6.228    6.526   0.529  0.83424    
## s(Time):IsDETHStart        8.601    8.755  25.994  < 2e-16 ***
## s(Time):IsDETHEnd          8.558    8.715  17.439  < 2e-16 ***
## s(Time,SpeakerSoundLoc) 2037.213 2481.000  15.669  < 2e-16 ***
## s(Time,Word)             150.959  176.000 133.483  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.468   Deviance explained = 47.3%
## fREML = -2.1637e+05  Scale est. = 3.7864    n = 223954

par(mfrow=c(3,2),mar=c(5.1, 5.1, 4.1, 2.1))
plot(th1,select=7,shade=T,rug=F, ylim=c(-0.6,2.1), main='TH vs T: English (start)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(th1,select=8,shade=T,rug=F, ylim=c(-0.6,2.1), main='TH vs T: English (end)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(th1,select=11,shade=T,rug=F, ylim=c(-0.6,2.1), main='TH vs T: German (start)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(th1,select=12,shade=T,rug=F, ylim=c(-0.6,2.1), main='TH vs T: German (end)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(th1,select=9,shade=T,rug=F, ylim=c(-0.6,2.1), main='TH vs T: Dutch (start)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(th1,select=10,shade=T,rug=F, ylim=c(-0.6,2.1), main='TH vs T: Dutch (end)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
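The `select` values in the plotting calls refer to the position of each smooth in the fitted model, which follows the order of the smooth terms in the summary table (the six by-LangLoc smooths first, then the six difference smooths). As a sketch of how that mapping could be checked rather than counted by hand, the labels can be read from the `smooth` component of the fitted object. The data and model below are simulated and hypothetical.

```r
library(mgcv)

set.seed(3)
n <- 300
toy <- data.frame(
  Time = runif(n),
  Lang = factor(sample(c("DE", "EN"), n, replace = TRUE)),
  IsTH = rbinom(n, 1, 0.5)
)
toy$Pos <- sin(2 * pi * toy$Time) + toy$IsTH * toy$Time + rnorm(n, sd = 0.3)

m <- gam(Pos ~ Lang + s(Time, by = Lang) + s(Time, by = IsTH), data = toy)

# The position of a label in this vector is the value to pass to `select`
labels <- sapply(m$smooth, function(s) s$label)
labels
which(labels == "s(Time):IsTH")
```

Applied to th1, the same lookup would confirm that positions 7 to 12 are the difference smooths plotted above.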

5 Contrasting /ʃ/ and /s/ in production

datsh <- start_event(datsh, event=c("Speaker","Trial"))

datsh$LangLoc <- interaction(datsh$Lang, datsh$Loc)

datsh$IsENSHStart <- (datsh$Lang == "EN" & datsh$Sound == "SH" & datsh$Loc == "Start")*1

datsh$IsNLSHStart <- (datsh$Lang == "NL" & datsh$Sound == "SH" & datsh$Loc == "Start")*1

datsh$IsDESHStart <- (datsh$Lang == "DE" & datsh$Sound == "SH" & datsh$Loc == "Start")*1

datsh$IsENSHEnd <- (datsh$Lang == "EN" & datsh$Sound == "SH" & datsh$Loc == "End")*1

datsh$IsNLSHEnd <- (datsh$Lang == "NL" & datsh$Sound == "SH" & datsh$Loc == "End")*1

datsh$IsDESHEnd <- (datsh$Lang == "DE" & datsh$Sound == "SH" & datsh$Loc == "End")*1

datsh$SpeakerSoundLoc <- interaction(datsh$Speaker, datsh$Sound, datsh$Loc)

system.time(sh1 <- bam(Pos ~ LangLoc + s(Time,by=LangLoc) + s(Time,by=IsENSHStart) + s(Time,by=IsENSHEnd) + s(Time,by=IsNLSHStart) + s(Time,by=IsNLSHEnd) + s(Time,by=IsDESHStart) + s(Time,by=IsDESHEnd) + s(Time,SpeakerSoundLoc,bs="fs",m=1) + s(Time,Word,bs="fs",m=1), data=datsh, discrete=TRUE, rho=0.999, nthreads=8, AR.start=datsh$start.event))

## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.

##     user   system  elapsed 
## 1239.356   10.288  216.458

acf_resid(sh1)

(smrysh1 <- summary(sh1))

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Pos ~ LangLoc + s(Time, by = LangLoc) + s(Time, by = IsENSHStart) + 
##     s(Time, by = IsENSHEnd) + s(Time, by = IsNLSHStart) + s(Time, 
##     by = IsNLSHEnd) + s(Time, by = IsDESHStart) + s(Time, by = IsDESHEnd) + 
##     s(Time, SpeakerSoundLoc, bs = "fs", m = 1) + s(Time, Word, 
##     bs = "fs", m = 1)
## 
## Parametric coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      0.01990    0.09964   0.200    0.842
## LangLocEN.End   -0.10295    0.14271  -0.721    0.471
## LangLocNL.End   -0.15537    0.14255  -1.090    0.276
## LangLocDE.Start  0.17203    0.14287   1.204    0.229
## LangLocEN.Start  0.16462    0.14636   1.125    0.261
## LangLocNL.Start -0.01689    0.15045  -0.112    0.911
## 
## Approximate significance of smooth terms:
##                              edf   Ref.df       F  p-value    
## s(Time):LangLocDE.End      7.394    7.534  12.784  < 2e-16 ***
## s(Time):LangLocEN.End      7.530    7.669   6.548 1.68e-08 ***
## s(Time):LangLocNL.End      5.816    6.055   1.855 0.096879 .  
## s(Time):LangLocDE.Start    6.069    6.311   3.629 0.000718 ***
## s(Time):LangLocEN.Start    6.526    6.739   6.669 2.74e-07 ***
## s(Time):LangLocNL.Start    5.588    5.846   0.678 0.675705    
## s(Time):IsENSHStart        6.180    6.509   4.055 0.000187 ***
## s(Time):IsENSHEnd          5.075    5.420   2.061 0.090637 .  
## s(Time):IsNLSHStart        3.747    4.062   0.807 0.562415    
## s(Time):IsNLSHEnd          5.620    5.949   1.025 0.417433    
## s(Time):IsDESHStart        8.497    8.666  20.351  < 2e-16 ***
## s(Time):IsDESHEnd          7.074    7.339   7.621 1.81e-05 ***
## s(Time,SpeakerSoundLoc) 2040.295 2481.000  16.271  < 2e-16 ***
## s(Time,Word)             166.613  194.000 139.157  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.452   Deviance explained = 45.7%
## fREML = -2.9479e+05  Scale est. = 2.8639    n = 265599

par(mfrow=c(3,2),mar=c(5.1, 5.1, 4.1, 2.1))
plot(sh1,select=7,shade=T,rug=F, ylim=c(-2.1,0.6), main='SH vs S: English (start)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(sh1,select=8,shade=T,rug=F, ylim=c(-2.1,0.6), main='SH vs S: English (end)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(sh1,select=11,shade=T,rug=F, ylim=c(-2.1,0.6), main='SH vs S: German (start)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(sh1,select=12,shade=T,rug=F, ylim=c(-2.1,0.6), main='SH vs S: German (end)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(sh1,select=9,shade=T,rug=F, ylim=c(-2.1,0.6), main='SH vs S: Dutch (start)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)
plot(sh1,select=10,shade=T,rug=F, ylim=c(-2.1,0.6), main='SH vs S: Dutch (end)', cex.lab=2.5, cex.axis=2.5, cex.main=2.5, cex.sub=2.5, ylab='Position difference')
abline(h=0)

6 Perception dataset

The following commands download the perception dataset (if it is not yet present locally) and load it. Its columns are explained below.

if (!file.exists('perc.rda')) { 
    download.file('http://www.let.rug.nl/wieling/ISSP2017/perc.rda', 'perc.rda')
}
load('perc.rda')

6.1 Column names

The dataset perc consists of 6468 rows and 13 columns with the following column names:

colnames(perc)

##  [1] "Speaker"           "Gender"            "BirthYear"        
##  [4] "Lang"              "Trial"             "WordActual"       
##  [7] "WordRecognized"    "WordRecognizedASR" "Sound"            
## [10] "SoundRecog"        "SoundRecogASR"     "Correct"          
## [13] "CorrectASR"

6.2 Data description

Speaker – ID of the speaker
Gender – Gender of the speaker
BirthYear – Year of birth of the speaker (age is calculated by subtracting the year of birth from the recording year: 2014)
Lang – Native language of the speaker ("NL" for Dutch, "DE" for German, or "EN" for English)
Trial – The trial number of the word
WordActual – The label of the word
WordRecognized – The word recognized by a native Dutch listener with good English proficiency
WordRecognizedASR – The word recognized by the Google Cloud Speech API (at the end of 2016)
Sound – The sound of interest ("TH" for words with the dental fricative, "T" for words with the stop, "St" for words with an s instead of the stop or dental fricative, "SH" for words with a post-alveolar fricative, or "S" for words with an alveolar fricative)
SoundRecog – The sound recognized by the native Dutch listener
SoundRecogASR – The sound recognized by the Google Cloud Speech API
Correct – 1 if the Dutch listener recognized the word that was pronounced by the speaker, 0 if not
CorrectASR – 1 if the Google ASR system recognized the word that was pronounced by the speaker, 0 if not

6.3 Demographics

round(mean(2014-perc[perc$Lang=='EN',]$BirthYear),1)

## [1] 25

table(unique(perc[perc$Lang=='EN',c("Speaker","Gender")])$Gender)

## 
##  F  M 
## 14  8

round(mean(2014-perc[perc$Lang=='NL',]$BirthYear),1)

## [1] 20.7

table(unique(perc[perc$Lang=='NL',c("Speaker","Gender")])$Gender)

## 
##  F  M 
##  8 12

round(mean(2014-perc[perc$Lang=='DE',]$BirthYear),1)

## [1] 23

table(unique(perc[perc$Lang=='DE',c("Speaker","Gender")])$Gender)

## 
##  F  M 
## 16 11
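Note that the mean ages above are computed over all rows of perc, whereas the gender counts first reduce the data to one row per speaker. If speakers contribute different numbers of rows, the two approaches can give different means. The sketch below shows the difference on made-up data; the speakers and birth years are hypothetical.

```r
# Hypothetical toy data: speaker "a" contributes three rows, speaker "b" one row
toy <- data.frame(
  Speaker   = c("a", "a", "a", "b"),
  Gender    = c("F", "F", "F", "M"),
  BirthYear = c(1990, 1990, 1990, 1994)
)

# Row-level mean age: weighted by the number of rows per speaker
row_mean <- mean(2014 - toy$BirthYear)

# Speaker-level mean age: keep one row per speaker before averaging
speakers <- unique(toy[, c("Speaker", "Gender", "BirthYear")])
speaker_mean <- mean(2014 - speakers$BirthYear)

c(row_level = row_mean, speaker_level = speaker_mean)
table(speakers$Gender)
```

When every speaker has the same number of rows, both means coincide.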

7 Perceptual analysis: human listener

7.1 /θ/ incorrectly recognized as /t/

# TH is recognized less well for the Dutch (NL) speakers
percth = droplevels(perc[perc$Sound %in% c('TH'),])
m = glmer(Correct ~ Lang + (1|Speaker), data=percth, family='binomial', control = glmerControl(optimizer = "bobyqa"))
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: Correct ~ Lang + (1 | Speaker)
##    Data: percth
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   1082.4   1102.9   -537.2   1074.4     1241 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5770  0.2608  0.3328  0.4291  1.1997 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Speaker (Intercept) 0.7665   0.8755  
## Number of obs: 1245, groups:  Speaker, 69
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.2043     0.2485   8.871  < 2e-16 ***
## LangDE       -0.1513     0.3327  -0.455 0.649330    
## LangNL       -1.2735     0.3410  -3.734 0.000188 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) LangDE
## LangDE -0.728       
## LangNL -0.721  0.532

round(prop.table(with(percth[percth$Lang=='NL',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound   St    T   TH
##    TH 0.14 0.17 0.70

round(prop.table(with(percth[percth$Lang=='DE',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound   St    T   TH
##    TH 0.07 0.07 0.86

round(prop.table(with(percth[percth$Lang=='EN',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound   St    T   TH
##    TH 0.09 0.04 0.88
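The fixed effects in the model summary are on the logit scale, with the English speakers as the reference level. As a rough aid to interpretation, they can be converted to probabilities with `plogis()`. The sketch below copies the estimates from the summary above. The resulting values hold for a speaker whose random intercept is 0, so they need not match the raw proportions in the tables exactly.

```r
# Fixed-effect estimates copied from the summary above (logit scale)
b_intercept <- 2.2043   # English speakers (reference level)
b_de        <- -0.1513  # German vs. English speakers
b_nl        <- -1.2735  # Dutch vs. English speakers

# Predicted probability of correct recognition for a speaker with random intercept 0
p_correct <- plogis(c(EN = b_intercept,
                      DE = b_intercept + b_de,
                      NL = b_intercept + b_nl))
round(p_correct, 2)

# Odds ratio for Dutch relative to English speakers
odds_ratio_nl <- exp(b_nl)
odds_ratio_nl
```

An odds ratio below 1 means that the odds of correct recognition are lower for the Dutch speakers than for the English speakers.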

7.2 /s/ incorrectly recognized as /ʃ/

percs = droplevels(perc[perc$Sound %in% c('S'),])
m = glmer(Correct ~ Lang + (1|Speaker), data=percs, family='binomial')
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: Correct ~ Lang + (1 | Speaker)
##    Data: percs
## 
##      AIC      BIC   logLik deviance df.resid 
##    880.6    901.6   -436.3    872.6     1436 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.0750  0.1493  0.2343  0.3490  1.0307 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Speaker (Intercept) 1.053    1.026   
## Number of obs: 1440, groups:  Speaker, 69
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.5928     0.2912   8.904   <2e-16 ***
## LangDE        0.8431     0.4187   2.014   0.0440 *  
## LangNL       -0.8972     0.3951  -2.271   0.0232 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) LangDE
## LangDE -0.630       
## LangNL -0.711  0.473

round(prop.table(with(percs[percs$Lang=='NL',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound    S   SH
##     S 0.81 0.19

round(prop.table(with(percs[percs$Lang=='DE',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound    S   SH
##     S 0.95 0.05

round(prop.table(with(percs[percs$Lang=='EN',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound   S  SH
##     S 0.9 0.1

7.3 /ʃ/ incorrectly recognized as /s/

percsh = droplevels(perc[perc$Sound %in% c('SH'),])
m = glmer(Correct ~ Lang + (1|Speaker), data=percsh, family='binomial')
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: Correct ~ Lang + (1 | Speaker)
##    Data: percsh
## 
##      AIC      BIC   logLik deviance df.resid 
##    457.5    478.0   -224.7    449.5     1256 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.4273  0.1450  0.1836  0.2206  0.4708 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Speaker (Intercept) 0.6346   0.7966  
## Number of obs: 1260, groups:  Speaker, 69
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   3.6008     0.3561  10.112   <2e-16 ***
## LangDE       -0.5327     0.4186  -1.273    0.203    
## LangNL       -0.1490     0.4698  -0.317    0.751    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) LangDE
## LangDE -0.734       
## LangNL -0.641  0.539

round(prop.table(with(percsh[percsh$Lang=='NL',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound    S   SH
##    SH 0.04 0.96

round(prop.table(with(percsh[percsh$Lang=='DE',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound    S   SH
##    SH 0.06 0.94

round(prop.table(with(percsh[percsh$Lang=='EN',],table(Sound,SoundRecog))),2)

##      SoundRecog
## Sound    S   SH
##    SH 0.03 0.97

8 Perceptual analysis: ASR

8.1 /θ/ incorrectly recognized as /t/

percth[percth$SoundRecogASR %in% c('S','SH'),]$SoundRecogASR = NA # other word recognized
m = glmer(CorrectASR ~ Lang + (1|Speaker), data=percth, family='binomial')
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: CorrectASR ~ Lang + (1 | Speaker)
##    Data: percth
## 
##      AIC      BIC   logLik deviance df.resid 
##   1409.1   1429.6   -700.6   1401.1     1241 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0977 -0.5992 -0.5002  1.1029  2.8116 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Speaker (Intercept) 0.2989   0.5467  
## Number of obs: 1245, groups:  Speaker, 69
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.7867     0.1595  -4.933 8.09e-07 ***
## LangDE       -0.1526     0.2181  -0.700 0.484210    
## LangNL       -0.9084     0.2501  -3.632 0.000282 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) LangDE
## LangDE -0.727       
## LangNL -0.625  0.460

round(prop.table(with(percth[percth$Lang=='NL',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound   St    T   TH
##    TH 0.67 0.14 0.19

round(prop.table(with(percth[percth$Lang=='DE',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound   St    T   TH
##    TH 0.62 0.07 0.31

round(prop.table(with(percth[percth$Lang=='EN',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound   St    T   TH
##    TH 0.61 0.06 0.33
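In the ASR tables, responses that belong to an unrelated word are set to NA before tabulating. Since `table()` drops NA values by default, the proportions are relative to the remaining responses only, whereas the model on CorrectASR uses all observations. The following sketch shows this behaviour on a made-up set of five responses (hypothetical values).

```r
# Hypothetical recognized-sound labels for five TH words; "S" marks an unrelated word
toy <- data.frame(
  Sound = "TH",
  SoundRecogASR = c("TH", "T", "St", "S", "TH"),
  stringsAsFactors = FALSE
)
toy$SoundRecogASR[toy$SoundRecogASR %in% c("S", "SH")] <- NA

# Default: NA rows are dropped, so proportions are relative to the 4 remaining rows
prop_default <- prop.table(with(toy, table(Sound, SoundRecogASR)))
round(prop_default, 2)

# With useNA = "ifany" the excluded rows stay visible as their own column
prop_with_na <- prop.table(with(toy, table(Sound, SoundRecogASR, useNA = "ifany")))
round(prop_with_na, 2)
```

Reporting the NA share alongside the table makes it clear how many responses were excluded from the proportions.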

8.2 /s/ incorrectly recognized as /ʃ/

percs = droplevels(perc[perc$Sound %in% c('S'),])
percs[percs$SoundRecogASR %in% c('T','St','TH'),]$SoundRecogASR = NA # other word recognized
m = glmer(CorrectASR ~ Lang + (1|Speaker), data=percs, family='binomial')
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: CorrectASR ~ Lang + (1 | Speaker)
##    Data: percs
## 
##      AIC      BIC   logLik deviance df.resid 
##   1041.2   1062.3   -516.6   1033.2     1436 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2568  0.1821  0.2455  0.3982  1.1722 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Speaker (Intercept) 0.99     0.995   
## Number of obs: 1440, groups:  Speaker, 69
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.64434    0.29105   9.085  < 2e-16 ***
## LangDE       0.02867    0.38225   0.075 0.940218    
## LangNL      -1.34533    0.38586  -3.487 0.000489 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) LangDE
## LangDE -0.703       
## LangNL -0.735  0.529

round(prop.table(with(percs[percs$Lang=='NL',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound    S   SH
##     S 0.84 0.16

round(prop.table(with(percs[percs$Lang=='DE',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound    S   SH
##     S 0.98 0.02

round(prop.table(with(percs[percs$Lang=='EN',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound    S   SH
##     S 0.93 0.07

8.3 /ʃ/ incorrectly recognized as /s/

percsh = droplevels(perc[perc$Sound %in% c('SH'),])
percsh[percsh$SoundRecogASR %in% c('T','St','TH'),]$SoundRecogASR = NA # other word recognized
m = glmer(CorrectASR ~ Lang + (1|Speaker), data=percsh, family='binomial')
summary(m)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: CorrectASR ~ Lang + (1 | Speaker)
##    Data: percsh
## 
##      AIC      BIC   logLik deviance df.resid 
##   1161.6   1182.1   -576.8   1153.6     1256 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5596  0.2156  0.3561  0.4816  1.6034 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Speaker (Intercept) 1.243    1.115   
## Number of obs: 1260, groups:  Speaker, 69
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.9637     0.2893   6.788 1.13e-11 ***
## LangDE       -0.5200     0.3792  -1.371    0.170    
## LangNL       -0.2110     0.4108  -0.514    0.608    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) LangDE
## LangDE -0.748       
## LangNL -0.686  0.520

round(prop.table(with(percsh[percsh$Lang=='NL',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound    S   SH
##    SH 0.09 0.91

round(prop.table(with(percsh[percsh$Lang=='DE',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound    S   SH
##    SH 0.18 0.82

round(prop.table(with(percsh[percsh$Lang=='EN',],table(Sound,SoundRecogASR))),2)

##      SoundRecogASR
## Sound    S   SH
##    SH 0.14 0.86

9 Replication

To replicate the analysis presented above, copy the following lines into the most recent version of R. Note that Pandoc needs to be installed first.

download.file('http://www.let.rug.nl/wieling/ISSP2017/analysisISSP.Rmd', 'analysisISSP.Rmd')
if (length(setdiff('rmarkdown', rownames(installed.packages()))) > 0) {
  install.packages('rmarkdown')  
}
library(rmarkdown)
render('analysisISSP.Rmd') # generates html file with results
browseURL(paste('file://', file.path(getwd(),'analysisISSP.html'), sep='')) # shows result

Data, documentation and analysis scripts for Articulatory differences between L1 and L2 speakers of English

Martijn Wieling (University of Groningen, Haskins Laboratories), Pauline Veenstra (University of Groningen), Patti Adank (University College London), and Mark Tiede (Haskins Laboratories)