In this study, we investigate differences between native English speakers and the English pronunciation of Dutch and German speakers. We focus on the articulatory trajectories obtained using electromagnetic articulography and particularly investigate two sound contrasts: /t/-/θ/ and /s/-/ʃ/. Our results show that while German speakers make both sound contrasts adequately, the Dutch speakers do not distinguish them clearly. To further evaluate these results, both a human Dutch listener and an automatic speech recognition (ASR) system classified the pronounced words on the basis of the acoustic recording. Both classifications were in line with the articulatory results. For Dutch speakers, /θ/-words (and /s/-words) were more frequently recognized as /t/-words (and /ʃ/-words). However, the intended utterance was still recognized in the majority of cases for the Dutch speakers. The perceptual results therefore do not support a complete merger of the sounds in Dutch.
Journal: Submitted to Proceedings of ISSP 2017
Preprint: http://www.martijnwieling.nl/files/ISSP-Wieling.pdf
Keywords: Generalized additive modeling; Tutorial; Articulography; Second language acquisition
## Generated on: August 30, 2017 - 23:39:25
The following commands load the necessary functions and libraries and show the version information.
# install packages if not yet installed
packages <- c("mgcv","itsadug","lme4")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
install.packages(setdiff(packages, rownames(installed.packages())))
}
# load required packages
library(mgcv)
library(itsadug)
library(lme4)
# version information
R.version.string
## [1] "R version 3.4.1 (2017-06-30)"
cat(paste('mgcv version:',packageVersion('mgcv')))
## mgcv version: 1.8.18
cat(paste('itsadug version:',packageVersion('itsadug')))
## itsadug version: 2.2.4
cat(paste('lme4 version:',packageVersion('lme4')))
## lme4 version: 1.1.12
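The following code downloads (if necessary) and loads the two datasets: datth for the /t/-/θ/ contrast and datsh for the /s/-/ʃ/ contrast. Their columns and an explanation of the coded values are shown below.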
if (!file.exists('datth.rda')) {
download.file('http://www.let.rug.nl/wieling/ISSP2017/datth.rda', 'datth.rda')
}
if (!file.exists('datsh.rda')) {
download.file('http://www.let.rug.nl/wieling/ISSP2017/datsh.rda', 'datsh.rda')
}
load('datth.rda')
load('datsh.rda')
The dataset datsh consists of 265599 rows and 10 columns, whereas the dataset datth consists of 223954 rows and 10 columns. Both datasets have the following column names:
colnames(datth)
## [1] "Speaker" "Lang" "Sensor" "Axis" "Trial" "Word" "Sound"
## [8] "Loc" "Time" "Pos"
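The columns with coded values are defined as follows:
- Lang: language background of the speaker ("NL" for Dutch, "DE" for German, or "EN" for English)
- Axis: the axis of movement ("X", the anterior-posterior position)
- Sound: "TH" for words with the dental fricative and "T" for words with the stop (dataset datth); or "SH" for words with the post-alveolar fricative and "S" for words with the alveolar fricative (dataset datsh)
- Loc: location of the sound: "Start" when it occurs at the beginning of the word or "End" when it occurs at the end of the word

# mark the first sample of each speaker-trial time series (adds the column start.event)
datth <- start_event(datth, event = c("Speaker", "Trial"))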
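start_event() from itsadug sorts the data and adds a logical column (start.event by default) that is TRUE for the first sample of each event, here each combination of Speaker and Trial. This column is later passed to bam() as AR.start, so that the AR(1) error model restarts at every new trajectory. The base-R sketch below illustrates the idea on a small made-up data frame; the helper mark_starts() and the toy values are hypothetical, and itsadug's own implementation may differ in details.

```r
# Hypothetical helper: flag the first row of each event (Speaker x Trial)
# after sorting by the event columns and by time, mimicking start.event.
mark_starts <- function(d, event = c("Speaker", "Trial"), time = "Time") {
  # order rows by the event columns first, then by time within each event
  d <- d[do.call(order, c(unname(as.list(d[event])), list(d[[time]]))), ]
  # after sorting, the first occurrence of each event key is its earliest sample
  d$start.event <- !duplicated(d[event])
  d
}

# Toy data (made-up values), deliberately not sorted
toy <- data.frame(
  Speaker = c("s1", "s1", "s1", "s2", "s2", "s1"),
  Trial   = c(1, 1, 2, 1, 1, 1),
  Time    = c(0.2, 0.1, 0.1, 0.1, 0.2, 0.3),
  Pos     = c(1.0, 0.8, 0.5, 1.2, 1.1, 0.9)
)
res <- mark_starts(toy)
print(res[, c("Speaker", "Trial", "Time", "start.event")])
```

Sorting before flagging matters: without it, the first row of an event in file order need not be its earliest time point.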
# combined language-by-location factor (e.g., "EN.Start")
datth$LangLoc <- interaction(datth$Lang, datth$Loc)
# binary indicators: 1 for TH-words of a given language and word position, 0 otherwise
datth$IsENTHStart <- (datth$Lang == "EN" & datth$Sound == "TH" & datth$Loc == "Start")*1
datth$IsNLTHStart <- (datth$Lang == "NL" & datth$Sound == "TH" & datth$Loc == "Start")*1
datth$IsDETHStart <- (datth$Lang == "DE" & datth$Sound == "TH" & datth$Loc == "Start")*1
datth$IsENTHEnd <- (datth$Lang == "EN" & datth$Sound == "TH" & datth$Loc == "End")*1
datth$IsNLTHEnd <- (datth$Lang == "NL" & datth$Sound == "TH" & datth$Loc == "End")*1
datth$IsDETHEnd <- (datth$Lang == "DE" & datth$Sound == "TH" & datth$Loc == "End")*1
# speaker-by-sound-by-location factor for the by-speaker factor smooths
datth$SpeakerSoundLoc <- interaction(datth$Speaker, datth$Sound, datth$Loc)
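The six Is* columns are 0/1 indicators that equal 1 only for /θ/-words of one language group in one word position. Because all words of a group share the LangLoc smooth, the smooth over such an indicator describes how that group's /θ/-trajectory differs from its /t/-trajectory. As a hedged illustration (not the author's code), the same columns can be generated in a loop; the toy data frame is made up and uses the level names "Start" and "End" from the comparisons above.

```r
# Toy design (hypothetical): every combination of language, sound and location
toy <- expand.grid(Lang = c("EN", "NL", "DE"), Sound = c("T", "TH"),
                   Loc = c("Start", "End"), stringsAsFactors = FALSE)

# One 0/1 indicator per language and word position, equal to 1
# only for TH-words of that language in that position
for (lang in c("EN", "NL", "DE")) {
  for (loc in c("Start", "End")) {
    name <- paste0("Is", lang, "TH", loc)   # e.g. "IsENTHStart"
    toy[[name]] <- as.numeric(toy$Lang == lang & toy$Sound == "TH" & toy$Loc == loc)
  }
}
print(toy)
```

The loop produces the same six column names as the explicit assignments, which keeps them consistent if a language group is added.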
# fit the GAMM with an AR(1) error model (rho), restarted at each start.event
system.time(th1 <- bam(Pos ~ LangLoc + s(Time, by = LangLoc) +
    s(Time, by = IsENTHStart) + s(Time, by = IsENTHEnd) + s(Time, by = IsNLTHStart) +
    s(Time, by = IsNLTHEnd) + s(Time, by = IsDETHStart) + s(Time, by = IsDETHEnd) +
    s(Time, SpeakerSoundLoc, bs = "fs", m = 1) + s(Time, Word, bs = "fs", m = 1),
  data = datth, discrete = TRUE, rho = 0.999, nthreads = 8, AR.start = datth$start.event))
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
## user system elapsed
## 872.532 6.548 150.889
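In the model above, each s(Time, by = Is...) term uses a numeric 0/1 by-variable. In mgcv, a smooth with a non-constant numeric by-variable is not centred, so such a "binary smooth" can capture both the constant offset and the nonlinear shape of a difference curve. The small simulation below is a sketch of this mechanism only; the data, the indicator IsB and the true difference 0.5 + Time are hypothetical.

```r
library(mgcv)

set.seed(1)
# Simulated (hypothetical) data: baseline trajectory for group A, and a
# group B trajectory that differs from it by 0.5 + Time
Time <- rep(seq(0, 1, length.out = 100), 2)
IsB  <- rep(c(0, 1), each = 100)  # 0/1 indicator, like IsENTHStart
y    <- sin(2 * pi * Time) + IsB * (0.5 + Time) + rnorm(200, sd = 0.1)
d    <- data.frame(Time, IsB, y)

# s(Time) models the baseline; s(Time, by = IsB) models the full
# (offset + shape) difference for rows with IsB == 1
m <- bam(y ~ s(Time) + s(Time, by = IsB), data = d)

# Estimated difference B - A at Time = 0.5 (true value: 0.5 + 0.5 = 1)
nd <- data.frame(Time = 0.5, IsB = c(0, 1))
p  <- predict(m, newdata = nd)
est_diff <- unname(p[2] - p[1])
print(est_diff)
```

No separate parametric term for IsB is needed here, because the uncentred binary smooth already contains the offset.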
acf_resid(th1)
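acf_resid() (itsadug) plots the autocorrelation of the model residuals, which helps judge whether the AR(1) parameter rho sufficiently reduces the dependency between consecutive samples within a trajectory. The base-R sketch below illustrates the idea behind an AR(1) error model on simulated noise, a hypothetical stand-in for residuals: the lag-1 autocorrelation is high for the raw series and close to zero after the AR(1) transform e[t] - rho * e[t-1].

```r
set.seed(42)
rho <- 0.9  # hypothetical AR(1) coefficient used for the simulation
e <- as.numeric(arima.sim(model = list(ar = rho), n = 5000))

# lag-1 autocorrelation of the raw series (close to rho)
raw_acf1 <- acf(e, lag.max = 1, plot = FALSE)$acf[2]

# AR(1)-transformed series: e[t] - rho * e[t - 1] removes the lag-1 dependency
w <- e[-1] - rho * e[-length(e)]
trans_acf1 <- acf(w, lag.max = 1, plot = FALSE)$acf[2]

print(c(raw = raw_acf1, transformed = trans_acf1))
```

In the real data the series restart at every new speaker-trial combination, which is why the start.event column is passed as AR.start rather than treating the whole dataset as one series.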