The present study introduces electromagnetic articulography, the measurement of the position of tongue and lips during speech, as a promising method for the study of dialect variation. By using generalized additive modeling to analyze the articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation of dozens of speakers. Our results show that two Dutch dialects show clear differences in their articulatory settings, with a more anterior tongue position in the dialect from Ubbergen in the southern half of the Netherlands than in the dialect of Ter Apel in the northern half of the Netherlands. This clearly demonstrates that articulography is able to reveal interesting structural differences between dialects which are not visible when only focusing on the acoustic signal.
Journal: Submitted (September 7, 2015) to Journal of Phonetics
Preprint: http://www.martijnwieling.nl/files/WielingEtAl-art.pdf
All source data: http://www.let.rug.nl/wieling/DiaArt/SourceData/
Keywords: Articulography; Dialectology; Generalized additive modeling; Articulatory setting
## Generated on: September 04, 2015 - 11:59:00
# install packages if not yet installed
packages <- c("mgcv","itsadug","lme4","parallel","MASS","reshape2")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
install.packages(setdiff(packages, rownames(installed.packages())))
}
# load required packages
library(mgcv)
library(itsadug)
library(lme4)
library(parallel)
library(MASS)
library(reshape2)
# custom plotting functions
if (!file.exists('plotArt2D.R')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/plotArt2D.R','plotArt2D.R')
}
source('plotArt2D.R')
# version information
R.version.string
## [1] "R version 3.2.2 (2015-08-14)"
packageVersion('mgcv')
## [1] '1.8.7'
packageVersion('itsadug')
## [1] '1.0.1'
packageVersion('lme4')
## [1] '1.1.8'
packageVersion('parallel')
## [1] '3.2.2'
packageVersion('MASS')
## [1] '7.3.29'
packageVersion('reshape2')
## [1] '1.4.1'
if (!file.exists('dia.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/dia.rda','dia.rda') # 23 MB
}
if (!file.exists('cvc.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/cvc.rda','cvc.rda') # 7 MB
}
load('dia.rda')
load('cvc.rda')
dim(dia)
## [1] 1541378 33
dim(cvc)
## [1] 515459 33
str(dia)
## 'data.frame': 1541378 obs. of 33 variables:
## $ Speaker : Factor w/ 40 levels "terapel_1","terapel_2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Group : Factor w/ 2 levels "TerApel","Ubbergen": 1 1 1 1 1 1 1 1 1 1 ...
## $ IsTerApel : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ YearBirth : int 1939 1939 1939 1939 1939 1939 1939 1939 1939 1939 ...
## $ PlaceBirth : Factor w/ 18 levels "Arnhem","Bemmel",..: 10 10 10 10 10 10 10 10 10 10 ...
## $ Word : Factor w/ 70 levels "bal","ballen",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WordNr : num 12 12 12 12 12 12 12 12 12 12 ...
## $ Segment : Factor w/ 57 levels "?","@","2","5",..: 14 14 14 14 14 14 14 14 14 14 ...
## $ SegmentNr : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Sensor : Factor w/ 3 levels "T3","T2","T1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Axis : Factor w/ 2 levels "P","H": 1 1 1 1 1 1 1 1 1 1 ...
## $ SensorAxis : Factor w/ 6 levels "T3.P","T2.P",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ GroupSensorAxis : Factor w/ 12 levels "TerApel.T3.P",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ SpeakerSensorAxis: Factor w/ 240 levels "terapel_1.T3.P",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WordSensorAxis : Factor w/ 420 levels "bal.T3.P","ballen.T3.P",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ IsTA.T1.P : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IsTA.T1.H : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IsTA.T2.P : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IsTA.T2.H : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IsTA.T3.P : num 1 1 1 1 1 1 1 1 1 1 ...
## $ IsTA.T3.H : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Word.start : logi TRUE FALSE FALSE FALSE FALSE FALSE ...
## $ Segment.start : logi TRUE FALSE FALSE FALSE FALSE FALSE ...
## $ RecBlock : int 6 6 6 6 6 6 6 6 6 6 ...
## $ TimeInRecBlock : num 22.1 22.1 22.1 22.1 22.1 ...
## $ Time.normWord : num 0.00251 0.01843 0.03434 0.05025 0.06617 ...
## $ Time.normSegment : num 0.0132 0.0968 0.1804 0.2639 0.3475 ...
## $ Position.norm : num 0.782 0.773 0.777 0.793 0.793 ...
## $ F1 : num NA NA NA NA NA NA NA NA NA NA ...
## $ F2 : num NA NA NA NA NA NA NA NA NA NA ...
## $ F1.norm : num NA NA NA NA NA NA NA NA NA NA ...
## $ F2.norm : num NA NA NA NA NA NA NA NA NA NA ...
str(cvc)
## 'data.frame': 515459 obs. of 33 variables:
## $ Speaker : Factor w/ 40 levels "terapel_1","terapel_2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Group : Factor w/ 2 levels "TerApel","Ubbergen": 1 1 1 1 1 1 1 1 1 1 ...
## $ IsTerApel : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
## $ YearBirth : int 1939 1939 1939 1939 1939 1939 1939 1939 1939 1939 ...
## $ PlaceBirth : Factor w/ 18 levels "Arnhem","Bemmel",..: 10 10 10 10 10 10 10 10 10 10 ...
## $ Word : Factor w/ 27 levels "kaak","kaap",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WordNr : num 151 151 151 151 151 151 151 151 151 151 ...
## $ Segment : Factor w/ 18 levels "a","b","C","d",..: 9 9 9 9 1 1 1 1 1 1 ...
## $ SegmentNr : num 1 1 1 1 2 2 2 2 2 2 ...
## $ Sensor : Factor w/ 3 levels "T3","T2","T1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Axis : Factor w/ 2 levels "P","H": 1 1 1 1 1 1 1 1 1 1 ...
## $ SensorAxis : Factor w/ 6 levels "T3.P","T2.P",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ GroupSensorAxis : Factor w/ 12 levels "TerApel.T3.P",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ SpeakerSensorAxis: Factor w/ 240 levels "terapel_1.T3.P",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WordSensorAxis : Factor w/ 162 levels "kaak.T3.P","kaap.T3.P",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ IsTA.T1.P : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IsTA.T1.H : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IsTA.T2.P : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IsTA.T2.H : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IsTA.T3.P : num 1 1 1 1 1 1 1 1 1 1 ...
## $ IsTA.T3.H : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Word.start : logi TRUE FALSE FALSE FALSE FALSE FALSE ...
## $ Segment.start : logi TRUE FALSE FALSE FALSE TRUE FALSE ...
## $ RecBlock : int 13 13 13 13 13 13 13 13 13 13 ...
## $ TimeInRecBlock : num 12.3 12.3 12.3 12.4 12.4 ...
## $ Time.normWord : num 0.000892 0.027183 0.052954 0.078985 0.105016 ...
## $ Time.normSegment : num 0.00904 0.27528 0.53625 0.79986 0.01078 ...
## $ Position.norm : num 0.567 0.571 0.599 0.612 0.635 ...
## $ F1 : num NA NA NA NA 417 ...
## $ F2 : num NA NA NA NA 1602 ...
## $ F1.norm : num NA NA NA NA -0.183 ...
## $ F2.norm : num NA NA NA NA 0.421 0.359 0.667 0.306 0.488 NA ...
colnames(dia)
## [1] "Speaker" "Group" "IsTerApel"
## [4] "Gender" "YearBirth" "PlaceBirth"
## [7] "Word" "WordNr" "Segment"
## [10] "SegmentNr" "Sensor" "Axis"
## [13] "SensorAxis" "GroupSensorAxis" "SpeakerSensorAxis"
## [16] "WordSensorAxis" "IsTA.T1.P" "IsTA.T1.H"
## [19] "IsTA.T2.P" "IsTA.T2.H" "IsTA.T3.P"
## [22] "IsTA.T3.H" "Word.start" "Segment.start"
## [25] "RecBlock" "TimeInRecBlock" "Time.normWord"
## [28] "Time.normSegment" "Position.norm" "F1"
## [31] "F2" "F1.norm" "F2.norm"
colnames(cvc) # equal column names as dia
## [1] "Speaker" "Group" "IsTerApel"
## [4] "Gender" "YearBirth" "PlaceBirth"
## [7] "Word" "WordNr" "Segment"
## [10] "SegmentNr" "Sensor" "Axis"
## [13] "SensorAxis" "GroupSensorAxis" "SpeakerSensorAxis"
## [16] "WordSensorAxis" "IsTA.T1.P" "IsTA.T1.H"
## [19] "IsTA.T2.P" "IsTA.T2.H" "IsTA.T3.P"
## [22] "IsTA.T3.H" "Word.start" "Segment.start"
## [25] "RecBlock" "TimeInRecBlock" "Time.normWord"
## [28] "Time.normSegment" "Position.norm" "F1"
## [31] "F2" "F1.norm" "F2.norm"
In the following four models are fitted, two for dialect words (taarten and boor) and two for CVC sequences (taat: [tat] and poop [pop]). For each word two models are fitted, the first to determine the rho value (for correcting autocorrelation in the residuals), the second model which does corrects for the autocorrelation using the predetermined rho value.
# this code is not evaluated, as each model takes about 1 hour to fit
words = c('taarten','boor','taat','poop')
for (word in words) {
if (word %in% c('taarten','boor')) {
subdat = droplevels(dia[dia$Word == word,])
} else {
subdat = droplevels(cvc[cvc$Word == word,])
}
# fit first model to determine autocorrelation in residuals
model0 <- bam(Position.norm ~ s(Time.normWord,by=GroupSensorAxis,k=20) +
GroupSensorAxis + s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1),
data=subdat, method='fREML', gc.level=2)
# assess autocorrelation in residuals
model0acf = acf(resid(model0))
rhoval = as.vector(model0acf[1]$acf)
# fit model which corrects for autocorrelation in residuals
model <- bam(Position.norm ~ s(Time.normWord,by=GroupSensorAxis, k=20) +
GroupSensorAxis + s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1),
data=subdat, method='fREML', gc.level=2, AR.start=Word.start, rho=rhoval)
smry <- summary(model)
save(model,smry,file=paste(word,'-group.rda',sep=''))
}
This visualization corresponds to Figure 5 in the manuscript.
if (!file.exists('taarten-group.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/taarten-group.rda',
'taarten-group.rda') # 253 MB
}
if (!file.exists('boor-group.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/boor-group.rda',
'boor-group.rda') # 251 MB
}
if (!file.exists('taat-group.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/taat-group.rda',
'taat-group.rda') # 254 MB
}
if (!file.exists('poop-group.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/poop-group.rda',
'poop-group.rda') # 255 MB
}
par(mfrow=c(2,2))
load('taarten-group.rda')
taartenmodel = model # store for later use
plotArt2D(model, catvar='GroupSensorAxis',
catlevels.x=c('TerApel.T3.P','Ubbergen.T3.P','TerApel.T2.P',
'Ubbergen.T2.P','TerApel.T1.P','Ubbergen.T1.P'),
catlevels.y=c('TerApel.T3.H','Ubbergen.T3.H','TerApel.T2.H',
'Ubbergen.T2.H','TerApel.T1.H','Ubbergen.T1.H'),
timevar='Time.normWord', collabels=c('Ter Apel', 'Ubbergen'),
main='Position of the three tongue sensors: "taarten"',
xlab='Posterior position', ylab='Height', showPoints=T,
cols=list(c("cadetblue3","cadetblue1"),c("tomato4","tomato")),alpha=15)
load('boor-group.rda')
plotArt2D(model, catvar='GroupSensorAxis',
catlevels.x=c('TerApel.T3.P','Ubbergen.T3.P','TerApel.T2.P',
'Ubbergen.T2.P','TerApel.T1.P','Ubbergen.T1.P'),
catlevels.y=c('TerApel.T3.H','Ubbergen.T3.H','TerApel.T2.H',
'Ubbergen.T2.H','TerApel.T1.H','Ubbergen.T1.H'),
timevar='Time.normWord', collabels=c('Ter Apel', 'Ubbergen'),
main='Position of the three tongue sensors: "boor"',
xlab='Posterior position', ylab='Height', showPoints=T,
cols=list(c("cadetblue3","cadetblue1"),c("tomato4","tomato")),alpha=15)
load('taat-group.rda')
plotArt2D(model, catvar='GroupSensorAxis',
catlevels.x=c('TerApel.T3.P','Ubbergen.T3.P','TerApel.T2.P',
'Ubbergen.T2.P','TerApel.T1.P','Ubbergen.T1.P'),
catlevels.y=c('TerApel.T3.H','Ubbergen.T3.H','TerApel.T2.H',
'Ubbergen.T2.H','TerApel.T1.H','Ubbergen.T1.H'),
timevar='Time.normWord', collabels=c('Ter Apel', 'Ubbergen'),
main='Position of the three tongue sensors: "taat"',
xlab='Posterior position', ylab='Height', showPoints=T,
cols=list(c("cadetblue3","cadetblue1"),c("tomato4","tomato")),alpha=15)
load('poop-group.rda')
plotArt2D(model, catvar='GroupSensorAxis',
catlevels.x=c('TerApel.T3.P','Ubbergen.T3.P','TerApel.T2.P',
'Ubbergen.T2.P','TerApel.T1.P','Ubbergen.T1.P'),
catlevels.y=c('TerApel.T3.H','Ubbergen.T3.H','TerApel.T2.H',
'Ubbergen.T2.H','TerApel.T1.H','Ubbergen.T1.H'),
timevar='Time.normWord', collabels=c('Ter Apel', 'Ubbergen'),
main='Position of the three tongue sensors: "poop"',
xlab='Posterior position', ylab='Height', showPoints=T,
cols=list(c("cadetblue3","cadetblue1"),c("tomato4","tomato")),alpha=15)
The following visualization corresponds to Figure 6 in the manuscript.
par(mfrow=c(2,2))
plotSmooths(taartenmodel,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T1.P','Ubbergen.T1.P'),
dropRanef='SpeakerSensorAxis',ylim=c(0.05,1),legendPos='topleft',
main='T1 posterior position: "taarten"',xlab='Time (normalized)',
ylab='Posterior position',legendlabels=c('Ter Apel','Ubbergen'),
legendtitle='Group', colors=c('cadetblue3','tomato4'),cexPoints=1,
alphaPoints=0.04,showPoints=T)
plotSmooths(taartenmodel,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T1.H','Ubbergen.T1.H'),
dropRanef='SpeakerSensorAxis',ylim=c(0.05,1),legendPos='topleft',
main='T1 height: "taarten"',xlab='Time (normalized)',
ylab='Height',legendlabels=c('Ter Apel','Ubbergen'),
legendtitle='Group',colors=c('cadetblue3','tomato4'),cexPoints=1,
alphaPoints=0.04,showPoints=T)
plotDiff(taartenmodel, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T1.P','TerApel.T1.P'),
dropRanef='SpeakerSensorAxis',ylim=c(-0.325,0.325),
main='T1 posterior position: Ter Apel vs. Ubbergen',
ylab='Posterior position difference',xlab='Time (normalized)')
plotDiff(taartenmodel, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T1.H','TerApel.T1.H'),
dropRanef='SpeakerSensorAxis',ylim=c(-0.325,0.325),
main='T1 height: Ter Apel vs. Ubbergen',
ylab='Height difference',xlab='Time (normalized)')
To formally assess the difference, a model is specified with difference curves (see explanation in the manuscript).
# this code is not evaluated, as each model takes about 1 to 2 hours to fit
subdat = droplevels(dia[dia$Word == 'taarten',])
# fit first model to determine autocorrelation in residuals
model0 <- bam(Position.norm ~ s(Time.normWord, by=SensorAxis,k=20) + SensorAxis +
s(Time.normWord, by=IsTA.T1.P,k=20) + s(Time.normWord, by=IsTA.T1.H,k=20) +
s(Time.normWord, by=IsTA.T2.P,k=20) + s(Time.normWord, by=IsTA.T2.H,k=20) +
s(Time.normWord, by=IsTA.T3.P,k=20) + s(Time.normWord, by=IsTA.T3.H,k=20) +
s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1), data=subdat,
method='fREML', gc.level=2)
# assess autocorrelation in residuals
model0acf = acf(resid(model0))
rhoval = as.vector(model0acf[1]$acf)
# fit model which corrects for autocorrelation in residuals
modelbin <- bam(Position.norm ~ s(Time.normWord, by=SensorAxis,k=20) + SensorAxis +
s(Time.normWord, by=IsTA.T1.P,k=20) + s(Time.normWord, by=IsTA.T1.H,k=20) +
s(Time.normWord, by=IsTA.T2.P,k=20) + s(Time.normWord, by=IsTA.T2.H,k=20) +
s(Time.normWord, by=IsTA.T3.P,k=20) + s(Time.normWord, by=IsTA.T3.H,k=20) +
s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1), data=subdat, method='fREML',
gc.level=2, AR.start=Word.start, rho=rhoval)
smrybin <- summary(modelbin)
save(modelbin,smrybin,file='taarten-binary.rda')
if (!file.exists('taarten-binary.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/taarten-binary.rda',
'taarten-binary.rda') # 261 MB
}
load('taarten-binary.rda')
smrybin # show summary
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Position.norm ~ s(Time.normWord, by = SensorAxis, k = 20) + SensorAxis +
## s(Time.normWord, by = IsTA.T1.P, k = 20) + s(Time.normWord,
## by = IsTA.T1.H, k = 20) + s(Time.normWord, by = IsTA.T2.P,
## k = 20) + s(Time.normWord, by = IsTA.T2.H, k = 20) + s(Time.normWord,
## by = IsTA.T3.P, k = 20) + s(Time.normWord, by = IsTA.T3.H,
## k = 20) + s(Time.normWord, SpeakerSensorAxis, bs = "fs",
## m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.68071 0.01779 38.255 <2e-16 ***
## SensorAxisT2.P -0.24752 0.02518 -9.832 <2e-16 ***
## SensorAxisT1.P -0.48324 0.02519 -19.181 <2e-16 ***
## SensorAxisT3.H 0.01578 0.02512 0.628 0.5300
## SensorAxisT2.H -0.06143 0.02519 -2.439 0.0147 *
## SensorAxisT1.H -0.21961 0.02521 -8.710 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time.normWord):SensorAxisT3.P 15.21 17.00 11.191 < 2e-16 ***
## s(Time.normWord):SensorAxisT2.P 15.62 17.29 13.492 < 2e-16 ***
## s(Time.normWord):SensorAxisT1.P 16.48 17.85 17.593 < 2e-16 ***
## s(Time.normWord):SensorAxisT3.H 12.44 14.73 6.274 3.64e-13 ***
## s(Time.normWord):SensorAxisT2.H 15.71 17.38 18.599 < 2e-16 ***
## s(Time.normWord):SensorAxisT1.H 17.37 18.39 32.779 < 2e-16 ***
## s(Time.normWord):IsTA.T1.P 14.94 17.18 7.597 < 2e-16 ***
## s(Time.normWord):IsTA.T1.H 16.86 18.63 14.829 < 2e-16 ***
## s(Time.normWord):IsTA.T2.P 13.92 16.26 7.203 < 2e-16 ***
## s(Time.normWord):IsTA.T2.H 15.38 17.48 9.385 < 2e-16 ***
## s(Time.normWord):IsTA.T3.P 12.93 15.32 5.442 2.72e-11 ***
## s(Time.normWord):IsTA.T3.H 11.70 14.00 3.641 4.28e-06 ***
## s(Time.normWord,SpeakerSensorAxis) 1670.45 2148.00 6.985 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.964 Deviance explained = 96.7%
## fREML = -56955 Scale est. = 0.0011372 n = 21750
The summary above shows that all difference smooths are significant (i.e. the p-values associated with the binary, IsTA, smooths.
In the following the aggregate model for dialect words is fitted. Three models are fitted: the first model is used to determine the rho value (for correcting autocorrelation in the residuals), the second model fits separate trajectories for each level (this model is used for visualization), and the third model fits the model using difference curves, thereby yielding p-values assessing if the difference between the two groups is significant or not.
# this code is not evaluated, as each model takes about 5 hours to fit (using 36 CPUs)
cl = makeCluster(36) # 36 CPUs to calculate the model fit
# fit first model to determine autocorrelation in residuals
modelNoRho <- bam(Position.norm ~ s(Time.normWord,by=GroupSensorAxis, k=20) +
GroupSensorAxis + s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1) +
s(Time.normWord,WordSensorAxis,bs='fs',m=1), data=dia,
method='fREML', cluster=cl)
save(modelNoRho,file='modelD-group-norho-1.8.7.rda')
# assess autocorrelation in residuals
modelACF = acf(resid(modelNoRho))
(rhoval = as.vector(modelACF[1]$acf)) # 0.9669873
# fit model which corrects for autocorrelation in residuals
model <- bam(Position.norm ~ s(Time.normWord,by=GroupSensorAxis, k=20) +
GroupSensorAxis + s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1) +
s(Time.normWord,WordSensorAxis,bs='fs',m=1), data=dia,
method='fREML', AR.start=Word.start, rho=rhoval, cluster=cl)
smry <- summary(model)
save(model,smry,file='modelD-group-1.8.7.rda') # mgcv 1.8.7 was used
# fit model with difference curve
modelbin <- bam(Position.norm ~ s(Time.normWord, by=SensorAxis,k=20) + SensorAxis +
s(Time.normWord, by=IsTA.T1.P,k=20) + s(Time.normWord, by=IsTA.T1.H,k=20) +
s(Time.normWord, by=IsTA.T2.P,k=20) + s(Time.normWord, by=IsTA.T2.H,k=20) +
s(Time.normWord, by=IsTA.T3.P,k=20) + s(Time.normWord, by=IsTA.T3.H,k=20) +
s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1) +
s(Time.normWord,WordSensorAxis,bs='fs',m=1), data=dia, method='fREML',
AR.start=Word.start, rho=rhoval, cluster=cl)
smrybin <- summary(modelbin)
save(modelbin,smrybin,file='modelD-binary-1.8.7.rda')
The following graph visualizes the individual variation and is equal to Figure 3 in the manuscript.
if (!file.exists('modelD-group-1.8.7.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/modelD-group-1.8.7.rda',
'modelD-group-1.8.7.rda') # 1773 MB
}
load('modelD-group-1.8.7.rda')
plot(model,select=13,ylab='Position (normalized)', xlab='Time (normalized)')
The following graph visualizes the autocorrelation and is equal to Figure 4 in the manuscript.
if (!file.exists('modelD-group-norho-1.8.7.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/modelD-group-norho-1.8.7.rda',
'modelD-group-norho-1.8.7.rda') # 1147 MB
}
load('modelD-group-norho-1.8.7.rda')
par(mfrow=c(1,2))
acf_resid(modelNoRho, main='Original autocorrelation in residuals',ylab='ACF',max_lag=30)
acf_resid(model, main='Corrected autocorrelation in residuals',ylab='ACF',max_lag=30)
The following graph corresponds to Figure 7 in the manuscript and visualizes the tongue trajectories in two dimensions.
plotArt2D(model, catvar='GroupSensorAxis',
catlevels.x=c('TerApel.T1.P','Ubbergen.T1.P','TerApel.T2.P',
'Ubbergen.T2.P','TerApel.T3.P','Ubbergen.T3.P'),
catlevels.y=c('TerApel.T1.H','Ubbergen.T1.H','TerApel.T2.H',
'Ubbergen.T2.H','TerApel.T3.H','Ubbergen.T3.H'),
timevar='Time.normWord', collabels=c('Ter Apel', 'Ubbergen'),
xlab='Posterior position', ylab='Height', showPoints=T,
main='Position of the three tongue sensors: dialect words',
cols=list(c("cadetblue3","cadetblue1"),c("tomato4","tomato")),
alpha=2, cexPoints=0.4)
The following graph corresponds to Figure 8 in the manuscript and visualizes the tongue trajectory differences.
par(mfrow=c(4,3))
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T1.P','Ubbergen.T1.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='topleft',main='T1 posterior position',
xlab='Time (normalized)',ylab='Posterior position',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T2.P','Ubbergen.T2.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='topleft',main='T2 posterior position',
xlab='Time (normalized)',ylab='Posterior position',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T3.P','Ubbergen.T3.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='bottomleft',main='T3 posterior position',
xlab='Time (normalized)',ylab='Posterior position',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T1.P','TerApel.T1.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T1 posterior position Ter Apel vs. Ubbergen)',
ylab='Posterior position difference',xlab='Time (normalized)')
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T2.P','TerApel.T2.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T2 posterior position Ter Apel vs. Ubbergen)',
ylab='Posterior position difference',xlab='Time (normalized)')
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T3.P','TerApel.T3.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T3 posterior position Ter Apel vs. Ubbergen)',
ylab='Posterior position difference',xlab='Time (normalized)')
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T1.H','Ubbergen.T1.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='topleft',main='T1 height',
xlab='Time (normalized)',ylab='Height',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T2.H','Ubbergen.T2.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='topleft',main='T2 height',
xlab='Time (normalized)',ylab='Height',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T3.H','Ubbergen.T3.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='bottomleft',main='T3 height',
xlab='Time (normalized)',ylab='Height',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T1.H','TerApel.T1.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T1 height Ter Apel vs. Ubbergen)',
ylab='Height difference',xlab='Time (normalized)')
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T2.H','TerApel.T2.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T2 height Ter Apel vs. Ubbergen)',
ylab='Height difference',xlab='Time (normalized)')
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T3.H','TerApel.T3.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T3 height Ter Apel vs. Ubbergen)',
ylab='Height difference',xlab='Time (normalized)')
if (!file.exists('modelD-binary-1.8.7.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/modelD-binary-1.8.7.rda',
'modelD-binary-1.8.7.rda') # 1773 MB
}
load('modelD-binary-1.8.7.rda')
smrybin # show summary assessing significant differences
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Position.norm ~ s(Time.normWord, by = SensorAxis, k = 20) + SensorAxis +
## s(Time.normWord, by = IsTA.T1.P, k = 20) + s(Time.normWord,
## by = IsTA.T1.H, k = 20) + s(Time.normWord, by = IsTA.T2.P,
## k = 20) + s(Time.normWord, by = IsTA.T2.H, k = 20) + s(Time.normWord,
## by = IsTA.T3.P, k = 20) + s(Time.normWord, by = IsTA.T3.H,
## k = 20) + s(Time.normWord, SpeakerSensorAxis, bs = "fs",
## m = 1) + s(Time.normWord, WordSensorAxis, bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.695607 0.014362 48.435 < 2e-16 ***
## SensorAxisT2.P -0.241335 0.020376 -11.844 < 2e-16 ***
## SensorAxisT1.P -0.469898 0.020435 -22.995 < 2e-16 ***
## SensorAxisT3.H 0.002574 0.020409 0.126 0.9
## SensorAxisT2.H -0.121326 0.020463 -5.929 3.05e-09 ***
## SensorAxisT1.H -0.277807 0.020535 -13.529 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time.normWord):SensorAxisT3.P 4.101 5.710 1.694 0.13026
## s(Time.normWord):SensorAxisT2.P 7.943 10.154 4.286 4.47e-06 ***
## s(Time.normWord):SensorAxisT1.P 10.919 12.744 6.432 3.32e-12 ***
## s(Time.normWord):SensorAxisT3.H 8.855 10.840 7.680 2.88e-13 ***
## s(Time.normWord):SensorAxisT2.H 11.455 13.199 14.083 < 2e-16 ***
## s(Time.normWord):SensorAxisT1.H 16.164 16.914 14.512 < 2e-16 ***
## s(Time.normWord):IsTA.T1.P 5.657 7.013 2.005 0.05051 .
## s(Time.normWord):IsTA.T1.H 17.133 18.404 9.898 < 2e-16 ***
## s(Time.normWord):IsTA.T2.P 4.072 4.928 2.937 0.01283 *
## s(Time.normWord):IsTA.T2.H 13.831 15.783 3.514 2.72e-06 ***
## s(Time.normWord):IsTA.T3.P 2.018 2.028 1.513 0.21935
## s(Time.normWord):IsTA.T3.H 10.102 12.219 2.466 0.00273 **
## s(Time.normWord,SpeakerSensorAxis) 1949.512 2148.000 39.358 < 2e-16 ***
## s(Time.normWord,WordSensorAxis) 3722.378 3774.000 121.938 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.861 Deviance explained = 86.2%
## fREML = -4.3865e+06 Scale est. = 0.0028218 n = 1541344
In the following the aggregate model for CVC sequences is fitted. Three models are fitted: the first model is used to determine the rho value (for correcting autocorrelation in the residuals), the second model fits separate trajectories for each level (this model is used for visualization), and the third model fits the model using difference curves, thereby yielding p-values assessing if the difference between the two groups is significant or not.
# this code is not evaluated, as each model takes about 5 hours to fit (using 36 CPUs)
cl = makeCluster(36) # 36 CPUs to calculate the model fit
# fit first model to determine autocorrelation in residuals
modelNoRho <- bam(Position.norm ~ s(Time.normWord,by=GroupSensorAxis, k=20) +
GroupSensorAxis + s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1) +
s(Time.normWord,WordSensorAxis,bs='fs',m=1), data=cvc,
method='fREML', cluster=cl)
save(modelNoRho,file='modelCVC-group-norho-1.8.7.rda')
# assess autocorrelation in residuals
modelACF = acf(resid(modelNoRho))
(rhoval = as.vector(modelACF[1]$acf)) # 0.9572511
# fit model which corrects for autocorrelation in residuals
model <- bam(Position.norm ~ s(Time.normWord,by=GroupSensorAxis, k=20) +
GroupSensorAxis + s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1) +
s(Time.normWord,WordSensorAxis,bs='fs',m=1), data=cvc,
method='fREML', AR.start=Word.start, rho=rhoval, cluster=cl)
smry <- summary(model)
save(model,smry,file='modelCVC-group-1.8.7.rda') # mgcv 1.8.7 was used
# fit model with difference curve
modelbin <- bam(Position.norm ~ s(Time.normWord, by=SensorAxis,k=20) + SensorAxis +
s(Time.normWord, by=IsTA.T1.P,k=20) + s(Time.normWord, by=IsTA.T1.H,k=20) +
s(Time.normWord, by=IsTA.T2.P,k=20) + s(Time.normWord, by=IsTA.T2.H,k=20) +
s(Time.normWord, by=IsTA.T3.P,k=20) + s(Time.normWord, by=IsTA.T3.H,k=20) +
s(Time.normWord,SpeakerSensorAxis,bs='fs',m=1) +
s(Time.normWord,WordSensorAxis,bs='fs',m=1), data=cvc, method='fREML',
AR.start=Word.start, rho=rhoval, cluster=cl)
smrybin <- summary(modelbin)
save(modelbin,smrybin,file='modelCVC-binary-1.8.7.rda')
The following graph visualizes the individual variation.
if (!file.exists('modelCVC-group-1.8.7.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/modelCVC-group-1.8.7.rda',
'modelCVC-group-1.8.7.rda') # 690 MB
}
load('modelCVC-group-1.8.7.rda')
plot(model,select=13,ylab='Position (normalized)', xlab='Time (normalized)')
The following graph visualizes the autocorrelation.
if (!file.exists('modelCVC-group-norho-1.8.7.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/modelCVC-group-norho-1.8.7.rda',
'modelCVC-group-norho-1.8.7.rda') # 448 MB
}
load('modelCVC-group-norho-1.8.7.rda')
par(mfrow=c(1,2))
acf_resid(modelNoRho, main='Original autocorrelation in residuals',ylab='ACF',max_lag=30)
acf_resid(model, main='Corrected autocorrelation in residuals',ylab='ACF',max_lag=30)
The following graph corresponds to Figure 9 in the manuscript and visualizes the tongue trajectories in two dimensions.
plotArt2D(model, catvar='GroupSensorAxis',
catlevels.x=c('TerApel.T1.P','Ubbergen.T1.P','TerApel.T2.P',
'Ubbergen.T2.P','TerApel.T3.P','Ubbergen.T3.P'),
catlevels.y=c('TerApel.T1.H','Ubbergen.T1.H','TerApel.T2.H',
'Ubbergen.T2.H','TerApel.T3.H','Ubbergen.T3.H'),
timevar='Time.normWord', collabels=c('Ter Apel', 'Ubbergen'),
xlab='Posterior position', ylab='Height', showPoints=T,
main='Position of the three tongue sensors: Dutch CVC sequences',
cols=list(c("cadetblue3","cadetblue1"),c("tomato4","tomato")),
alpha=3,cexPoints=0.45)
The following graph corresponds to Figure 10 in the manuscript and visualizes the tongue trajectory differences.
par(mfrow=c(4,3))
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T1.P','Ubbergen.T1.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='topleft',main='T1 posterior position',
xlab='Time (normalized)',ylab='Posterior position',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T2.P','Ubbergen.T2.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='topleft',main='T2 posterior position',
xlab='Time (normalized)',ylab='Posterior position',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T3.P','Ubbergen.T3.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='bottomleft',main='T3 posterior position',
xlab='Time (normalized)',ylab='Posterior position',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T1.P','TerApel.T1.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T1 posterior position Ter Apel vs. Ubbergen)',
ylab='Posterior position difference',xlab='Time (normalized)')
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T2.P','TerApel.T2.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T2 posterior position Ter Apel vs. Ubbergen)',
ylab='Posterior position difference',xlab='Time (normalized)')
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T3.P','TerApel.T3.P'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T3 posterior position Ter Apel vs. Ubbergen)',
ylab='Posterior position difference',xlab='Time (normalized)')
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T1.H','Ubbergen.T1.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='topleft',main='T1 height',
xlab='Time (normalized)',ylab='Height',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T2.H','Ubbergen.T2.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='topleft',main='T2 height',
xlab='Time (normalized)',ylab='Height',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotSmooths(model,xvar='Time.normWord',catvar='GroupSensorAxis',
catlevels=c('TerApel.T3.H','Ubbergen.T3.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(0.05,1),legendPos='bottomleft',main='T3 height',
xlab='Time (normalized)',ylab='Height',
legendlabels=c('Ter Apel','Ubbergen'),legendtitle='Group',
colors=c('cadetblue3','tomato4'),cexPoints=0.5,showPoints=F)
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T1.H','TerApel.T1.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T1 height Ter Apel vs. Ubbergen)',
ylab='Height difference',xlab='Time (normalized)')
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T2.H','TerApel.T2.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T2 height Ter Apel vs. Ubbergen)',
ylab='Height difference',xlab='Time (normalized)')
plotDiff(model, xvar="Time.normWord", catvar='GroupSensorAxis',
catlevels=c('Ubbergen.T3.H','TerApel.T3.H'),
dropRanef=c('SpeakerSensorAxis','WordSensorAxis'),
ylim=c(-0.13,0.13),main='T3 height Ter Apel vs. Ubbergen)',
ylab='Height difference',xlab='Time (normalized)')
if (!file.exists('modelCVC-binary-1.8.7.rda')) {
download.file('http://www.let.rug.nl/wieling/DiaArt/modelCVC-binary-1.8.7.rda',
'modelCVC-binary-1.8.7.rda') # 690 MB
}
load('modelCVC-binary-1.8.7.rda')
smrybin # show summary assessing significant differences
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Position.norm ~ s(Time.normWord, by = SensorAxis, k = 20) + SensorAxis +
## s(Time.normWord, by = IsTA.T1.P, k = 20) + s(Time.normWord,
## by = IsTA.T1.H, k = 20) + s(Time.normWord, by = IsTA.T2.P,
## k = 20) + s(Time.normWord, by = IsTA.T2.H, k = 20) + s(Time.normWord,
## by = IsTA.T3.P, k = 20) + s(Time.normWord, by = IsTA.T3.H,
## k = 20) + s(Time.normWord, SpeakerSensorAxis, bs = "fs",
## m = 1) + s(Time.normWord, WordSensorAxis, bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.66551 0.01785 37.291 < 2e-16 ***
## SensorAxisT2.P -0.23699 0.02525 -9.387 < 2e-16 ***
## SensorAxisT1.P -0.45210 0.02527 -17.889 < 2e-16 ***
## SensorAxisT3.H 0.07949 0.02523 3.150 0.001633 **
## SensorAxisT2.H -0.08800 0.02530 -3.479 0.000504 ***
## SensorAxisT1.H -0.28318 0.02531 -11.188 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Time.normWord):SensorAxisT3.P 13.871 14.99 4.000 2.58e-07 ***
## s(Time.normWord):SensorAxisT2.P 14.147 15.22 4.727 3.96e-09 ***
## s(Time.normWord):SensorAxisT1.P 15.057 15.96 6.040 1.89e-13 ***
## s(Time.normWord):SensorAxisT3.H 15.267 16.08 12.028 < 2e-16 ***
## s(Time.normWord):SensorAxisT2.H 16.462 17.01 24.052 < 2e-16 ***
## s(Time.normWord):SensorAxisT1.H 17.154 17.57 22.274 < 2e-16 ***
## s(Time.normWord):IsTA.T1.P 13.439 15.70 3.770 6.52e-07 ***
## s(Time.normWord):IsTA.T1.H 13.785 16.08 2.827 0.000128 ***
## s(Time.normWord):IsTA.T2.P 11.212 13.56 3.474 1.48e-05 ***
## s(Time.normWord):IsTA.T2.H 11.795 14.24 3.833 1.27e-06 ***
## s(Time.normWord):IsTA.T3.P 9.999 12.28 2.603 0.001625 **
## s(Time.normWord):IsTA.T3.H 2.000 2.00 0.928 0.395547
## s(Time.normWord,SpeakerSensorAxis) 1910.877 2148.00 28.236 < 2e-16 ***
## s(Time.normWord,WordSensorAxis) 1408.806 1452.00 144.696 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.922 Deviance explained = 92.3%
## fREML = -1.4992e+06 Scale est. = 0.0019074 n = 515454
In this section, the results are determined on the basis of LDA, for comparison with the results on the basis of generalized additive modeling.
# extract relevant columns from dialect data set
ld = dia[!is.na(dia$Position.norm),c("Speaker","Group","Word","WordNr","Segment",
"SensorAxis","Time.normWord","Position.norm")]
# convert to wide format, with separate columns for each sensor/axis
wide = dcast(ld, Speaker + Group + Word + WordNr + Segment + Time.normWord ~ SensorAxis,
value.var = "Position.norm")
sgmts = c('a','i','o','k','t')
for (sgm in sgmts) {
writeLines(paste("###### SEGMENT:",sgm))
cat('\n### LDA output\n\n')
# Focus on consistent target segment (rows with missing values are excluded)
seg = droplevels(na.omit(subset(wide,Segment==sgm)))
# LDA: Note that the group means generally show directional distances consistent
# with the GAMs results: TerApel more back (higher posteriority) and lower than Ubbergen
print(seg.lda <- lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data=seg,
na.action=na.omit))
cat('\n\n### Test if group means are different\n\n')
# Test if group means are different
seg.m = manova(cbind(seg$T1.P, seg$T2.P, seg$T3.P, seg$T1.H, seg$T2.H, seg$T3.H) ~
seg$Group)
print(summary(seg.m))
# Test whether the predicted classes are selected above chance
seg.p = predict(seg.lda,seg[,7:12])
k = (seg.p$class==seg$Group)
cat('\n\n### Confusion matrix\n')
print(table(seg.p$class,seg$Group))
cat('\n\n### Classification performance\n')
print(binom.test(sum(k,na.rm=T),length(k)))
cat('\n')
}
## ###### SEGMENT: a
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.6766776 0.3233224
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.2891709 0.5514214 0.7695527 0.2797721 0.4279887 0.5479407
## Ubbergen 0.2954047 0.5252838 0.7533199 0.2742079 0.4712480 0.6198329
##
## Coefficients of linear discriminants:
## LD1
## T1.P 11.193650
## T2.P -19.852718
## T3.P 7.671764
## T1.H -2.154486
## T2.H 0.672566
## T3.H 4.005326
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.12102 182.45 6 7951 < 2.2e-16 ***
## Residuals 7956
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 4791 1919
## Ubbergen 594 654
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 5445, number of trials = 7958, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.6738756 0.6944236
## sample estimates:
## probability of success
## 0.6842171
##
##
## ###### SEGMENT: i
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.4709135 0.5290865
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.1943743 0.4246195 0.6464775 0.4328305 0.6848339 0.8068518
## Ubbergen 0.1729227 0.3770877 0.6174808 0.4786159 0.7157095 0.8830820
##
## Coefficients of linear discriminants:
## LD1
## T1.P 3.541838
## T2.P -13.432849
## T3.P 8.315477
## T1.H 7.420194
## T2.H -8.367080
## T3.H 9.457053
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.23936 435.99 6 8313 < 2.2e-16 ***
## Residuals 8318
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 2456 947
## Ubbergen 1462 3455
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 5911, number of trials = 8320, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.7005784 0.7201874
## sample estimates:
## probability of success
## 0.7104567
##
##
## ###### SEGMENT: o
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.3710556 0.6289444
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.3582128 0.6136622 0.8225727 0.1937995 0.3920918 0.5667555
## Ubbergen 0.3653927 0.5874748 0.8125403 0.2585737 0.4651559 0.6676979
##
## Coefficients of linear discriminants:
## LD1
## T1.P 11.185945
## T2.P -15.845352
## T3.P 4.833566
## T1.H 5.429878
## T2.H -5.174782
## T3.H 7.415979
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.24185 419.15 6 7884 < 2.2e-16 ***
## Residuals 7889
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 1515 725
## Ubbergen 1413 4238
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 5753, number of trials = 7891, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.7191054 0.7388420
## sample estimates:
## probability of success
## 0.7290584
##
##
## ###### SEGMENT: k
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.4399054 0.5600946
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.2389757 0.4551985 0.6770988 0.3345936 0.6122836 0.8075722
## Ubbergen 0.2044630 0.3963662 0.6234869 0.3794513 0.6246470 0.8710793
##
## Coefficients of linear discriminants:
## LD1
## T1.P -0.3277397
## T2.P -0.3929488
## T3.P -3.0346310
## T1.H 7.3502104
## T2.H -9.3643360
## T3.H 12.4287216
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.26206 374.84 6 6333 < 2.2e-16 ***
## Residuals 6338
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 1673 604
## Ubbergen 1116 2947
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 4620, number of trials = 6340, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.7175796 0.7396229
## sample estimates:
## probability of success
## 0.7287066
##
##
## ###### SEGMENT: t
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.4636455 0.5363545
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.2352396 0.4957411 0.7062446 0.5302483 0.6366531 0.6788357
## Ubbergen 0.1336021 0.3765153 0.6251408 0.5443852 0.6692314 0.7242568
##
## Coefficients of linear discriminants:
## LD1
## T1.P -2.898990
## T2.P -14.115635
## T3.P 2.563233
## T1.H 3.975205
## T2.H -1.244822
## T3.H 1.334026
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.45873 1834.8 6 12990 < 2.2e-16 ***
## Residuals 12995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 4658 792
## Ubbergen 1368 6179
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 10837, number of trials = 12997, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.8272950 0.8401708
## sample estimates:
## probability of success
## 0.8338078
# extract relevant columns from dialect data set
ld = cvc[!is.na(cvc$Position.norm),c("Speaker","Group","Word","WordNr","Segment",
"SensorAxis","Time.normWord","Position.norm")]
# convert to wide format, with separate columns for each sensor/axis
wide = dcast(ld, Speaker + Group + Word + WordNr + Segment + Time.normWord ~ SensorAxis,
value.var = "Position.norm")
sgmts = c('i','a','o','k','t')
for (sgm in sgmts) {
writeLines(paste("###### SEGMENT:",sgm))
cat('\n### LDA output\n\n')
# Focus on consistent target segment (rows with missing values are excluded)
seg = droplevels(na.omit(subset(wide,Segment==sgm)))
# LDA: Note that the group means generally show directional distances consistent
# with the GAMs results: TerApel more back (higher posteriority) and lower than Ubbergen
print(seg.lda <- lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data=seg,
na.action=na.omit))
cat('\n\n### Test if group means are different\n\n')
# Test if group means are different
seg.m = manova(cbind(seg$T1.P, seg$T2.P, seg$T3.P, seg$T1.H, seg$T2.H, seg$T3.H) ~
seg$Group)
print(summary(seg.m))
# Test whether the predicted classes are selected above chance
seg.p = predict(seg.lda,seg[,7:12])
k = (seg.p$class==seg$Group)
cat('\n\n### Confusion matrix\n')
print(table(seg.p$class,seg$Group))
cat('\n\n### Classification performance\n')
print(binom.test(sum(k,na.rm=T),length(k)))
cat('\n')
}
## ###### SEGMENT: i
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.4507191 0.5492809
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.1839142 0.3964286 0.6282997 0.4243656 0.7254252 0.8612279
## Ubbergen 0.1469226 0.3375954 0.5879307 0.4483011 0.7097348 0.9161968
##
## Coefficients of linear discriminants:
## LD1
## T1.P -2.580868
## T2.P -8.799849
## T3.P 4.289847
## T1.H 7.263122
## T2.H -9.097211
## T3.H 10.249383
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.30142 544.53 6 7572 < 2.2e-16 ***
## Residuals 7577
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 2142 901
## Ubbergen 1274 3262
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 5404, number of trials = 7579, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.7026923 0.7231893
## sample estimates:
## probability of success
## 0.7130228
##
##
## ###### SEGMENT: a
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.5013864 0.4986136
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.2671566 0.5022219 0.7182539 0.2818187 0.5024803 0.6200569
## Ubbergen 0.2438843 0.4696303 0.7092268 0.2807898 0.4323438 0.5824709
##
## Coefficients of linear discriminants:
## LD1
## T1.P 0.1357679
## T2.P -14.0853354
## T3.P 9.4084456
## T1.H 7.4292286
## T2.H -13.9236000
## T3.H 4.8903032
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.25047 782.97 6 14058 < 2.2e-16 ***
## Residuals 14063
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 5209 2223
## Ubbergen 1843 4790
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 9999, number of trials = 14065, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.7033429 0.7183970
## sample estimates:
## probability of success
## 0.7109136
##
##
## ###### SEGMENT: o
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.5786739 0.4213261
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.3836173 0.6127728 0.8218935 0.2471225 0.5014741 0.6700617
## Ubbergen 0.3509027 0.5603653 0.7790775 0.2853263 0.5209739 0.7160739
##
## Coefficients of linear discriminants:
## LD1
## T1.P 8.4741358
## T2.P -16.7488646
## T3.P 0.1832691
## T1.H 3.7233562
## T2.H -5.6484760
## T3.H 5.3834986
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.15658 273.7 6 8846 < 2.2e-16 ***
## Residuals 8851
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 4074 1871
## Ubbergen 1049 1859
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 5933, number of trials = 8853, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.6602635 0.6799610
## sample estimates:
## probability of success
## 0.6701683
##
##
## ###### SEGMENT: k
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.5167247 0.4832753
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.2434248 0.4484329 0.6751671 0.3166901 0.6208556 0.8212812
## Ubbergen 0.2109022 0.4009014 0.6302337 0.3703093 0.6244048 0.8613941
##
## Coefficients of linear discriminants:
## LD1
## T1.P 0.5104503
## T2.P -1.9469977
## T3.P -1.8610251
## T1.H 9.6625347
## T2.H -9.6531470
## T3.H 8.4316152
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.19397 644.83 6 16077 < 2.2e-16 ***
## Residuals 16082
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 5918 2656
## Ubbergen 2393 5117
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 11035, number of trials = 16084, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.6788492 0.6932546
## sample estimates:
## probability of success
## 0.6860856
##
##
## ###### SEGMENT: t
##
## ### LDA output
##
## Call:
## lda(Group ~ T1.P + T2.P + T3.P + T1.H + T2.H + T3.H, data = seg,
## na.action = na.omit)
##
## Prior probabilities of groups:
## TerApel Ubbergen
## 0.533519 0.466481
##
## Group means:
## T1.P T2.P T3.P T1.H T2.H T3.H
## TerApel 0.2288785 0.486989 0.7036794 0.5242628 0.6647319 0.7172740
## Ubbergen 0.1277324 0.368123 0.6161714 0.5246833 0.6693391 0.7410292
##
## Coefficients of linear discriminants:
## LD1
## T1.P -2.8559369
## T2.P -12.5088756
## T3.P 0.3429155
## T1.H 4.0766302
## T2.H -4.2952272
## T3.H 0.8676794
##
##
## ### Test if group means are different
##
## Df Pillai approx F num Df den Df Pr(>F)
## seg$Group 1 0.43207 2092.9 6 16506 < 2.2e-16 ***
## Residuals 16511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ### Confusion matrix
##
## TerApel Ubbergen
## TerApel 7122 1482
## Ubbergen 1688 6221
##
##
## ### Classification performance
##
## Exact binomial test
##
## data: sum(k, na.rm = T) and length(k)
## number of successes = 13343, number of trials = 16513, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.8019386 0.8140128
## sample estimates:
## probability of success
## 0.80803
In this section, the articulatory results are compared to the acoustic results.
# correlations of F1 with height
(t1h = with(subset(dia,SensorAxis=='T1.H'),cor(Position.norm,F1.norm,use='pair')))
## [1] -0.1816255
(t2h = with(subset(dia,SensorAxis=='T2.H'),cor(Position.norm,F1.norm,use='pair')))
## [1] -0.282886
(t3h = with(subset(dia,SensorAxis=='T3.H'),cor(Position.norm,F1.norm,use='pair')))
## [1] -0.3364694
mean(c(t1h,t2h,t3h))
## [1] -0.2669937
# correlations of F2 with posterior position
(t1p = with(subset(dia,SensorAxis=='T1.P'),cor(Position.norm,F2.norm,use='pair')))
## [1] -0.4720476
(t2p = with(subset(dia,SensorAxis=='T2.P'),cor(Position.norm,F2.norm,use='pair')))
## [1] -0.5302865
(t3p = with(subset(dia,SensorAxis=='T3.P'),cor(Position.norm,F2.norm,use='pair')))
## [1] -0.5415716
mean(c(t1p,t2p,t3p))
## [1] -0.5146352
# all correlations are significant, example for weakest correlation:
summary(lm(Position.norm ~ F1.norm, data=dia[dia$SensorAxis=='T1.H',]))
##
## Call:
## lm(formula = Position.norm ~ F1.norm, data = dia[dia$SensorAxis ==
## "T1.H", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.38135 -0.08493 0.00566 0.08828 0.53081
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3541624 0.0004041 876.52 <2e-16 ***
## F1.norm -0.0259020 0.0004320 -59.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1311 on 105372 degrees of freedom
## (152850 observations deleted due to missingness)
## Multiple R-squared: 0.03299, Adjusted R-squared: 0.03298
## F-statistic: 3595 on 1 and 105372 DF, p-value: < 2.2e-16
The following visualization corresponds to Figure 11 in the manuscript.
sensoraxis = c('T1.H','T2.H','T3.H','T1.P','T2.P','T3.P')
par(mfrow=c(2,3))
for (sa in sensoraxis) {
# select sensor and axis
tmp = dia[dia$SensorAxis==sa,]
# labels, etc. dependent on sensoraxis
if (sa %in% c('T1.P','T2.P','T3.P')) {
xl = 'Posterior position (normalized)'
yl = 'F2 (normalized)'
tmp$F.norm = tmp$F2.norm # F2 corresponds with posterior position
} else {
reg1 <- lm(tmp$F1.norm ~ tmp$Position.norm)
xl = 'Height (normalized)'
yl = 'F1 (normalized)'
tmp$F.norm = tmp$F1.norm # F1 corresponds with height
}
# obtain regression line
reg1 <- lm(tmp$F.norm ~ tmp$Position.norm)
# scatterplot of position and F1 valuess
plot(tmp$Position.norm, tmp$F.norm, xlab=xl, ylab=yl,
main=paste('Dialect words:',substr(sa,1,2)), col='grey', cex.lab=2,
cex.main=2,cex.lab=1.5,xlim=c(0,1),ylim=c(-5,5))
abline(reg1,lty=2)
}
# select formants (not separated per axis and sensor)
formantsDia = unique(dia[,c("Speaker","Group","Word","WordNr","Segment",
"SegmentNr","Time.normWord","F1.norm","F2.norm")])
# compare to Ubbergen: set as reference level
formantsDia$Group = relevel(formantsDia$Group,'Ubbergen')
# mixed model shows lower F1 in Ter Apel for dialect words
summary(lmer(F1.norm ~ Group + (1|Speaker),data=formantsDia))
## Linear mixed model fit by REML ['lmerMod']
## Formula: F1.norm ~ Group + (1 | Speaker)
## Data: formantsDia
##
## REML criterion at convergence: 285654.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4984 -0.6981 -0.1521 0.6524 7.1421
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 0.0007906 0.02812
## Residual 0.8716251 0.93361
## Number of obs: 105756, groups: Speaker, 40
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.002596 0.007573 0.343
## GroupTerApel -0.077862 0.010616 -7.335
##
## Correlation of Fixed Effects:
## (Intr)
## GroupTerApl -0.713
# mixed model shows lower F2 in Ter Apel for dialect words
summary(lmer(F2.norm ~ Group + (1|Speaker),data=formantsDia))
## Linear mixed model fit by REML ['lmerMod']
## Formula: F2.norm ~ Group + (1 | Speaker)
## Data: formantsDia
##
## REML criterion at convergence: 301038.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4810 -0.8317 0.0594 0.6663 4.4686
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 0.0009088 0.03015
## Residual 0.9958372 0.99792
## Number of obs: 106213, groups: Speaker, 40
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.046534 0.008098 5.747
## GroupTerApel -0.063604 0.011365 -5.596
##
## Correlation of Fixed Effects:
## (Intr)
## GroupTerApl -0.712
# correlations of F1 with height
(t1h = with(subset(cvc,SensorAxis=='T1.H'),cor(Position.norm,F1.norm,use='pair')))
## [1] -0.3380322
(t2h = with(subset(cvc,SensorAxis=='T2.H'),cor(Position.norm,F1.norm,use='pair')))
## [1] -0.504336
(t3h = with(subset(cvc,SensorAxis=='T3.H'),cor(Position.norm,F1.norm,use='pair')))
## [1] -0.6216022
mean(c(t1h,t2h,t3h))
## [1] -0.4879901
# correlations of F2 with posterior position
(t1p = with(subset(cvc,SensorAxis=='T1.P'),cor(Position.norm,F2.norm,use='pair')))
## [1] -0.640532
(t2p = with(subset(cvc,SensorAxis=='T2.P'),cor(Position.norm,F2.norm,use='pair')))
## [1] -0.6926906
(t3p = with(subset(cvc,SensorAxis=='T3.P'),cor(Position.norm,F2.norm,use='pair')))
## [1] -0.678066
mean(c(t1p,t2p,t3p))
## [1] -0.6704295
# all correlations are significant, example for weakest correlation:
summary(lm(Position.norm ~ F1.norm, data=cvc[cvc$SensorAxis=='T1.H',]))
##
## Call:
## lm(formula = Position.norm ~ F1.norm, data = cvc[cvc$SensorAxis ==
## "T1.H", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34468 -0.09077 -0.00201 0.08314 0.51842
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3083326 0.0007122 432.95 <2e-16 ***
## F1.norm -0.0388728 0.0006012 -64.66 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1276 on 32404 degrees of freedom
## (53656 observations deleted due to missingness)
## Multiple R-squared: 0.1143, Adjusted R-squared: 0.1142
## F-statistic: 4180 on 1 and 32404 DF, p-value: < 2.2e-16
The following visualization corresponds to Figure 12 in the manuscript.
sensoraxis = c('T1.H','T2.H','T3.H','T1.P','T2.P','T3.P')
par(mfrow=c(2,3))
for (sa in sensoraxis) {
# select sensor and axis
tmp = cvc[cvc$SensorAxis==sa,]
# labels, etc. dependent on sensoraxis
if (sa %in% c('T1.P','T2.P','T3.P')) {
xl = 'Posterior position (normalized)'
yl = 'F2 (normalized)'
tmp$F.norm = tmp$F2.norm # F2 corresponds with posterior position
} else {
reg1 <- lm(tmp$F1.norm ~ tmp$Position.norm)
xl = 'Height (normalized)'
yl = 'F1 (normalized)'
tmp$F.norm = tmp$F1.norm # F1 corresponds with height
}
# obtain regression line
reg1 <- lm(tmp$F.norm ~ tmp$Position.norm)
# scatterplot of position and F1 valuess
plot(tmp$Position.norm, tmp$F.norm, xlab=xl, ylab=yl,
main=paste('CVC sequences:',substr(sa,1,2)), col='grey', cex.lab=2,
cex.main=2,cex.lab=1.5,xlim=c(0,1),ylim=c(-5,5))
abline(reg1,lty=2)
}
# select formants (not separated per axis and sensor)
formantsCVC = unique(cvc[,c("Speaker","Group","Word","WordNr","Segment",
"SegmentNr","Time.normWord","F1.norm","F2.norm")])
# compare to Ubbergen: set as reference level
formantsCVC$Group = relevel(formantsCVC$Group,'Ubbergen')
# mixed model shows higher F1 in Ter Apel for dialect words
summary(lmer(F1.norm ~ Group + (1|Speaker),data=formantsCVC))
## Linear mixed model fit by REML ['lmerMod']
## Formula: F1.norm ~ Group + (1 | Speaker)
## Data: formantsCVC
##
## REML criterion at convergence: 102914
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6729 -0.8326 -0.1990 0.9309 3.4232
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 0.009558 0.09777
## Residual 1.369114 1.17009
## Number of obs: 32623, groups: Speaker, 40
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.007803 0.024227 -0.322
## GroupTerApel 0.239976 0.033595 7.143
##
## Correlation of Fixed Effects:
## (Intr)
## GroupTerApl -0.721
# mixed model shows higher F2 in Ter Apel for dialect words
summary(lmer(F2.norm ~ Group + (1|Speaker),data=formantsCVC))
## Linear mixed model fit by REML ['lmerMod']
## Formula: F2.norm ~ Group + (1 | Speaker)
## Data: formantsCVC
##
## REML criterion at convergence: 89202.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.5068 -0.8843 -0.0824 0.5032 3.4622
##
## Random effects:
## Groups Name Variance Std.Dev.
## Speaker (Intercept) 0.01335 0.1156
## Residual 0.97861 0.9892
## Number of obs: 31636, groups: Speaker, 40
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.16878 0.02766 -6.102
## GroupTerApel 0.22049 0.03827 5.762
##
## Correlation of Fixed Effects:
## (Intr)
## GroupTerApl -0.723
To replicate the analysis presented above, you can just copy the following lines to the most recent version of R. Please note that you first need to install Pandoc. Warning: when replicating this analysis, several data sets and fitted models will need to be downloaded, amounting to about 8 GB (i.e. 8000 MB).
download.file('http://www.let.rug.nl/wieling/DiaArt/analysis.Rmd', 'analysis.Rmd')
library(rmarkdown)
render('analysis.Rmd') # generates html file with results
browseURL(paste('file://', file.path(getwd(),'analysis.html'), sep='')) # shows result