The present study uses electromagnetic articulography, which measures the position of the tongue and lips during speech, to study dialect variation. By analyzing the articulatory trajectories with generalized additive modeling, we are able to reliably detect aggregate group differences while simultaneously taking into account the individual variation of dozens of speakers. Our results show that two Dutch dialects differ clearly in their articulatory settings, with a generally more anterior tongue position in the dialect of Ubbergen in the southern half of the Netherlands than in the dialect of Ter Apel in the northern half of the Netherlands. A comparison with formant-based acoustic measurements further shows that articulography can reveal structural articulatory differences between dialects that are not visible when focusing only on the acoustic signal.
Journal: Revised version submitted (July, 2016) to Journal of Phonetics
Preprint: http://www.martijnwieling.nl/files/WielingEtAl-art.pdf
All source data: http://www.let.rug.nl/wieling/DiaArt/SourceData/
Keywords: Articulography; Dialectology; Generalized additive modeling; Articulatory setting
## Generated on: July 21, 2016 - 17:46:43
The following commands load the necessary functions and libraries and show the version information.
# install packages if not yet installed
packages <- c("mgcv","itsadug","lme4","parallel","MASS","reshape2")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))
}
# load required packages
library(mgcv)
library(itsadug)
library(lme4)
library(parallel)
library(MASS)
library(reshape2)
# custom plotting functions
if (!file.exists('plotArt2D.R')) {
  download.file('http://www.let.rug.nl/wieling/DiaArt/plotArt2D.R','plotArt2D.R')
}
source('plotArt2D.R')
# version information
R.version.string
## [1] "R version 3.3.1 (2016-06-21)"
cat(paste('mgcv version:',packageVersion('mgcv')))
## mgcv version: 1.8.12
cat(paste('itsadug version:',packageVersion('itsadug')))
## itsadug version: 2.2
The following commands load the dataset, exclude the older speakers, and show its columns.
if (!file.exists('dat.rda')) {
  download.file('http://www.let.rug.nl/wieling/DiaArt/dat.rda','dat.rda') # 77 MB
}
load('dat.rda')
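# exclude older speakers (born in 1990 or earlier) and drop unused factor levels,
# but keep all levels of the ordered factors
dat = droplevels(dat[dat$YearBirth > 1990,], except=colnames(dat)[sapply(dat,is.ordered)])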
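The effect of the `except` argument in the filtering step can be illustrated with a minimal sketch on a toy data frame. The column `Rating` and all values below are hypothetical and only serve the illustration; they are not part of the dataset.

```r
# Toy data (hypothetical values, NOT from the study)
toy <- data.frame(
  Speaker   = factor(c("s1", "s2", "s3")),
  YearBirth = c(1985, 1995, 1997),
  Rating    = factor(c("low", "mid", "high"),
                     levels = c("low", "mid", "high"), ordered = TRUE)
)

# names of the ordered factors, which should keep all their levels
keep <- colnames(toy)[sapply(toy, is.ordered)]

# subset the rows, then drop unused levels everywhere except in 'keep'
toy_young <- droplevels(toy[toy$YearBirth > 1990, ], except = keep)

nlevels(toy_young$Speaker)  # unused level "s1" has been dropped
nlevels(toy_young$Rating)   # all three levels are retained
```

Keeping the levels of ordered factors intact presumably matters for later modeling steps that rely on a fixed reference level, but that motivation is an assumption and is not stated in the source.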
The dataset consists of 1734402 rows and 102 columns with the following column names:
colnames(dat)
## [1] "Speaker" "Group"
## [3] "IsTerApel" "Gender"
## [5] "YearBirth" "PlaceBirth"
## [7] "Word" "WordNr"
## [9] "Type" "Segment"
## [11] "SegmentNr" "Sensor"
## [13] "Axis" "SensorAxis"
## [15] "GroupTypeSensorAxis" "SpeakerSensorAxis"
## [17] "SpeakerTypeSensorAxis" "WordSensorAxis"
## [19] "WordGroupSensorAxis" "IsTA.T1.P"
## [21] "IsTA.T1.H" "IsTA.T2.P"
## [23] "IsTA.T2.H" "IsTA.T3.P"
## [25] "IsTA.T3.H" "IsCVC.T1.P"
## [27] "IsCVC.T1.H" "IsCVC.T2.P"
## [29] "IsCVC.T2.H" "IsCVC.T3.P"
## [31] "IsCVC.T3.H" "IsDia.T1.P"
## [33] "IsDia.T1.H" "IsDia.T2.P"
## [35] "IsDia.T2.H" "IsDia.T3.P"
## [37] "IsDia.T3.H" "IsTADia.T1.P"
## [39] "IsTADia.T1.H" "IsTADia.T2.P"
## [41] "IsTADia.T2.H" "IsTADia.T3.P"
## [43] "IsTADia.T3.H" "IsTACVC.T1.P"
## [45] "IsTACVC.T1.H" "IsTACVC.T2.P"
## [47] "IsTACVC.T2.H" "IsTACVC.T3.P"
## [49] "IsTACVC.T3.H" "IsTA.T1.PO"
## [51] "IsTA.T1.HO" "IsTA.T2.PO"
## [53] "IsTA.T2.HO" "IsTA.T3.PO"
## [55] "IsTA.T3.HO" "IsCVC.T1.PO"
## [57] "IsCVC.T1.HO" "IsCVC.T2.PO"
## [59] "IsCVC.T2.HO" "IsCVC.T3.PO"
## [61] "IsCVC.T3.HO" "IsDia.T1.PO"
## [63] "IsDia.T1.HO" "IsDia.T2.PO"
## [65] "IsDia.T2.HO" "IsDia.T3.PO"
## [67] "IsDia.T3.HO" "IsTACVC.T1.PO"
## [69] "IsTACVC.T1.HO" "IsTACVC.T2.PO"
## [71] "IsTACVC.T2.HO" "IsTACVC.T3.PO"
## [73] "IsTACVC.T3.HO" "IsTADia.T1.PO"
## [75] "IsTADia.T1.HO" "IsTADia.T2.PO"
## [77] "IsTADia.T2.HO" "IsTADia.T3.PO"
## [79] "IsTADia.T3.HO" "Word.start"
## [81] "Segment.start" "RecBlock"
## [83] "TimeInRecBlock" "Time.normWord"
## [85] "Time.normSegment" "Position.norm"
## [87] "RestPosition.norm" "RelPos.norm"
## [89] "Position.raw" "RestPosition.raw"
## [91] "RelPos.raw" "F1"
## [93] "F2" "F1.norm"
## [95] "F2.norm" "F1.man"
## [97] "F2.man" "F1.man.norm"
## [99] "F2.man.norm" "RPDistT1T2.raw"
## [101] "RPDistT2T3.raw" "RPDistT1T3.raw"
The following subsections show some descriptives for the dataset.
subj = unique(dat[,c("Speaker","Group","Gender","YearBirth","RPDistT1T2.raw","RPDistT2T3.raw","RPDistT1T3.raw")])
table(subj$Group,subj$Gender)
##
## F M
## TerApel 6 9
## Ubbergen 2 17
cat(paste('Average year of birth for Ter Apel speakers:',
round(mean(subj[subj$Group=='TerApel',]$YearBirth),2)))
## Average year of birth for Ter Apel speakers: 1996.6
cat(paste('Average year of birth for Ubbergen speakers:',
round(mean(subj[subj$Group=='Ubbergen',]$YearBirth),2)))
## Average year of birth for Ubbergen speakers: 1996.47
cat(paste('Average T1-T3 distance for Ter Apel speakers:',
round(mean(subj[subj$Group=='TerApel',]$RPDistT1T3.raw),1)))
## Average T1-T3 distance for Ter Apel speakers: 23.5
cat(paste('Average T1-T3 distance for Ubbergen speakers:',
round(mean(subj[subj$Group=='Ubbergen',]$RPDistT1T3.raw),1)))
## Average T1-T3 distance for Ubbergen speakers: 24.2
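Per-group means such as those reported above can also be obtained in a single call, which avoids repeating nearly identical `cat(paste(...))` statements. The following is a minimal sketch using `aggregate` on a toy data frame; `subj_toy` and its values are hypothetical and merely mimic the structure of `subj`.

```r
# Toy data (hypothetical values, NOT from the study)
subj_toy <- data.frame(
  Group          = c("TerApel", "TerApel", "Ubbergen", "Ubbergen"),
  YearBirth      = c(1996, 1998, 1995, 1997),
  RPDistT1T3.raw = c(23.0, 24.0, 24.5, 25.5)
)

# mean year of birth and mean T1-T3 distance per group in one table
group_means <- aggregate(cbind(YearBirth, RPDistT1T3.raw) ~ Group,
                         data = subj_toy, FUN = mean)
print(group_means)
```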
par(mfrow=c(1,3))
boxplot(RPDistT1T3.raw ~ Group, data=subj, main='Distance T1-T3 (mm.)')
boxplot(RPDistT1T2.raw ~ Group, data=subj, main='Distance T1-T2 (mm.)')
boxplot(RPDistT2T3.raw ~ Group, data=subj, main='Distance T2-T3 (mm.)')
wilcox.test(RPDistT1T3.raw ~ Group, data=subj)
##
## Wilcoxon rank sum test
##
## data: RPDistT1T3.raw by Group
## W = 117, p-value = 0.3908
## alternative hypothesis: true location shift is not equal to 0
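To make the W statistic in the output above more concrete, the following minimal sketch shows how R derives it: W is the sum of the ranks of the first group minus n1(n1+1)/2, where n1 is the size of that group. The values and group labels below are toy data and are not taken from the study.

```r
# Toy data (hypothetical values, NOT from the study)
toy <- data.frame(
  Dist  = c(1.1, 2.3, 3.5, 2.0, 4.1, 5.2, 6.3),
  Group = factor(c("A", "A", "A", "B", "B", "B", "B"))
)

# W as reported by wilcox.test (the first factor level, "A", is the reference group)
w_test <- unname(wilcox.test(Dist ~ Group, data = toy)$statistic)

# the same value computed by hand from the pooled ranks
r        <- rank(toy$Dist)
n1       <- sum(toy$Group == "A")
w_manual <- sum(r[toy$Group == "A"]) - n1 * (n1 + 1) / 2

c(w_test = w_test, w_manual = w_manual)
```

W ranges from 0 to n1*n2, and values near n1*n2/2 indicate little separation between the groups. If all 15 Ter Apel and 19 Ubbergen speakers enter the test above, that midpoint is 142.5, so the reported W = 117 is in line with the non-significant p-value.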
The following graph shows, per group, the distribution of the sounds in the dialect words, categorized as front, center, or back and split into consonants and vowels.
# relative proportions
m = matrix(c(0.389,0.271,0.144,0.204,0.111,0.129,
             0.171,0.164,0.04,0.111,0.144,0.121), nrow=2, ncol=6, byrow=TRUE)
dimnames(m) = list(c("Consonants","Vowels"),
                   c("TA (front)","UB (front)","TA (center)","UB (center)","TA (back)","UB (back)"))
# empty (white) bars to set up the plotting region
barplot(m, col='white', axes=FALSE, axisnames=FALSE, yaxp=c(0,1,2), las=1, ylim=c(0,0.6))
cols1=c('cadetblue3','tomato4','cadetblue3','tomato4','cadetblue3','tomato4')
cols2=c('cadetblue1','tomato','cadetblue1','tomato','cadetblue1','tomato')
# add coloured bars
for (i in 1:ncol(m)){
  xx = m
  xx[,-i] <- NA
  colnames(xx)[-i] <- NA
  barplot(xx, col=c(cols1[i],cols2[i]), add=TRUE, axes=FALSE)
}
legend('topright',c('Consonants (TA)','Vowels (TA)','Consonants (UB)','Vowels (UB)'),fill=c('cadetblue3','cadetblue1','tomato4','tomato'))
axis(2)
box()
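Because the values in the matrix are entered by hand as relative proportions, a quick sanity check is to verify that they sum to approximately 1 within each group. The following minimal sketch rebuilds the same matrix and checks this; the tolerance of 0.005 is an assumption chosen to allow for rounding to three decimals.

```r
# same proportions as used for the barplot
m <- matrix(c(0.389, 0.271, 0.144, 0.204, 0.111, 0.129,
              0.171, 0.164, 0.04,  0.111, 0.144, 0.121),
            nrow = 2, ncol = 6, byrow = TRUE)
dimnames(m) <- list(c("Consonants", "Vowels"),
                    c("TA (front)", "UB (front)", "TA (center)",
                      "UB (center)", "TA (back)", "UB (back)"))

# total proportion per group (consonants and vowels over all three places)
ta_total <- sum(m[, grepl("^TA", colnames(m))])
ub_total <- sum(m[, grepl("^UB", colnames(m))])

# both totals should be close to 1 (tolerance is an assumption, see above)
stopifnot(abs(ta_total - 1) < 0.005, abs(ub_total - 1) < 0.005)
c(TA = ta_total, UB = ub_total)
```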