1 Abstract

In phonetics, many datasets consist of dynamic data collected over time. Examples include diphthongal formant trajectories and articulator trajectories observed using electromagnetic articulography. Traditional approaches for analyzing this type of data generally aggregate data over a certain timespan, or only include measurements at a fixed time point (e.g., formant measurements at the midpoint of a vowel). In this paper, I discuss generalized additive modeling, a non-linear regression method which does not require aggregation or the pre-selection of a fixed time point. Instead, the method is able to identify general patterns over dynamically varying data, while simultaneously accounting for subject and item-related variability. An advantage of this approach is that patterns may be discovered which are hidden when data are aggregated or when a single time point is selected. A corresponding disadvantage is that these analyses are generally more time-consuming and complex. This tutorial aims to overcome this disadvantage by providing a hands-on introduction to generalized additive modeling using articulatory trajectories from L1 and L2 speakers of English within the freely available R environment. All data and R code are made available to reproduce the analysis presented in this paper.

Journal: Revised version submitted (January 4, 2018) to Journal of Phonetics

Preprint: http://www.martijnwieling.nl/files/GAM-tutorial-Wieling.pdf

Keywords: Generalized additive modeling, Tutorial, Articulography, Second language acquisition

2 Libraries and functions

The following commands load the necessary functions and libraries and show the version information.

# install packages if not yet installed
packages <- c("mgcv","itsadug","lme4","sp")
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

# install mgcv 1.8-23, if not yet available on CRAN
if (packageVersion("mgcv") < '1.8.23') {
  download.file("http://www.let.rug.nl/wieling/Tutorial/mgcv_1.8-23.tar.gz", "mgcv_1.8-23.tar.gz")
  # to install from source in Windows, ensure Rtools is installed beforehand
  # which is available at https://cran.r-project.org/bin/windows/Rtools/
  # repos=NULL makes explicit that a local file is installed
  install.packages("mgcv_1.8-23.tar.gz", repos=NULL, type="source")
}

# load required packages
library(mgcv)
library(itsadug)
library(sp) # for colors which also print well in grayscale
library(lme4)

# version information
R.version.string
# [1] "R version 3.4.3 (2017-11-30)"
cat(paste("mgcv version:",packageVersion("mgcv")))
# mgcv version: 1.8.23
cat(paste("itsadug version:",packageVersion("itsadug")))
# itsadug version: 2.3
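
The version check above compares the installed version against '1.8.23', while the downloaded file is named 1.8-23. A minimal sketch (base R only, with version numbers picked purely for illustration) suggests why both spellings refer to the same version and why the comparison is numeric per component rather than alphabetical:

```r
# package_version() accepts both "." and "-" as component separators
v_hyphen <- package_version("1.8-23")
v_dot    <- package_version("1.8.23")

# both spellings denote the same version
same <- v_hyphen == v_dot
print(same)

# components are compared as numbers, so patch level 9 is older than 23
older <- package_version("1.8-9") < "1.8.23"
print(older)
```

Comparing against a character string works here because the string is converted to a version object before the comparison is made.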

3 Dataset

The following downloads the full dataset (if it is not yet present) and loads it. The columns of the dataset and their explanation are shown below.

if (!file.exists("full.rda")) { 
    download.file("http://www.let.rug.nl/wieling/Tutorial/full.rda","full.rda") # 2 MB
}

load("full.rda")

3.1 Column names

The dataset consists of 126177 rows and 9 columns with the following column names:

str(full)
# 'data.frame': 126177 obs. of  9 variables:
#  $ Speaker: Factor w/ 42 levels "VENI_EN_1","VENI_EN_10",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ Lang   : Factor w/ 2 levels "EN","NL": 1 1 1 1 1 1 1 1 1 1 ...
#  $ Word   : Factor w/ 20 levels "faith","fate",..: 18 18 18 18 18 18 18 18 18 18 ...
#  $ Sound  : Factor w/ 2 levels "T","TH": 1 1 1 1 1 1 1 1 1 1 ...
#  $ Loc    : Factor w/ 2 levels "Final","Init": 2 2 2 2 2 2 2 2 2 2 ...
#  $ Trial  : int  1 1 1 1 1 1 1 1 1 1 ...
#  $ Time   : num  0 0.0161 0.0323 0.0484 0.0645 ...
#  $ Pos    : num  -0.392 -0.44 -0.44 -0.503 -0.513 ...
#  $ Pos01  : num  0.434 0.425 0.425 0.412 0.41 ...

3.2 Data description

  • Speaker - ID of the speaker
  • Lang - Native language of the speaker ("NL" for Dutch, or "EN" for English)
  • Word - The label of the word
  • Sound - The sound contrast ("TH" for words with the dental fricative, "T" for words with the stop)
  • Loc - The location in the word where the sound contrast occurs ("Init" when it occurs at the beginning of the word or "Final" when it occurs at the end of the word)
  • Trial - The trial number of the word for each speaker
  • Time - The normalized time (from 0: beginning of the word, to 1: end of the word)
  • Pos - The standardized (mean 0, standard deviation 1) position for each speaker of the T1 sensor in the anterior-posterior direction (higher values, more anterior)
  • Pos01 - The normalized (1: most anterior, 0: most posterior) position for each speaker of the T1 sensor
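
The preprocessing that produced Time, Pos and Pos01 is not shown in this tutorial. The following is only a sketch of how such per-speaker standardization and normalization could be computed, using made-up values. The data frame raw, its column RawPos and the vector t_raw are hypothetical and do not occur in the dataset.

```r
# Toy data: hypothetical raw sensor positions for two speakers
raw <- data.frame(
  Speaker = rep(c("S1", "S2"), each = 5),
  RawPos  = c(10, 12, 14, 16, 18, 30, 31, 35, 38, 41)
)

# z-transformation per speaker: mean 0, standard deviation 1 (cf. Pos)
raw$Pos <- ave(raw$RawPos, raw$Speaker,
               FUN = function(x) (x - mean(x)) / sd(x))

# min-max scaling per speaker: 0 = lowest, 1 = highest value (cf. Pos01)
raw$Pos01 <- ave(raw$RawPos, raw$Speaker,
                 FUN = function(x) (x - min(x)) / (max(x) - min(x)))

# time normalization within one word: 0 = first, 1 = last sample (cf. Time)
t_raw  <- c(1.20, 1.25, 1.30, 1.35, 1.40)  # hypothetical timestamps (s)
t_norm <- (t_raw - min(t_raw)) / (max(t_raw) - min(t_raw))

print(raw)
print(t_norm)
```

Because both transformations are linear and applied per speaker, Pos and Pos01 carry the same information for a given speaker and differ only in scale and offset.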

3.3 Example data

head(full)
#     Speaker Lang Word Sound  Loc Trial   Time    Pos Pos01
# 1 VENI_EN_1   EN tick     T Init     1 0.0000 -0.392 0.434
# 2 VENI_EN_1   EN tick     T Init     1 0.0161 -0.440 0.425
# 3 VENI_EN_1   EN tick     T Init     1 0.0323 -0.440 0.425
# 4 VENI_EN_1   EN tick     T Init     1 0.0484 -0.503 0.412
# 5 VENI_EN_1   EN tick     T Init     1 0.0645 -0.513 0.410
# 6 VENI_EN_1   EN tick     T Init     1 0.0806 -0.677 0.378
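
Before modeling, it can help to check how the design is distributed over speakers and words. The sketch below uses a small made-up data frame, mini, which only mimics the column names of the dataset; its speaker labels are hypothetical. The same calls should carry over to full once it is loaded.

```r
# Hypothetical miniature stand-in with the same column names as the dataset
mini <- data.frame(
  Speaker = factor(c("A", "A", "B", "B", "C", "C")),
  Lang    = factor(c("EN", "EN", "NL", "NL", "NL", "NL")),
  Word    = factor(c("tick", "faith", "tick", "fate", "faith", "fate")),
  Sound   = factor(c("T", "TH", "T", "T", "TH", "T")),
  Loc     = factor(c("Init", "Final", "Init", "Final", "Final", "Final"))
)

# number of distinct speakers per native language
spk <- unique(mini[, c("Speaker", "Lang")])
n_per_lang <- table(spk$Lang)
print(n_per_lang)

# number of distinct words per sound and location in the word
wrd <- unique(mini[, c("Word", "Sound", "Loc")])
n_words <- table(wrd$Sound, wrd$Loc)
print(n_words)
```

Reducing the data to unique combinations first avoids counting every time point of every trajectory as a separate speaker or word.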

4 Visualizing basis functions

# Summary:
#   * Time : numeric predictor; with 100 values ranging from 0.000000 to 1.000000.