Dialectology: Aggregate Dialectal Variation

Instructor: John Nerbonne (course under development)
Course Number: LSA.107
Mon. & Wed. 10:10-11:50, June 27-July 13
2005 Linguistics Institute (Harvard/MIT)

2005 Linguistics Institute course participants, from left: John Nerbonne, Jonathan Gajdos, Kari Hiltula, Claire Insel, Thea Park, Nynke de Haas, Anne Ribbert, Michiel Verhagen, Rachel Utain-Evans, Holman Tse, Griet Coupe, Pamela Rutecki, Eric Mayfield, Joseph ?, Kotoe Tashiro, ?.

Announcements 2005


This course will focus on what Hans Goebl has called the "linguistic management of space," that is, how language variation is structured geographically. We shall begin with basic ideas from categorical data analysis, which we'll apply to lexical and syntactic data, then examine a technique from computational linguistics (edit distance, also known as Levenshtein distance) for the analysis of sequences of segments in pronunciation. We shall examine questions of validity and consistency, and we shall show how to visualize analyses using the L04 package, with the emphasis on visualization for the purpose of exploration and understanding. Time permitting, we shall turn to one or two advanced topics, e.g., explanatory models of the geographic conditioning of language variation and/or the role of linguistic structure in the geographic distribution.
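For readers new to edit distance, the following is a minimal Python sketch of the basic idea, not the implementation used in L04. It assumes unit costs for insertion, deletion and substitution (dialectometric work often uses more refined segment costs), and the function name and the two transcriptions are made up purely for illustration; they are not taken from LAMSAS or any other data set.

```python
def levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Return the minimum total cost of edits that turn source into target.

    source and target may be strings or lists of segments.
    The unit costs are an assumption made for this sketch.
    """
    # Row for the empty prefix of source: only insertions are possible.
    previous = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        # First column: delete the first i segments of source.
        current = [i * del_cost]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + del_cost,                          # delete s
                current[j - 1] + ins_cost,                       # insert t
                previous[j - 1] + (0 if s == t else sub_cost),   # match or substitute
            ))
        previous = current
    return previous[-1]


# Two hypothetical pronunciations, each written as a list of segments.
variant_a = ["æ", "f", "t", "ə", "n", "u", "n"]
variant_b = ["æ", "f", "t", "ə", "r", "n", "u", "n"]
print(levenshtein(variant_a, variant_b))
```

Working on lists of segments rather than raw strings keeps each phonetic symbol, including any multi-character symbol, as a single unit of comparison.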

This three-week course precedes Bill Kretzschmar's three-week course on the feature-based analysis of language variation. We have coordinated the division of topics with Kretzschmar: this course deliberately concentrates on the analysis of large aggregates, while he intends to focus on analyses based on single features.

The course assumes no familiarity with dialectology or computational techniques. Some basic linguistics is helpful, as is an unintimidated attitude toward software. We have three goals:

  1. to show how aggregate analysis works, and give participants a chance to learn it
  2. to provide tools for exploring and evaluating analyses
  3. to compare aggregate analysis to other sorts of analyses

Students will have the opportunity to practice aggregate analysis of language variation on data from American dialects.


  1. Introduction. Concepts & History.
  2. Measuring Lexical Differences
  3. Measuring Pronunciation Differences
  4. Validating and Calibrating.
  5. Location, location, location. Why are there dialect differences?
  6. Linguistic Structure in Dialect Differences


  1. Introduction. Chambers & Trudgill (1998) Chap. 9; Nerbonne & Kretzschmar (2003).
  2. Measuring Lexical Differences. Nerbonne & Kleiweg (2003).
  3. Measuring Pronunciation Differences. Nerbonne, Heeringa & Kleiweg (1999); Heeringa (2004) Chap. 5.
  4. Validating and Calibrating. Heeringa, Nerbonne & Kleiweg (2002); Heeringa (2004) Chap. 7.
  5. Location, location, location. Why are there dialect differences? Heeringa & Nerbonne (2002); Chambers & Trudgill (1998) Chap. 11; Nerbonne, van Gemert & Heeringa (submitted).
  6. Linguistic Structure in Dialect Differences. Nerbonne (submitted).


See literature list.

Some Useful Links

  1. LAMSAS home page
  2. Dialectometric work on LAMSAS at the University of Groningen
  3. Peter Kleiweg's L04, a dialectometric package focusing on the use of Levenshtein distance to assay pronunciation difference. Note especially that there is a tutorial on its use.
  4. An online demo of how Levenshtein distance works.
  5. Hans Goebl's Dialectometry Project.


Are available here.

Further work?

There are two graduate student stipends available at the University of Groningen to work on dialectometry. More information here.
John Nerbonne
Last modified: Fri Oct 7 17:22:37 CEST 2005