Computational Dialectometry

Instructor: John Nerbonne (course under development)
Course Number:
Seminar room:
Mon. & Wed. 16:15-18:00, Oct. 31 - Nov. 9 and Nov. 28 - Dec. 7

Announcements 2005


This course focuses on what Hans Goebl has called the "linguistic management of space": how language variation is structured geographically. We shall begin with basic ideas from categorical data analysis, which we'll apply to lexical and syntactic data, then examine a technique from computational linguistics (edit distance, also known as Levenshtein distance) for analyzing sequences of segments in pronunciation. We shall examine questions of validity and consistency, and show how to visualize analyses using the L04 package, with the emphasis on visualization for exploration and understanding. Time permitting, we shall turn to one or two advanced topics, e.g., explanatory models of the geographic conditioning of language variation, the role of linguistic structure in geographic distribution, or techniques for analyzing syntactic variation in text corpora.
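As a rough illustration of the edit distance mentioned above, here is a minimal sketch in Python. It assumes unit cost for every insertion, deletion, and substitution, and treats each list element (or character) as one segment. The function name `levenshtein` is illustrative only and is not part of the L04 package, which may weight operations differently.

```python
def levenshtein(source, target):
    """Return the minimum number of insertions, deletions, and
    substitutions (each at cost 1, an assumption of this sketch)
    needed to turn `source` into `target`.

    Both arguments may be strings or lists of segments.
    """
    # previous[j] holds the distance between the first i-1 segments
    # of source and the first j segments of target.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # deleting all i segments of source so far
        for j, t in enumerate(target, start=1):
            substitution_cost = 0 if s == t else 1
            current.append(min(
                previous[j] + 1,                      # delete s
                current[j - 1] + 1,                   # insert t
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # → 3
```

Passing lists rather than strings lets a multi-character phonetic symbol count as a single segment, which matters when transcriptions use diacritics.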

The course assumes no familiarity with dialectology or computational techniques. Some basic linguistics is helpful, as is an unintimidated attitude toward software. Our goals are:

  1. to show how aggregate analysis works, and to get participants to understand it critically
  2. to compare this to other sorts of analyses

Students are encouraged to try aggregate analysis of language variation on data from American dialects using Peter Kleiweg's L04 package. Using L04 presupposes familiarity with UNIX; note in particular that a tutorial on the package is available.


Overhead sheets are available.
  1. Introduction. Concepts & History.
  2. Measuring Lexical Differences
  3. Measuring Pronunciation Differences
  4. Validating and Calibrating.
  5. Location, location, location. Why are there dialect differences?
  6. Linguistic Structure in Dialect Differences
  7. Detecting syntactic variation in corpora (tentative).


  1. Introduction. Chambers & Trudgill (1998) Chap. 9; Nerbonne & Kretzschmar (2003).
  2. Measuring Lexical Differences. Nerbonne & Kleiweg (2003).
  3. Measuring Pronunciation Differences. Nerbonne, Heeringa & Kleiweg (1999); Heeringa (2004) Chap. 5.
  4. Validating and Calibrating. Heeringa, Nerbonne & Kleiweg (2002); Heeringa (2004) Chap. 7.
  5. Location, location, location. Why are there dialect differences? Heeringa & Nerbonne (2002); Chambers & Trudgill (1998) Chap. 11; Nerbonne, van Gemert & Heeringa (submitted).
  6. Linguistic Structure in Dialect Differences. Nerbonne (submitted).


See literature list.

Some Useful Links

  1. LAMSAS home page
  2. Dialectometric work on LAMSAS at the University of Groningen
  3. Peter Kleiweg's L04, a dialectometric package focusing on the use of Levenshtein distance to assay pronunciation difference. Note especially that there is a tutorial on its use.
  4. An online demo of how Levenshtein distance works.
  5. Hans Goebl's Dialectometry Project.


Are available here.
John Nerbonne
Last modified: Wed Nov 16 21:51:28 CET 2005