Computational Dialectometry
Instructor: John Nerbonne (course under development)
Course Number:
Seminar room:
Mon. & Wed. 16:15-18, Oct. 31 - Nov. 9 and Nov. 28 - Dec. 7
Announcements 2005
- For those of you who don't read the New Yorker, I've excerpted
a recent issue with a brief story on
Bill Labov. Enjoy!
- This course overlaps in content a great deal with the course I
taught at the 2005 Linguistics
Institute (Harvard/MIT). See Aggregate Dialectal
Variation. People who took that course will see little new here
(but see the section on detecting syntactic variation).
- On Mon. Nov. 7 the course will not meet. But there will be an
SFB-wide colloquium on this topic. Details later.
Description
This course will focus on what Hans Goebl has called the "linguistic
management of space," how language variation is structured
geographically. We shall begin with basic ideas from categorical data
analysis which we'll apply to lexical and syntactic data, then examine
a technique from computational linguistics (edit distance or
Levenshtein distance) for the analysis of sequences of segments in
pronunciation. We shall examine questions of validity and consistency,
and show how to visualize analyses using the L04 package, where the
emphasis is on visualization for the purpose of exploration and
understanding. Time permitting we shall turn to one or two advanced
topics, e.g., explanatory models of the geographic conditioning of
language variation, attention to the role of linguistic
structure in the geographic distribution, or techniques for analyzing
syntactic variation from text corpora.
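To make the edit-distance idea concrete, here is a minimal sketch in Python of unit-cost Levenshtein distance, plus a simple aggregate over a word list. It is an illustration only, not the L04 implementation: the function names `levenshtein` and `site_distance`, the uniform cost of 1 per operation, and the use of a plain mean are all assumptions made for this example, and the transcriptions are invented toy strings rather than data from any atlas.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions
    (each at an assumed cost of 1) that turn source into target."""
    # previous[j] holds the distance between the source prefix
    # processed so far and the first j segments of target.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # deleting all i source segments
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # delete s
                current[j - 1] + 1,          # insert t
                previous[j - 1] + (s != t),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def site_distance(words_a, words_b):
    """Mean edit distance over the items recorded at both sites
    (a hypothetical aggregate chosen for illustration)."""
    shared = sorted(set(words_a) & set(words_b))
    if not shared:
        raise ValueError("the two sites share no items")
    total = sum(levenshtein(words_a[w], words_b[w]) for w in shared)
    return total / len(shared)


# Invented toy transcriptions, keyed by item.
site_a = {"water": "wata", "house": "haus", "milk": "melk"}
site_b = {"water": "wota", "house": "hus", "milk": "melk"}

print(levenshtein("kitten", "sitting"))  # 3
print(site_distance(site_a, site_b))
```

Treating each character as one segment is a simplification; real transcriptions would need to be split into phonetic segments first, and operation costs can be weighted by how different two segments sound.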
The course assumes no familiarity with dialectology or computational
techniques. Some basic linguistics is helpful, as is an unintimidated
attitude toward software. Our goals are:
- to show how aggregate analysis works, and to get participants
to understand it critically
- to compare this to other sorts of analyses
Students are encouraged to try aggregate analysis of language
variation on data from American dialects using Peter Kleiweg's L04 package. This
presupposes familiarity with UNIX. Note in particular that there is a
tutorial on its use.
Schedule
Overhead sheets are available.
- Introduction. Concepts & History.
- Measuring Lexical Differences
- Measuring Pronunciation Differences
- Validating and Calibrating.
- Location, location, location. Why are there dialect differences?
- Linguistic Structure in Dialect Differences
- Detecting syntactic variation in corpora (tentative).
Readings.
- Introduction. Chambers & Trudgill (1998) Chap. 9; Nerbonne & Kretzschmar (2003).
- Measuring Lexical Differences. Nerbonne & Kleiweg (2003).
- Measuring Pronunciation Differences. Nerbonne, Heeringa & Kleiweg (1999); Heeringa (2004) Chap. 5.
- Validating and Calibrating. Heeringa, Nerbonne & Kleiweg (2002); Heeringa (2004) Chap. 7.
- Location, location, location. Why are there dialect differences?
Heeringa & Nerbonne (2002); Chambers & Trudgill (1998) Chap. 11;
Nerbonne, van Gemert & Heeringa (submitted).
- Linguistic Structure in Dialect Differences. Nerbonne (submitted).
Literature
See literature list.
Some Useful Links
- LAMSAS home page
- Dialectometric work on LAMSAS at the University of Groningen
- Peter Kleiweg's L04, a dialectometric
package focusing on the use of Levenshtein distance to assay
pronunciation difference. Note especially that there is a
tutorial on its use.
- An online demo of how Levenshtein distance works.
- Hans Goebl's Dialectometry Project.
Exercises
Exercises are available here.
John Nerbonne
Last modified: Wed Nov 16 21:51:28 CET 2005