The course will assume basic notions in statistics, such as sampling, statistical significance, confidence intervals, effect size, z-tests, t-tests, and χ² tests. We will not build very directly on these notions, however, so general scientific maturity should also be sufficient.


Linguistics is rapidly becoming a statistical field in very many of its subdisciplines, something virtually no one would have predicted fifteen years ago. This advanced LOT course will review recent developments which point in this direction and present their statistical underpinnings.

To avoid misunderstanding, let me emphasize that the course will adopt the perspective of general linguistics and proceed to ask what quantitative techniques have to offer. We will not attempt to introduce the specialized topics of the subdiscipline Quantitative Linguistics, e.g. the nature of word frequency distributions.

We have three goals:

  1. to examine the reasons why linguistics was non-statistical for so long (problems in corpus analysis, usefulness of intuitive judgements), and why these reasons now seem less compelling;
  2. to examine some leading papers advocating or exploiting "the statistical turn";
  3. to present critically some selected areas, including the statistical underpinnings needed to understand the work mentioned above.

(Very Tentative) Schedule

  1. Introduction. Concepts & history.
  2. Measurement (consistency, validity)
  3. Mixed effects models, esp. in syntax and phonology
  4. Dimension reduction, esp. in variationist studies.
  5. Permutation tests, contributions from participants?

Some Useful Literature

  1. Steven Abney (1996). Statistical Methods and Linguistics. In: Judith Klavans and Philip Resnik (eds.), The Balancing Act. The MIT Press, Cambridge, MA.
  2. Fernando Pereira (2000). Formal grammar and information theory: Together again? Philosophical Transactions of the Royal Society, 358(1769):1239-1253, April.

