Quantitative Linguistics: Some Statistics
Prerequisites
The course will assume basic notions in statistics,
including, say, the notions of sampling, statistical
significance, confidence intervals, effect size,
z-tests, t-tests, and χ² tests.
But we will not build very directly on these notions,
so a general scientific maturity should also be
sufficient.
Description
Linguistics is rapidly becoming a statistical field in very
many of its subdisciplines, something virtually no one would
have predicted fifteen years ago. This advanced LOT course
will review recent developments which point in this direction
and present their statistical underpinnings.
To avoid misunderstanding, let me emphasize that the course
will adopt the perspective of general linguistics and
proceed to ask what quantitative techniques have to offer.
We will not attempt to introduce the specialized topics of
the subdiscipline Quantitative Linguistics, e.g. the
nature of word frequency distributions.
We have three goals:

to examine the reasons why linguistics was
nonstatistical for so long (problems in corpus
analysis, usefulness of intuitive judgements), and why
these reasons now seem less compelling.

to examine some leading papers advocating or exploiting
"the statistical turn"

to present critically some selected areas, including the
statistical underpinnings needed to understand the work
mentioned above
(Very Tentative) Schedule

Introduction. Concepts & history.

Measurement (consistency, validity)

Mixed effects models, esp. in syntax and phonology

Dimension reduction, esp. in variationist studies.

Permutation tests. Contributions from participants?
Some Useful Literature

Steven Abney (1996). Statistical Methods
and Linguistics. In: Judith Klavans and Philip Resnik
(eds.), The Balancing Act. The MIT Press,
Cambridge, MA.

Fernando Pereira (2000). Formal
grammar and information theory: Together again?
Philosophical Transactions of the Royal Society,
358(1769):1239-1253, April.
John Nerbonne
