Workshop on Computing and Phonology

A small workshop on computational aspects of phonology will be held at the University of Groningen (RUG), the Netherlands, on December 8, 2006. The workshop is open to everyone, but we kindly ask you to register no later than December 4. Should you have any questions, please feel free to contact Tamás Bíró at birot @

Harmony Building, H13.309 (Multimediazaal)
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen.


Chair: Dicky Gilbers
9:30 Opening: John Nerbonne
9:40 Tamás Bíró (ACLC, Universiteit van Amsterdam):
Simulated Annealing for Optimality Theory: A performance model for phonology
As in other fields of linguistics over the last forty years, phonological models have focused on linguistic competence, whereas performance has not been considered to belong to the realm of linguistics. The traditional Chomskyan dichotomy between competence and performance has, however, been questioned in the last decade by an increasing number of scholars. Certain performance phenomena, such as variation, conditional corpus frequencies and gradient grammaticality judgments, have been shown in many cases to be related to factors that unquestionably belong to linguistics. Models accounting for these phenomena have led to an ongoing discussion on whether and how to draw the borderline between competence and performance, or between the realm of linguistics and extra-linguistic factors.
I shall present the Simulated Annealing for Optimality Theory Algorithm (SA-OT) as a possible compromise. The main idea is to replace the Chomskyan dichotomy with a three-level structure: the static knowledge of the language in the brain, the computation performed by the brain, and the extra-linguistic level. While a traditional OT grammar is a model of the static knowledge of the language, its implementation (such as SA-OT) models the first part of the language production process. Because it is related to the linguistic model but also prone to making errors under different conditions (such as time constraints), it is claimed to be an adequate model for certain linguistically motivated performance phenomena.
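As a rough illustration of how an implementation of an OT grammar can be error-prone, the sketch below runs a generic simulated-annealing walk over a toy candidate set whose members are compared by their violation profiles. It is not the SA-OT algorithm itself: the candidates, the neighbourhood structure, the violation counts, the linear cooling schedule and the scalar acceptance rule are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy grammar: each candidate maps to a violation profile,
# listed from the highest-ranked constraint to the lowest-ranked one.
VIOLATIONS = {
    "a": (0, 2, 0),
    "b": (0, 1, 1),   # local optimum: both neighbours are worse
    "c": (1, 0, 0),
    "d": (0, 0, 2),   # global optimum under lexicographic comparison
    "e": (0, 3, 0),
}

# Hypothetical neighbourhood structure: candidates reachable in one step.
NEIGHBOURS = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}


def fatal_difference(old, new):
    """Violation difference at the highest-ranked constraint where the two differ."""
    for v_old, v_new in zip(VIOLATIONS[old], VIOLATIONS[new]):
        if v_old != v_new:
            return v_new - v_old
    return 0


def anneal(start, t_start=3.0, t_end=0.1, t_step=0.1, rng=None):
    """Random walk that always accepts non-worsening moves and accepts a
    worsening move with probability exp(-d / t); t decreases linearly."""
    rng = rng or random.Random()
    current, t = start, t_start
    while t > t_end:
        candidate = rng.choice(NEIGHBOURS[current])
        d = fatal_difference(current, candidate)
        if d <= 0 or rng.random() < math.exp(-d / t):
            current = candidate
        t -= t_step
    return current


def optimum():
    """Grammatical form: the candidate with the lexicographically smallest profile."""
    return min(VIOLATIONS, key=VIOLATIONS.get)
```

In this toy setting, a larger t_step (faster cooling) leaves fewer steps for escaping the local optimum "b", so the walk can end on a form other than the one returned by optimum(). This loosely mirrors the abstract's point about errors under time constraints.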

10:20 Bart Cramer and John Nerbonne (CLCG, Rijksuniversiteit Groningen):
Scaling Minimal Generalization
In this study, we model phonotactics using minimal generalization, a stochastic rule-based system proposed by Albright and Hayes (2003), who applied it successfully to learning the English past tense. Their system generates rules that try to generalize over the phonetic features of the input (in our case, the CELEX database). These rules are hypotheses which might prove wrong in other parts of the input; hence they are 'stochastic'. This algorithm maintains the explicitness of rule-based systems, but adds an element of stochastic comparison. The results from Albright and Hayes also suggest that the model captures some aspects of cognitive representation faithfully.
However, when we apply this methodology to the problem of phonotactics, it does not immediately generalise well. It accepted well-formed examples reliably, but was ill-equipped to reject ill-formed strings. We therefore propose improvements to the original algorithm: first, to force it to discriminate more sharply, and second, to take implicit negative information into account as well. The improved algorithm reduces the number of rules by a factor of 5, and thus improves the transparency of the output. It also cuts the number of errors (both false positives and false negatives) in half compared to the original algorithm.
Albright, Adam and Bruce Hayes (2003). "Rules vs. Analogy in English Past Tenses: A Computational/Experimental Study." Cognition 90, pp. 119-161.
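To make the idea of generalizing over phonetic features concrete, the sketch below keeps only the feature values shared by two segments and lists the segments covered by the resulting pattern. It is a heavily simplified stand-in, not the Albright and Hayes procedure or the improved algorithm from the talk: the feature table, the function names and the restriction to single segments are all assumptions.

```python
# Hypothetical, heavily reduced feature table (illustrative only, not from CELEX).
FEATURES = {
    "p": {"voice": "-", "place": "labial", "continuant": "-"},
    "b": {"voice": "+", "place": "labial", "continuant": "-"},
    "t": {"voice": "-", "place": "coronal", "continuant": "-"},
    "s": {"voice": "-", "place": "coronal", "continuant": "+"},
}


def generalize(seg_a, seg_b):
    """Smallest feature pattern covering both segments: their shared feature values."""
    feats_a, feats_b = FEATURES[seg_a], FEATURES[seg_b]
    return {f: v for f, v in feats_a.items() if feats_b.get(f) == v}


def matches(segment, pattern):
    """True if the segment carries every feature value required by the pattern."""
    return all(FEATURES[segment].get(f) == v for f, v in pattern.items())


def covered(pattern):
    """All segments in the table that the pattern accepts."""
    return [seg for seg in FEATURES if matches(seg, pattern)]
```

For example, generalize("p", "b") keeps place and continuancy but drops voicing, so the pattern accepts "p" and "b" and rejects "t" and "s". A pattern that has lost too many features accepts too much, which is the kind of over-acceptance the abstract describes.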


Chair: Petra Hendriks
11:30 Gerhard Jäger (Universität Bielefeld):
Exemplar dynamics and George Price's General Theory of Selection
In a paper written in the early seventies but published only posthumously in 1995, the mathematical geneticist George Price laid out the foundations for a program that he called "a general theory of selection". His aim was a mathematical framework that could describe all kinds of evolutionary processes, from gene selection in biology to political processes in human societies. The evolution of grammars was explicitly mentioned as one of the potential applications.
In the talk I will describe Price's program and sketch how it can be applied to linguistics. I will concentrate on the exemplar dynamics of language processing, which has recently gained a lot of attention (see the work of Bybee, Pierrehumbert, Wedel, and the papers by Bod, Bresnan and others in the recent special issue of The Linguistic Review). I will argue that it should properly be understood as an evolutionary process (as eloquently pointed out by Andrew Wedel), and that Price's formula is a perfect analytical tool for understanding these dynamics.
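For reference, Price's formula is usually stated as below. Here w_i is the fitness of type i (its number of descendants), z_i is the value of some trait for that type, Δz_i is the change in that trait between parent and offspring, and barred symbols are population averages. Reading z_i as a property of a stored exemplar is an assumption made here for illustration, not a statement about the talk.

```latex
\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}(w_i\,\Delta z_i)
```

The covariance term captures selection (types with higher trait values leave more or fewer copies), while the expectation term captures change during transmission.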

12:10 Paul Boersma (ACLC, Universiteit van Amsterdam):
The emergence of markedness
In a parallel Optimality-Theoretic model with multiple levels (phonetic form, phonological form, underlying form), the gradual acquisition of comprehension leads automatically to a ranking of faithfulness constraints in comprehension according to cue reliability and frequency of occurrence. If the speaker uses the same faithfulness constraint ranking in production, this leads to a correlation between phonological activity on the one hand and cue reliability and frequency on the other. Markedness thus emerges as a result of an acquisition bias, and phonological theory needs neither innate markedness hierarchies nor synchronically functionalist (i.e. teleological) principles.


Chair: Gosse Bouma
14:30 Adam Albright (MIT, Cambridge, MA):
Modeling gradient phonotactic well-formedness as grammatical competence
A commonly stated goal of phonological analysis is to explain what speakers know that lets them agree that some non-occurring strings are possible words, while others are not (Halle 1962). Whenever one gathers judgments about novel words, however, a challenge arises: words fall along a gradient scale of acceptability, e.g. *bnick, *dlip < ?bwip < blick. Often, analysts impose a threshold and formulate a grammar generating anything above the cut-off; further distinctions are assumed to reflect extra-grammatical factors like frequency, analogy, etc. In this talk, I defend the position that gradience is best modeled within the grammar itself. I consider three dimensions along which models may differ: (1) the structure used to encode generalizations, (2) the way frequency influences generalization, and (3) access to prior markedness biases. I present computational models that differ along these dimensions, and report attempts to model experimental acceptability judgments. The results so far indicate that a successful model must refer to sequences of natural classes, rather than raw perceptual similarity. Furthermore, the strength of a pattern is found to correlate with type frequency, not token frequency, contrary to what one would expect if gradience arose "on-line" during lexical access. Finally, preferences can be observed that have no apparent basis in the lexicon. Taken together, these facts suggest that gradience is indeed encoded within a learned grammar, composed partly of lexical generalizations and partly of phonetic markedness biases.
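The distinction between type and token frequency in the abstract above can be illustrated with a small sketch. The lexicon and its corpus counts are invented, and using the first two letters of the spelling as a stand-in for the phonological onset is a simplifying assumption.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> corpus token count (invented numbers).
LEXICON = {
    "blue": 900,
    "black": 400,
    "blink": 30,
    "bring": 500,
    "bread": 300,
    "bwana": 2,
}


def onset(word, length=2):
    """Crude stand-in for the onset cluster: the first letters of the spelling."""
    return word[:length]


def type_frequency(lexicon):
    """Number of distinct words that contain each onset."""
    return Counter(onset(word) for word in lexicon)


def token_frequency(lexicon):
    """Summed corpus counts of the words that contain each onset."""
    counts = Counter()
    for word, n_tokens in lexicon.items():
        counts[onset(word)] += n_tokens
    return counts
```

Here "bl" has a type frequency of 3 and a token frequency of 1330, while "bw" has 1 and 2. A model driven by type frequency counts each word once, regardless of how often it is used.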

15:30 Closing and coffee


If you intend to participate in the workshop, please register before December 4, 2006 in order to facilitate organisation.

Further information:

Information Science/Humanities Computing
Center for Language and Cognition Groningen (CLCG)
Rijksuniversiteit Groningen (RUG)

From Wilbert Heeringa's page:
A list of hotels in Groningen (please note that the prices are outdated).
Travel information

Thanks to Gerlof Bouma for the design.