Syllabification in Finite State OT

In order to illustrate our approach, we will start with a finite state implementation of the syllabification analysis as presented in chapter 6 of [19]. This section is heavily based on [12], which the reader should consult for more explanation and examples.

The inputs to the syllabification OT are sequences of consonants and vowels. Each input is marked up with onset, nucleus, coda and unparsed brackets. A syllable is a sequence of an optional onset, followed by a nucleus, followed by an optional coda, and the marked-up input is a sequence of such syllables in which unparsed material can intervene at arbitrary places. An unparsed vowel or consonant is assumed not to be spelled out phonetically. Onsets, nuclei and codas may also be empty; such constituents are interpreted phonetically as epenthesis.

First we give a number of simple abbreviations:


macro(cons,      
      {b,c,d,f,g,h,j,k,l,m,n,
       p,q,r,s,t,v,w,x,y,z}  ).
macro(vowel,   {a,e,o,u,i}).


macro(o_br,    'O['). % onset
macro(n_br,    'N['). % nucleus
macro(d_br,    'D['). % coda
macro(x_br,    'X['). % unparsed
macro(r_br,    ']').
macro(bracket,   
      {o_br,n_br,d_br,x_br,r_br}).


macro(onset,   [o_br,cons^  ,r_br]).
macro(nucleus, [n_br,vowel^ ,r_br]).
macro(coda,    [d_br,cons^  ,r_br]).
macro(unparsed,[x_br,letter ,r_br]).
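
As a rough illustration outside the finite state calculus, the bracket notation and its phonetic interpretation can be mimicked with ordinary Python strings. This is only a sketch and not part of the implementation described here: the function names are hypothetical, `letter` is assumed to mean any consonant or vowel, and the placeholders "C" and "V" for epenthetic segments are invented for the example.

```python
import re

CONS = "bcdfghjklmnpqrstvwxyz"
VOWEL = "aeiou"

# Letters each bracket type may contain. Assumption: `letter` = cons or vowel.
ALLOWED = {"O": CONS, "N": VOWEL, "D": CONS, "X": CONS + VOWEL}

# One constituent: a bracket label, at most one letter, and the closing ']'.
CONSTITUENT = re.compile(r"([ONDX])\[([a-z]?)\]")


def constituents(candidate):
    """Split a marked-up string into (label, content) pairs."""
    pairs = []
    pos = 0
    while pos < len(candidate):
        m = CONSTITUENT.match(candidate, pos)
        if m is None:
            raise ValueError(f"no constituent at offset {pos}")
        label, content = m.groups()
        if content and content not in ALLOWED[label]:
            raise ValueError(f"{content!r} not allowed in {label}[")
        if label == "X" and not content:
            raise ValueError("unparsed brackets must contain a letter")
        pairs.append((label, content))
        pos = m.end()
    return pairs


def spell_out(candidate):
    """Phonetic interpretation: drop unparsed letters, fill empty slots."""
    out = []
    for label, content in constituents(candidate):
        if label == "X":
            continue  # unparsed material is not pronounced
        if content:
            out.append(content)
        else:
            # Empty constituent: epenthesis (hypothetical placeholder symbols).
            out.append("V" if label == "N" else "C")
    return "".join(out)
```

For instance, `spell_out("O[b]N[a]X[c]")` drops the unparsed `c`, while `spell_out("O[]N[a]D[b]")` begins with the placeholder for an epenthetic consonant.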

Following Karttunen, Gen is formalized as in fig. 3.

Figure 3: The definition of Gen


macro(gen,       {cons,vowel}* 
                       o 
                   overparse 
                       o 
                     parse 
                       o 
                syllable_structure ).

macro(parse, replace([[] x {o_br,d_br,x_br},cons, [] x r_br])
                                 o
             replace([[] x {n_br,x_br},     vowel,[] x r_br])).

macro(overparse,intro_each_pos([{o_br,d_br,n_br},r_br]^)).

macro(intro_each_pos(E), [[ [] x E, ?]*,[] x E]).

macro(syllable_structure,ignore([onset^,nucleus,coda^],unparsed)*).

Here, parse introduces onset, coda or unparsed brackets around each consonant, and nucleus or unparsed brackets around each vowel. The replace(T,Left,Right) transducer applies transducer T obligatorily within the contexts specified by Left and Right [4]. The replace(T) transducer is an abbreviation for replace(T,[],[]), i.e. T is applied everywhere. The overparse transducer introduces optional `empty' constituents in the input, using the intro_each_pos operator.⁴
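
To get a feeling for the candidate set that Gen produces, the three steps can be imitated by brute-force enumeration in Python. This is a sketch under stated assumptions, not the transducer composition itself: all function names are hypothetical, `letter` is assumed to mean any consonant or vowel, and the filter follows a literal reading of `ignore([onset^,nucleus,coda^],unparsed)*`, under which a non-empty candidate must contain at least one syllable.

```python
import re
from itertools import product

CONS = "bcdfghjklmnpqrstvwxyz"
VOWEL = "aeiou"

# overparse: each position may receive one empty onset, coda or nucleus.
EMPTY_SLOTS = ["", "O[]", "D[]", "N[]"]

SYLLABLE = rf"(?:O\[[{CONS}]?\])?N\[[{VOWEL}]?\](?:D\[[{CONS}]?\])?"
SYLLABLES = re.compile(rf"(?:{SYLLABLE})+")
UNPARSED = re.compile(rf"X\[[{CONS}{VOWEL}]\]")


def parse_options(letter):
    """parse: the bracketings available for a single input letter."""
    if letter in CONS:
        return [f"O[{letter}]", f"D[{letter}]", f"X[{letter}]"]
    if letter in VOWEL:
        return [f"N[{letter}]", f"X[{letter}]"]
    raise ValueError(f"not a consonant or vowel: {letter!r}")


def has_syllable_structure(candidate):
    """syllable_structure: syllables, ignoring unparsed constituents."""
    if candidate == "":
        return True
    skeleton = UNPARSED.sub("", candidate)
    return SYLLABLES.fullmatch(skeleton) is not None


def gen(word):
    """Enumerate all candidates for `word` (exponential in len(word))."""
    per_letter = [parse_options(ch) for ch in word]
    results = []
    # One optional empty constituent before each letter and one at the end.
    for fills in product(EMPTY_SLOTS, repeat=len(word) + 1):
        for wrapped in product(*per_letter):
            parts = []
            for fill, constituent in zip(fills, wrapped):
                parts.append(fill)
                parts.append(constituent)
            parts.append(fills[-1])
            candidate = "".join(parts)
            if has_syllable_structure(candidate):
                results.append(candidate)
    return results
```

Calling `gen("ba")` yields, among others, `O[b]N[a]`, `X[b]N[a]` and `O[b]N[]X[a]`. The enumeration grows exponentially with the input length, which is one reason the finite state formulation works with a compact transducer instead of an explicit list.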

In the definitions for the constraints, we will deviate somewhat from Karttunen. In his formalization, a constraint simply describes the set of strings which do not violate that constraint. It turns out to be easier for our extension of Karttunen's formalization below, as well as for our alternative approach, if we return to the concept of a constraint as introduced by Prince and Smolensky, where a constraint adds marks in the candidate string at each position where the string violates the constraint. Here we use the symbol @ to indicate a constraint violation. After each constraint has been checked, its markers are removed, so that markers for one constraint will not be confused with markers for the next.


macro(mark_violation(parse),
     replace(([] x @),x_br,[])).

macro(mark_violation(no_coda),
     replace(([] x @),d_br,[])).

macro(mark_violation(fill_nuc),
     replace(([] x @),[n_br,r_br],[])).

macro(mark_violation(fill_ons),
     replace(([] x @),[o_br,r_br],[])).


macro(mark_violation(have_ons),
     replace(([] x @),[],n_br)
                o
     replace((@ x []),onset,[])).

The parse constraint simply states that a candidate must not contain an unparsed constituent. Thus, we add a mark after each unparsed bracket. The no_coda constraint is similar: each coda bracket will be marked. The fill_nuc constraint is only slightly more complicated: each sequence of a nucleus bracket immediately followed by a closing bracket is marked. The fill_ons constraint treats empty onsets in the same way. Finally, the have_ons constraint is somewhat more complex. The constraint requires that each nucleus is preceded by an onset. This is achieved by marking all nuclei first, and then removing those marks where in fact an onset is present.
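
The effect of the five marking macros can be imitated on plain strings, which makes it easy to inspect where the marks end up. The following Python sketch is an illustration only: it uses string replacement instead of transducers, the function names are hypothetical, and candidates are assumed to be written in the bracket notation introduced above.

```python
import re

CONS = "bcdfghjklmnpqrstvwxyz"

# A mark directly preceded by an onset (possibly an empty one).
MARK_AFTER_ONSET = re.compile(rf"(O\[[{CONS}]?\])@")

CONSTRAINTS = ["parse", "no_coda", "fill_nuc", "fill_ons", "have_ons"]


def mark_violation(constraint, candidate):
    """Insert '@' at every position where `candidate` violates `constraint`."""
    if constraint == "parse":
        return candidate.replace("X[", "X[@")    # after each unparsed bracket
    if constraint == "no_coda":
        return candidate.replace("D[", "D[@")    # after each coda bracket
    if constraint == "fill_nuc":
        return candidate.replace("N[]", "N[]@")  # after each empty nucleus
    if constraint == "fill_ons":
        return candidate.replace("O[]", "O[]@")  # after each empty onset
    if constraint == "have_ons":
        # Mark every nucleus, then unmark those directly preceded by an onset.
        marked = candidate.replace("N[", "@N[")
        return MARK_AFTER_ONSET.sub(r"\1", marked)
    raise ValueError(f"unknown constraint: {constraint!r}")


def count_violations(candidate):
    """Number of marks each constraint assigns to `candidate`."""
    return {name: mark_violation(name, candidate).count("@")
            for name in CONSTRAINTS}
```

For the candidate `N[a]D[b]`, `mark_violation("have_ons", ...)` returns `@N[a]D[b]` and `mark_violation("no_coda", ...)` returns `N[a]D[@b]`. Note that an empty onset counts as an onset for have_ons, so `O[]N[a]` is marked by fill_ons but not by have_ons.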

This completes the building blocks we need for an implementation of Prince and Smolensky's analysis of syllabification. In the following sections, we present two alternative implementations which employ these building blocks. First, we discuss the approach of [12], based on the lenient composition operator, which handles multiple constraint violations by counting. We will then present an alternative approach in which constraints eliminate candidates using matching.

2000-06-29