To illustrate our approach, we start with a finite-state implementation of the syllabification analysis presented in chapter 6 of [19]. This section is heavily based on [12], which the reader should consult for further explanation and examples.
The inputs to the syllabification OT are sequences of consonants and vowels. The input is marked up with onset, nucleus, coda and unparsed brackets, where a syllable is a sequence of an optional onset, followed by a nucleus, followed by an optional coda. The marked-up input is thus a sequence of such syllables, in which unparsed material can intervene at arbitrary places. The assumption is that an unparsed vowel or consonant is not spelled out phonetically. Onsets, nuclei and codas are also allowed to be empty; the phonetic interpretation of such an empty constituent is epenthesis.
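As an illustration of this markup (not part of the formalization itself), the phonetic interpretation of a candidate can be sketched in Python; the bracket symbols follow the macros defined below, and the function is a hypothetical helper, assuming unparsed material is simply deleted and leaving epenthetic material unfilled:

```python
import re

# Hypothetical sketch: the phonetic interpretation of a marked-up
# candidate, assuming unparsed material (X[...]) is not spelled out.
# Empty constituents, whose interpretation is epenthesis, contribute
# nothing in this simplified rendering.
def pronounce(candidate: str) -> str:
    s = re.sub(r"X\[[a-z]\]", "", candidate)  # drop unparsed segments
    return re.sub(r"[ONDX]\[|\]", "", s)      # strip the remaining brackets

print(pronounce("X[a]O[p]N[a]"))  # the initial vowel is unparsed: "pa"
print(pronounce("O[p]N[a]D[n]"))  # a fully parsed syllable: "pan"
```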
First we give a number of simple abbreviations:
macro(cons,   {b,c,d,f,g,h,j,k,l,m,n,p,q,r,s,t,v,w,x,y,z}).
macro(vowel,  {a,e,o,u,i}).
macro(letter, {cons,vowel}).
macro(o_br, 'O[').   % onset
macro(n_br, 'N[').   % nucleus
macro(d_br, 'D[').   % coda
macro(x_br, 'X[').   % unparsed
macro(r_br, ']').
macro(bracket, {o_br,n_br,d_br,x_br,r_br}).
macro(onset,    [o_br,cons^,r_br]).
macro(nucleus,  [n_br,vowel^,r_br]).
macro(coda,     [d_br,cons^,r_br]).
macro(unparsed, [x_br,letter,r_br]).
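These macros denote regular expressions in FSA Utilities; as an illustrative sketch only, they can be mirrored as Python regular expressions (the character classes and the candidate pattern below are our assumptions, with `?` rendering the optionality operator `^`):

```python
import re

CONS = "[b-df-hj-np-tv-z]"   # the cons class above
VOWEL = "[aeiou]"            # the vowel class above

# One regex per constituent macro.
ONSET    = rf"O\[{CONS}?\]"
NUCLEUS  = rf"N\[{VOWEL}?\]"
CODA     = rf"D\[{CONS}?\]"
UNPARSED = r"X\[[a-z]\]"

# A syllable is an optional onset, a nucleus, and an optional coda;
# unparsed material may intervene at arbitrary places.
SYLLABLE = rf"(?:{ONSET})?{NUCLEUS}(?:{CODA})?"
CANDIDATE = re.compile(rf"^(?:{UNPARSED})*(?:{SYLLABLE}(?:{UNPARSED})*)*$")

print(bool(CANDIDATE.match("O[p]N[a]D[n]X[t]N[a]")))  # True
print(bool(CANDIDATE.match("O[p]D[t]")))              # False: no nucleus
```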
Following Karttunen, Gen is formalized as in fig. 3.
In the definitions for the constraints, we will deviate somewhat from Karttunen. In his formalization, a constraint simply describes the set of strings which do not violate that constraint. It turns out to be easier for our extension of Karttunen's formalization below, as well as for our alternative approach, if we return to the concept of a constraint as introduced by Prince and Smolensky, where a constraint adds marks to the candidate string at the positions where the string violates the constraint. Here we use the symbol @ to indicate a constraint violation. After each constraint has been checked, the markers are removed, so that markers for one constraint are not confused with markers for the next.
macro(mark_violation(parse),    replace(([] x @),x_br,[])).
macro(mark_violation(no_coda),  replace(([] x @),d_br,[])).
macro(mark_violation(fill_nuc), replace(([] x @),[n_br,r_br],[])).
macro(mark_violation(fill_ons), replace(([] x @),[o_br,r_br],[])).
macro(mark_violation(have_ons), replace(([] x @),[],n_br)
                              o replace((@ x []),onset,[])).
The parse constraint simply states that a candidate must not contain an unparsed constituent. Thus, we add a mark after each unparsed bracket. The no_coda constraint is similar: each coda bracket will be marked. The fill_nuc constraint is only slightly more complicated: each sequence of a nucleus bracket immediately followed by a closing bracket is marked. The fill_ons constraint treats empty onsets in the same way. Finally, the have_ons constraint is somewhat more complex. The constraint requires that each nucleus is preceded by an onset. This is achieved by marking all nuclei first, and then removing those marks where in fact an onset is present.
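As an illustration only, the marking transducers can be approximated with string rewrites in Python (the paper's replace operator is a finite-state transduction, not a regex substitution, and the consonant class below is our assumption):

```python
import re

CONS = "[b-df-hj-np-tv-z]"

# Each function inserts '@' where the candidate violates the constraint,
# mirroring the mark_violation macros above.
def mark_parse(c):    return re.sub(r"(X\[)", r"\1@", c)    # mark unparsed material
def mark_no_coda(c):  return re.sub(r"(D\[)", r"\1@", c)    # mark every coda
def mark_fill_nuc(c): return re.sub(r"(N\[\])", r"\1@", c)  # mark empty nuclei
def mark_fill_ons(c): return re.sub(r"(O\[\])", r"\1@", c)  # mark empty onsets

def mark_have_ons(c):
    c = re.sub(r"(?=N\[)", "@", c)                # first mark every nucleus ...
    return re.sub(rf"(O\[{CONS}?\])@", r"\1", c)  # ... then unmark those after an onset

def remove_marks(c):  return c.replace("@", "")   # clean up between constraints

print(mark_parse("O[p]N[a]X[n]"))     # one violation: O[p]N[a]X[@n]
print(mark_have_ons("N[a]O[p]N[a]"))  # only the onsetless nucleus stays marked
```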
This completes the building blocks needed for an implementation of Prince and Smolensky's analysis of syllabification. In the following sections, we present two alternative implementations which employ these building blocks. First, we discuss the approach of [12], based on the lenient composition operator, which handles multiple constraint violations by counting. We then present an alternative approach in which constraints eliminate candidates by matching.