The Counting Approach

In the approach of [12], a candidate set is leniently composed with the set of strings that satisfy a given constraint. Since we have defined a constraint as a transducer which marks candidate strings, we need to alter the definitions somewhat, but the resulting transducers are equivalent to those produced by [12]. We use the (left-associative) optimality operator oo for applying an OT constraint to a given set of candidates:



macro(Cands oo Constraint,
          Cands
            o
  mark_violation(Constraint)
            lc
        ~ ($ @)
            o
     { @ x [], ? - @}*   ).
Here, the set of candidates is first composed with the transducer that marks constraint violations. We then leniently compose the resulting transducer with ~($ @), which encodes the requirement that the string contain no such marks. Finally, the remaining marks (if any) are removed from the set of surviving candidates. Using the optimality operator, we can then combine Gen and the various constraints as in the following example (equivalent to figure 14 of [12]):


macro(syllabify, gen
                 oo
              have_ons
                 oo
              no_coda
                 oo
              fill_nuc
                 oo
              parse
                 oo
              fill_ons     ).
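
To make the behaviour of the optimality operator concrete, the following Python sketch simulates it over a finite candidate set rather than constructing the actual transducer. Candidates are plain strings, a constraint is represented by a marking function that inserts the marker @ at each violation, and lenient composition then reduces to: keep only the mark-free candidates if there are any, otherwise keep them all. The function name opt and this string-based encoding are illustrative assumptions, not part of the FSA Utilities implementation.

def opt(cands, mark_violation):
    # Simulate `Cands oo Constraint' on a finite set of candidate strings.
    # `mark_violation' inserts the marker '@' at every constraint violation.
    marked = [mark_violation(c) for c in cands]
    # Lenient composition with ~($ @): prefer the mark-free candidates,
    # but keep everything if no candidate is free of marks.
    survivors = [m for m in marked if '@' not in m] or marked
    # Finally, remove any remaining marks from the surviving candidates.
    return [s.replace('@', '') for s in survivors]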

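Under the same encoding, and assuming the opt function sketched above together with one marking function per constraint, the cascade in syllabify amounts to a left-to-right fold over the ranked constraints; this mirrors only the order of application, not the transducer construction itself.

from functools import reduce

def syllabify(gen_candidates, ranked_constraints):
    # ranked_constraints: marking functions ordered from highest-ranked to
    # lowest-ranked, e.g. [have_ons, no_coda, fill_nuc, parse, fill_ons].
    return reduce(opt, ranked_constraints, gen_candidates)
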
As mentioned above, a candidate string can violate a constraint multiple times, and candidates that violate the constraint the fewest times must be preferred. Lenient composition cannot distinguish between two candidates if the first contains one violation and the second contains two or more violations. For example, the syllabify transducer above assigns three outputs to the input bebop:


O[b]N[e]X[b]X[o]X[p]
O[b]N[e]O[b]N[o]X[p]
X[b]X[e]O[b]N[o]X[p]
In this case, the second output should have been preferred over the other two, because the second output violates `Parse' only once, whereas the other outputs violate `Parse' three times. Karttunen recognizes this problem and proposes a sequence of constraints Parse0, Parse1, Parse2, ..., ParseN, where each ParseX constraint requires that candidates not contain more than X unparsed constituents. In this case, the resulting transducer only approximates the OT analysis, because it turns out that for any X there are candidate strings that this transducer fails to handle correctly (assuming that there is no bound on the length of candidate strings).
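
In the finite-set simulation sketched earlier, the failure is easy to reproduce by counting marks. Below, mark_parse is a hypothetical stand-in for the mark_violation transducer of the `Parse' constraint: it flags every unparsed constituent X[...] with @. Because plain lenient composition only tests for zero versus non-zero marks, all three bebop outputs survive:

outputs = ["O[b]N[e]X[b]X[o]X[p]",
           "O[b]N[e]O[b]N[o]X[p]",
           "X[b]X[e]O[b]N[o]X[p]"]

def mark_parse(cand):
    # Flag every unparsed constituent X[...] as a `Parse' violation.
    return cand.replace("X[", "@X[")

print([mark_parse(o).count('@') for o in outputs])   # [3, 1, 3]
print(opt(outputs, mark_parse))                      # all three candidates survive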

Our notation is somewhat different from, but equivalent to, the notation used by Karttunen. Instead of a sequence of constraints Cons0 ... ConsX, we write Cands oo Prec :: Cons, which is read as: apply constraint Cons to the candidate set Cands with precision Prec, where ``precision'' means the predetermined bound on counting. For example, a variant of the syllabify transducer can be defined as:


macro(syllabify, gen
                 oo
              have_ons
                 oo
              no_coda
                 oo
              1 :: fill_nuc
                 oo
              8 :: parse
                 oo
              fill_ons     ).
Using techniques described in §5, this variant can be shown to be exact for all strings of length ≤ 10. Note that if no precision is specified, then a precision of 0 is assumed.

This construct can be defined as follows (in the actual implementation the regular expression is computed dynamically based on the value of Prec):


macro(Cands oo 3 :: Constraint,
          Cands
             o
  mark_violation(Constraint)
            lc
  ~ ([($ @),($ @),($ @),($ @)])
            lc
    ~ ([($ @),($ @),($ @)])
            lc
      ~ ([($ @),($ @)])
            lc
        ~ ($ @)
             o
     { @ x [], ? - @}*  ).
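
In the same simulation, a precision of N corresponds to a cascade of lenient filters: first discard candidates with more than N marks (if any candidate survives), then more than N-1 marks, and so on down to zero. The following sketch, again using assumed names rather than the actual implementation, mirrors that cascade:

def opt_with_precision(cands, mark_violation, precision):
    # Simulate `Cands oo Prec :: Constraint' on a finite candidate set.
    marked = [mark_violation(c) for c in cands]
    # Cascade of lenient compositions: tighten the bound on the number of
    # marks step by step, keeping the previous set whenever nothing survives.
    for bound in range(precision, -1, -1):
        marked = [m for m in marked if m.count('@') <= bound] or marked
    return [m.replace('@', '') for m in marked]

With the bebop outputs and the mark_parse stand-in from above, opt_with_precision(outputs, mark_parse, 2) keeps only the second output, which is the result the OT analysis requires.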

