Identities

Next: Smaller Automata Up: Introduction Previous: Notation

Identities

Consider the following phonological rule (from [15]) in which an underlying nasal segment N is mapped either to an m (if followed by a p) or an n:

$\begin{displaymath} \mbox{\tt N} \rightarrow \mbox{\tt m} / \_\!\_\!\_~~ \mbox{{\tt p}; ~~elsewhere {\tt n}} \end{displaymath}$

A transducer implementing this phonological rule can be illustrated as follows:

$\begin{pspicture}(-3,-1.5)(8,4.5) \psset{unit=0.02cm} \rput(0,0){\circlenode[dou... ...m}} \par\ncarc[arcangle=15.0]{->}{2}{0} \naput{\rnode{x}{\tt p}} \end{pspicture}$

This transducer contains a single start state 0, and two final states, 0 and 1. First consider the cyclic transition on state 0, labeled by the predicate $\overline{\mbox{\tt N}}$ , i.e., the predicate which is true of all symbols except the symbol N. As long as the transducer does not read this special nasal segment N, it remains in state 0 and simply copies its input. Upon reading an N, the transducer non-deterministically moves to state 1 or state 2, writing out an n or m respectively. In the first case, the next input symbol cannot be a p; in the second case the next input symbol must be a p.

Note that the transition from state 2 to state 0 simply contains a p. The idea here is that if the input and output symbol must be identical, only a single predicate is written for that transition. The same abbreviation is used for the transition from 1 to 0, as well as over the looping transition from 0 to 0 with predicate $\overline{\mbox{\tt N}}$ . The intention here of course is that every incoming segment which is not equal to N should be mapped to itself in the output. However, note that this is quite different from the pair of predicates $\overline{\mbox{\tt N}}: \overline{\mbox{\tt N}}$ . The latter would map an incoming symbol to an arbitrary output symbol, as long as both symbols are unequal to N.

The example illustrates an important point: if predicates are introduced in transducers, then for typical examples we must also be able to express the identity of input and output of a transition. In this example, if there were no way to express the identity between input and output, then we would be forced to have multiple transitions such as a:a, b:b, c:c, d:d for all of the relevant symbols; the introduction of predicates can be exploited in transducers only if identity between input and output can be expressed as well.

Expressing identity between input and output is crucial. This notion of identity can be seen as a consequence of the linguistic principle of Faithfulness: corresponding input and output segments tend to be identical. A similar argument is expressed in [11]. Indeed, many interesting transducers are of the type `change all occurrences of $\alpha$ in some specific context into $\beta$ , and pass on the rest of the input unaltered'. The various replacement and `local extension' operators all produce transducers of this kind [16,31,17,20,9]. Identities can be seen as a limited case of backreferencing. Backreferencing is an extension of regular expressions widely used in editors, scripting languages and other tools. A limited version of finite-state calculus backreferences is discussed in [10].

Next: Smaller Automata Up: Introduction Previous: Notation

Noord G.J.M. van
2001-06-22