Determinization of Transducers

Next: Determinization and identities Up: Transducers with Predicates and Previous: Composition

Determinization of Transducers

We will call a pfst M deterministic if M has a single start state, if there are no states $p, q\in Q$ such that $(p,\epsilon,\psi,q,i)\in E$ , and if for all states p and symbols $\sigma$ there is at most one transition $(p,\pi_d,\psi,q,i)$ such that $\pi_d(\sigma)$ . The transduction of an input string by means of a deterministic pfst is simple: in going through the input from left to right, you know exactly in which state you are (so there is no backtracking; alternatively if a parallel implementation is considered, there is no need to maintain a number of states linear in the size of the transducer). If a pfst M is deterministic then computing the transductions of a given string w as defined by M can be implemented efficiently. This computation is linear in w, and independent on the size of M. Since w can have several transductions (unless M is functional), we assume that this computation constructs a pfsr accepting $\{w'\vert(w,w')\in R(M)\}$ .⁶

In order to extend the determinization algorithm for transducers [28,30,26,31,32], we must extend pfst in such a way that the output part of a transition is a sequence of predicates. This extension is described later, but first we illustrate some of the complications that arise. For the moment we will simply assume that the output part of a transition contains a sequence of predicates. We first create an equivalent pfst which has no $\epsilon$ on the domain part of transitions, using the same technique as described in [32, page 29].⁷ In the determinization algorithm, local ambiguities such as those encountered in state 0 in (here, A...F are arbitrary predicates $\in\Pi$ ):

$\begin{pspicture}(-2,-0.5)(9,2.5) \psset{unit=0.02cm}\rput(0,60.0){\circlenode[]... ...15.0]{->}{4}{2} \nbput{\rnode{x}{\footnotesize {\sc c}:{\sc d}}} \end{pspicture}$

are solved by delaying the outputs as far as needed, until these symbols can be written out deterministically:⁸

$\begin{pspicture}(-1,-0.5)(10,3.5) \psset{unit=0.02cm}\rput(0,60.0){\circlenode[... ...{<-}{3}{0.35cm} \trput{\rnode{x}{\footnotesize {\sc a}:{\sc a}}} \end{pspicture}$

The determinization algorithm for transducers maintains sets of pairs $Q\times\Pi^*$ . Such a set corresponds to a state in the determinized transducer. In order to compute the transitions leaving such a set of pairs P, we compute for each $\pi$ , $\mbox{\it Trans}^P(\pi)=\{(q,\phi\psi)\vert(p,\phi)\in P,(p,\pi,\psi,q)\in E\}$ . In the example, we can be in states 3 and 4 after reading a symbol compatible with A, with pending outputs E and F. We thus have $P=\{(3,\mbox{\sc e}),(4,\mbox{\sc f})\}$ . Therefore, we have:

$\begin{displaymath}\mbox{\it Trans}^P(\mbox{\sc b}) = \{(2,\mbox{\sc ed})\}, \mbox{\it Trans}^P(\mbox{\sc c}) = \{(2,\mbox{\sc fd})\} \end{displaymath}$

Let $\Pi'$ be the predicates in the domain of Trans^P. For each split of $\Pi'$ into $\pi_1\dots \pi_i$ and $\pi_{i+1}\dots\pi_n$ we have a proto-transition:

$\begin{displaymath}(P,\pi_1\wedge\dots\wedge\pi_i\wedge\neg\pi_{i+1}\wedge\dots\... ...box{\it Trans}^P(\pi_1)\cup\dots\cup\mbox{\it Trans}^P(\pi_i)) \end{displaymath}$

In the example, we have the following proto-transitions (we need not represent the $\emptyset$ state):

$\begin{displaymath} \begin{array}{ccccc} (&P,& \mbox{\sc b}\wedge\neg \mbox{\sc ... ...c}, &\{(2,\mbox{\sc ed}),(2,\mbox{\sc fd})\} &)\\ \end{array}\end{displaymath}$

A transition is created from a proto-transition by removing the longest common prefix of predicates in the target pairs; this prefix is the sequence of output predicates of the resulting transition. However, before we remove this longest common prefix, we first consider possible simplifications in the sequences of output predicates, by packing multiple sequences associated with the same target state into a smaller number of sequences (using disjunction). In particular, two pairs of target state and predicate sequences $(p_1,\psi_1)$ and $(p_2,\psi_2)$ can be combined into a single pair $(p,\psi)$ iff p₁=p₂=p and $\psi_1=\pi_1\dots\pi_i\dots\pi_n$ , $\psi_2=\pi_1\dots\pi_i'\dots\pi_n$ and $\psi=\pi_1\dots\pi_i\vee\pi_i'\dots\pi_n$ . In a proto-transition this simplification is applied repeatedly until no further simplifications are possible.

Here, the third proto-transition is simplified into:

$\begin{displaymath} \begin{array}{ccccc} (&P,& \mbox{\sc b}\wedge \mbox{\sc c}, ... ...\mbox{\sc e}\vee \mbox{\sc f})\mbox{\sc d})\} &)\\ \end{array}\end{displaymath}$

Moving the longest common prefix into the output part of the label yields:

$\begin{displaymath} \begin{array}{ccccc} (&P,& \mbox{\sc b}\wedge\neg \mbox{\sc ... ... \mbox{\sc f})\mbox{\sc d}, &\{(2,\epsilon)\} &)\\ \end{array}\end{displaymath}$

The introduction of predicates thus has the interesting effect that certain non-functional transducers can be treated by the transducer determinization algorithm. Assume that B is the predicate $\{\mbox{\tt x,y}\}$ , C is the predicate $\{\mbox{\tt y,z}\}$ and the predicates A, D, E and F are true only of the symbols a, d, e and f respectively. The equivalent normal transducer is:

$\begin{pspicture}(-2,-1)(9,4.5) \psset{unit=0.02cm}\rput(0,90.0){\circlenode[]{0... ...15.0]{->}{4}{2} \naput{\rnode{x}{\footnotesize {\tt y}:{\tt d}}} \end{pspicture}$

This transducer cannot be treated by the transducer determinization algorithm (that algorithm does allow a limited form of ambiguity, but only if this ambiguity can be delayed to a final state; here this is not possible). However, the same transduction can be determinized if expressed by a pfst:

$\begin{pspicture}(-1,-0.5)(10,3.5) \psset{unit=0.02cm}\rput(0,60.0){\circlenode[... ...{<-}{3}{0.35cm} \trput{\rnode{x}{\footnotesize {\tt a}:{\tt a}}} \end{pspicture}$

If predicates are used, then a larger class of transductions can be implemented efficiently. A precise classification of this class is beyond the scope of this paper, but note that the type of ambiguities that can be implemented in this way is limited to ambiguities that extend only over a single symbol.⁹ For instance, a simple example such as the following cannot be determinized:

$\begin{pspicture}(-3.5,-1)(7.5,3.0) \psset{unit=0.015cm} \rput(0,60.0){\circleno... ...rc[arcangle=-15.0]{->}{3}{1} \nbput{\rnode{x}{$\epsilon$:\tt e}} \end{pspicture}$

Next: Determinization and identities Up: Transducers with Predicates and Previous: Composition

Noord G.J.M. van
2001-06-22