Longest Match Capturing

As discussed in §1 the POSIX standard requires that multiple
captures follow a longest match strategy. For multiple captures as in
(3), one establishes first a longest match for
*domain*(*T*_{1})^{ . }...^{ . }*domain*(*T*_{n}). Then we
ensure that each of
*domain*(*T*_{i}) in turn is
required to match as long as possible, with each one having priority
over its rightward neighbors. To implement this, we define a macro
`lm_concat(Ts)` and use it as:

replace(lm_concat(Ts),Left,Right)

Ensuring the longest overall match is delegated to the `replace`
macro, so `lm_concat(Ts)` needs only ensure that each individual
transducer within `Ts` gets its proper left-to-right longest
matching priority. This problem is mostly solved by the same
techniques used to ensure the longest match within the `replace`
macro. The only complication here is that `Ts` can be of
unbounded length. So it is not possible to have a single expression in
the finite state calculus that applies to all possible lenghts. This
means that we need something a little more powerful than mere macro
expansion to construct the proper finite state calculus expression.
The FSA Utilities provides a Prolog hook for this purpose. The
resulting definition of `lm_concat` is given in
figure 2.

Suppose (as in [3]), we want to match the following
list of recognizers against the string `topological` and insert a
marker in each boundary position. This reduces to applying:

lm_concat([ [{[t,o],[t,o,p]},[]: '#'], [{o,[p,o,l,o]},[]: '#'], {[g,i,c,a,l],[o^,l,o,g,i,c,a,l]} ])This expression transduces the string

1999-04-15