next up previous
Next: Implementational Issues Up: Regular Expression Operators in Previous: The replace operator.

Leftmost-longest contexted replacement.

In [7] a leftmost-longest match contexted replacement operator

\mbox{\tt lml(T,Left,Right)}

is defined which ensures that the transducer T is applied in contexts Left and Right, using a leftmost-longest match strategy. One application of such an operator is finite-state parsing (chunking), [1,5,8,22]. In finite-state parsing, sets of context-free rules are collected into levels. Typically there is a finite number of such levels, and these levels are ordered. First each of the rules of the first level apply. The result is then input to the second level, etc. Note that rules cannot work on their own output, unless the same rule is placed in several levels.

In the following example we will not use the contexts; therefore lml/1 is defined as:

\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro( lml(T), lml(T,[],[]) )\end{verbatim}\end{minipage}\end{displaymath} (25)

This operator ensures that the transducer T is applied to a string at all possible positions, using a left-to-right left-most longest match policy.

In this particular example we will assume that the input to the finite-state parser is a tagged sentence: each word is represented by a category, an opening bracket, the word itself, and a closing bracket. A rule with a given left hand side and right hand side will look for the sequence of elements described by the right hand side and wrap the result inside left hand side brackets. In general, the macro --> can be defined as the following transducer (this macro disallows the case that the daughters is the empty string):

\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro((A --> Ds), [[] x [A,'['], Ds-[], []:']'])\end{verbatim}\end{minipage}\end{displaymath} (26)

We use the macro d(Expr) for elements in the right hand side of rules; the macro dw(Expr) is similar but is used for pre-terminals, to refer to specific words.

...rd),[Cat, '[':'(', Word, ']':')']).\end{verbatim}\end{minipage}\end{displaymath} (27)

Note that the brackets (introduced by an earlier level) are replaced here by other brackets in order to ensure that these brackets cannot be used in later levels again; in other words at any given level we can only `see' the top-most constituents (yet, the full parse tree can be recoved using the `invisible' brackets).

Using these two macro's a rule to recognize basic noun phrases is:

\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}np --> [d(art)^,d(num)^,d(adj)*,d(n)+]\end{verbatim}\end{minipage}\end{displaymath} (28)

A level of rules can now simply be defined as the replacement operator applied to the union of these rules. For instance, the following is a level recognizing multi-word-units (for instance, the Dutch phrase `ten opzichte van' is comparable to the English phrase `with respect to'):
...p,in),dw(n,plaats),dw(p,van)] ) }))\end{verbatim}\end{minipage}\end{displaymath} (29)

Finally, we use composition to combine a number of such levels. Thus, the following expression defines a simple noun-phrase chunker:

o lml(( np --> [d(np),d(pp)+])))\end{verbatim}\end{minipage}\end{displaymath} (30)

For example, one of the sentences from the Eindhoven corpus [6] is chunked as in figure 3.

Figure 3: Application of NP chunker
\par\centerline {\pstree[levelsep=1.2cm,nodesep=2pt,treesep=0.4cm...

next up previous
Next: Implementational Issues Up: Regular Expression Operators in Previous: The replace operator.