next up previous
Next: Direct implementation of new Up: Extendible Regular Expression Operators Previous: Extendible Regular Expression Operators

New operators in terms of existing operators.

A regular expression operator is defined as a pair macro(ExprA,ExprB) which indicates that the regular expression ExprA is to be interpreted as regular expression ExprB. For example, simple nullary regular expression operators (equivalent to abbreviatory devices found in tools such as lex and flex), can be defined as in the following example:
\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro( vowel, {a,e,i,o,u} )\end{verbatim}\end{minipage}\end{displaymath} (5)

indicating that the operator vowel/0 can be understood by assuming that every occurrence of vowel in a regular expression is textually replaced by {a,e,i,o,u}.

The same mechanism is used to define n-ary operators, exploiting Prolog variables. For instance, the containment operator containment(Expr) is the set of all strings which have as a sub-string any of the strings in Expr. This could be defined as follows:2

\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(containment(Expr), [? *,Expr,? *])\end{verbatim}\end{minipage}\end{displaymath} (6)

Naturally, operators defined in this way can be part of the definition of other operators. For instance, the operator free(A) is the language of all strings which do not have any of the strings in A as a substring. This can be defined as:
\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(free(A), ~containment(A))\end{verbatim}\end{minipage}\end{displaymath} (7)

We have found it useful to define boolean operators using this mechanism. In fact, if we use the universal language to stand for true and the empty language to stand for false, then the standard operators for intersection and union correspond to conjunction and disjunction:

\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(true,? *).
macro(false,{}).\end{verbatim}\end{minipage}\end{displaymath} (8)

With these definitions we get the expected properties:
\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}true & tru...
...false = false {false,false} = false\end{verbatim}\end{minipage}\end{displaymath} (9)

The macros for true and false can also be used to define a conditional expression in the calculus. The operator coerce_to_boolean maps the empty language to the empty language, and any non-empty language to the universal language:

...~coerce_to_boolean(Cond) o Else }).\end{verbatim}\end{minipage}\end{displaymath} (10)

Various interesting properties of automata have been implemented which yield boolean values, such as the predicates is_equivalent/2 for recognizers, and is_functional/1 and is_subsequential/1 for transducers (using the algorithms described in for instance [23]).

Regular expression operator definitions can also be recursive. The following example demonstrates furthermore that definitions can take the operands of the operator into account. The operator set(List) yields the union of the languages given by each of the expressions in the list List; union(A,B) is a built-in operator providing the union of the two languages A and B:

...o(set([H\vert T]),union(H,set(T))).\end{verbatim}\end{minipage}\end{displaymath} (11)

We can also exploit the fact that these definitions are directly interpreted in Prolog by providing Prolog constraints on such rules. This possibility is used in [7] to define a longest-match concatenation operator which implements the leftmost-longest capture semantics required by the POSIX standard (cf. section 4.4).

A simple example is a generalization of the operator free. Suppose we want to define an operator free(N,Expr) indicating the set of strings which do not contain more than N occurrences of Expr. This can be done as follows:

...0 > 0, N is N0-1, free_list(N,X,T).\end{verbatim}\end{minipage}\end{displaymath} (12)

Another example is an implementation of the N-queens problem: how to place N queens on an N by N chess-board in such a way that no queen attacks any other queen. For any N we can create a regular expression generating exactly all strings of solutions. A solution to the N-queen problem is represented as a string of N integers between 1 and N. An integer i at position j in this string indicates that a queen is placed on the i-th column of the j-th row.

& reverse(diagonals(N)))\end{verbatim}\end{minipage}\end{displaymath} (13)

The operator n_queens(N) is defined as the intersection of a number of constraints. The first constraint, sigma(N)*, indicates that a solution must be a string of integers between 1 and N. The second constraint indicates that the length of the string must be N. The remaining constraints ensure that queens do not attack each other. The definition of length illustrates once more the use of Prolog to create a regular expression; the definition of sigma/1 uses the set operator defined previously.
\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(leng... N0+1, N1 < N+1, between(N1,N,I).\end{verbatim}\end{minipage}\end{displaymath} (14)

The complete program is given in the appendix. For instance, the expression n_queens(5) produces the automaton in figure 1.

Figure 1: Solution to the 5-queens problem
\includegraphics [width=11cm]{}

The mechanism described sofar to define new regular expression operators is already quite powerful. As another illustration, consider the problem of compiling a given finite automaton into a regular expression. This problem becomes trivial if we allow the introduction of new operators. Here is the definition of an operator `fa/5' which describes an automaton as a listing of its components:

... [[States x [],?]*,States x []]
))\end{verbatim}\end{minipage}\end{displaymath} (15)

As an example, the automaton given in figure 2.16 of [10] (given in figure 2) can be specified as:
[q2,1,q3],[q3,0,q2],[q3,1,q2]})\end{verbatim}\end{minipage}\end{displaymath} (16)

Figure 2: Example automaton from figure 2.16 of [10].
\naput{\rnode{x}{0 1}}

next up previous
Next: Direct implementation of new Up: Extendible Regular Expression Operators Previous: Extendible Regular Expression Operators