Next: Direct implementation of new
 Up: Extendible Regular Expression Operators
 Previous: Extendible Regular Expression Operators
A regular expression operator is defined as a pair  
  macro(ExprA,ExprB) which indicates that the regular expression  
  ExprA is to be interpreted as regular expression  ExprB. For
example, simple nullary regular expression operators (equivalent to
abbreviatory devices found in tools such as  lex and  flex),
can be defined as in the following example:
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro( vowel, {a,e,i,o,u} )\end{verbatim}\end{minipage}\end{displaymath}](img5.gif)  | 
(5) | 
 
indicating that the operator  vowel/0 can be understood by
assuming that every occurrence of  vowel in a regular expression
is textually replaced by  {a,e,i,o,u}.
The same mechanism is used to define n-ary operators, exploiting
Prolog variables. For instance, the containment operator  
  containment(Expr) is the set of all strings which have as a
sub-string any of the strings in  Expr. This could be defined as
follows:2
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(containment(Expr), [? *,Expr,? *])\end{verbatim}\end{minipage}\end{displaymath}](img6.gif)  | 
(6) | 
 
Naturally, operators defined in this way can be part of the definition
of other operators. For instance, the operator  free(A) is the
language of all strings which do not have any of the strings in  
  A as a substring. This can be defined as:
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(free(A), ~containment(A))\end{verbatim}\end{minipage}\end{displaymath}](img7.gif)  | 
(7) | 
 
We have found it useful to define boolean operators using this
mechanism. In fact, if we use the universal language to stand for  
  true and the empty language to stand for  false, then the
standard operators for intersection and union correspond to
conjunction and disjunction:
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(true,? *).
macro(false,{}).\end{verbatim}\end{minipage}\end{displaymath}](img8.gif)  | 
(8) | 
 
With these definitions we get the expected properties: 
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}true & tru...
...false = false {false,false} = false\end{verbatim}\end{minipage}\end{displaymath}](img9.gif)  | 
(9) | 
 
The macros for true and false can also be used to define a conditional 
expression in the calculus. The operator  coerce_to_boolean
maps the empty language to the empty language, and any non-empty
language to the universal language:
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(coer...
...~coerce_to_boolean(Cond) o Else }).\end{verbatim}\end{minipage}\end{displaymath}](img10.gif)  | 
(10) | 
 
Various interesting properties of automata have been implemented which
yield boolean values, such as the predicates 
 is_equivalent/2 for recognizers, and  
  is_functional/1 and  is_subsequential/1 for transducers
(using the algorithms described in for instance [23]).
Regular expression operator definitions can also be recursive. The
following example demonstrates furthermore that definitions can take
the operands of the operator into account. The operator  
  set(List) yields the union of the languages given by each of the
expressions in the list  List;  union(A,B) is a built-in
operator providing the union of the two languages  A and  B:
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(set(...
...o(set([H\vert T]),union(H,set(T))).\end{verbatim}\end{minipage}\end{displaymath}](img11.gif)  | 
(11) | 
 
We can also exploit the fact that these definitions are directly
interpreted in Prolog by providing Prolog constraints on such
rules. This possibility is used in [7] to
define a longest-match concatenation operator which implements the
leftmost-longest capture semantics required by the POSIX standard
(cf. section 4.4). 
A simple example is a generalization of the operator  free. Suppose
we want to define an operator  free(N,Expr) indicating the set of 
strings which do not contain more than  N occurrences of  
  Expr. This can be done as follows:
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(free...
...0 > 0, N is N0-1, free_list(N,X,T).\end{verbatim}\end{minipage}\end{displaymath}](img12.gif)  | 
(12) | 
 
Another example is an implementation of the N-queens problem:
how to place N queens on an N by N chess-board in such a way that no
queen attacks any other queen.  For any N we can create a regular
expression generating exactly all strings of solutions. A solution to
the N-queen problem is represented as a string of N integers between 1
and N. An integer i at position j in this string indicates that a
queen is placed on the i-th column of the j-th row.
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(n_qu...
...onals(N)
& reverse(diagonals(N)))\end{verbatim}\end{minipage}\end{displaymath}](img13.gif)  | 
(13) | 
 
The operator  n_queens(N) is defined as the intersection of a
number of constraints. The first constraint,  sigma(N)*, indicates 
that a solution must be a string of integers between 1 and  
  N. The second constraint indicates that the length of the string
must be  N. The remaining constraints ensure that queens do not
attack each other. The definition of  length illustrates once
more the use
of Prolog to create a regular expression; the definition of  
  sigma/1 uses the  set operator defined previously. 
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(leng...
...is N0+1, N1 < N+1, between(N1,N,I).\end{verbatim}\end{minipage}\end{displaymath}](img14.gif)  | 
(14) | 
 
The complete program is given in the appendix. For instance, the
expression  n_queens(5) produces the automaton in
figure 1.
Figure 1:
Solution to the 5-queens problem
 
 
  | 
 
The mechanism described sofar to define new regular expression
operators is already quite powerful. As another illustration, consider 
the problem of compiling a given finite automaton into a regular
expression. This problem becomes trivial if we allow the introduction
of new operators. Here is the definition of an operator `fa/5' which
describes an automaton as a listing of its components:
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}macro(fa(S...
... [[States x [],?]*,States x []]
))\end{verbatim}\end{minipage}\end{displaymath}](img16.gif)  | 
(15) | 
 
As an example, the automaton given in figure 2.16 of
[10] (given in figure 2) can be specified as:
![\begin{displaymath}\begin{minipage}[t]{.9\textwidth}\begin{verbatim}fa({0,1},{...
...],
[q2,1,q3],[q3,0,q2],[q3,1,q2]})\end{verbatim}\end{minipage}\end{displaymath}](img17.gif)  | 
(16) | 
 
Figure 2:
Example automaton from figure 2.16 of 
[10].
![\begin{figure}
\begin{pspicture}(-1,-1)(6.5,1.4)
\psset{unit=0.025cm}\rput(0,3.0...
...gle=30.0]{->}{2}{1}
\naput{\rnode{x}{0 1}}
\end{pspicture}\centering\end{figure}](img18.gif)  | 
 
 
 
   
 Next: Direct implementation of new
 Up: Extendible Regular Expression Operators
 Previous: Extendible Regular Expression Operators
2000-07-03