next up previous
Next: The intersection of a Up: The intersection of Finite Previous: Introduction

The intersection of a CFG and a FSA

The calculation of the intersection of a CFG and a FSA is very simple [1]. The (context-free) grammar defining this intersection is simply constructed by keeping track of the state names in the non-terminal category symbols. For each rule X0 $ \rightarrow$ X1...Xn there are rules $ \langle$X0q0q$ \rangle$ $ \rightarrow$ $ \langle$X1q0q1$ \rangle$$ \langle$X2q1q2$ \rangle$...$ \langle$Xnqn - 1q$ \rangle$, for all q0...qn. Furthermore for each transition $ \delta$(qi,$ \sigma$) = qk we have a rule $ \langle$$ \sigma$qiqk$ \rangle$ $ \rightarrow$ $ \sigma$. Thus the intersection of a FSA and a CFG is a CFG that exactly derives all parse-trees. Such a grammar might be called the parse-forest grammar.

Although this construction shows that the intersection of a FSA and a CFG is itself a CFG, it is not of practical interest. The reason is that this construction typically yields an enormous amount of rules that are `useless'. In fact the (possibly enormously large) parse forest grammar might define an empty language (if the intersection was empty). Luckily `ordinary' recognizers/parsers for CFG can be easily generalized to construct this intersection yielding (in typical cases) a much smaller grammar. Checking whether the intersection is empty or not is then usually very simple as well: only in the latter case will the parser terminate succesfully.

To illustrate how a parser can be generalized to accept a FSA as input we present a simple top-down parser.

A context-free grammar is represented as a definite-clause specification as follows. We do not wish to define the sets of terminal and non-terminal symbols explicitly, these can be understood from the rules that are defined using the relation rule/2, and where symbols of the rhs are prefixed with `-' in the case of terminals and `+' in the case of non-terminals. The relation top/1 defines the start symbol. The language L' = anbn is defined as:

top(s).

rule(s,[-a,+s,-b]).  rule(s,[]).

In order to illustrate how ordinary parsers can be used to compute the intersection of a FSA and a CFG consider first the definite-clause specification of a top-down parser. This parser runs in polynomial time if implemented using Earley deduction or XOLDT resolution [14]. It is assumed that the input string is represented by the trans/3 predicate.

parse(P0,P) :- 
    top(Cat), parse(+Cat,P0,P).

parse(-Cat,P0,P) :- 
    trans(P0,Cat,P), 
    side_effect(p(Cat,P0,P) --> Cat).
parse(+Cat,P0,P) :- 
    rule(Cat,Ds), 
    parse_ds(Ds,P0,P,His),
    side_effect(p(Cat,P0,P) --> His).

parse_ds([],P,P,[]).  
parse_ds([H|T],P0,P,[p(H,P0,P1)|His]) :-
    parse(H,P0,P1), 
    parse_ds(T,P1,P,His).

The predicate side_effect is used to construct the parse forest grammar. The predicate always succeeds, and as a side-effect asserts that its argument is a rule of the parse forest grammar. For the sentence `a a b b' we obtain the parse forest grammar:

p(s,2,2) --> [].  
p(s,1,3) --> 
    [p(-a,1,2),p(+s,2,2),p(-b,2,3)].
p(s,0,4) --> 
    [p(-a,0,1),p(+s,1,3),p(-b,3,4)].  
p(a,1,2) --> a.  
p(a,0,1) --> a.  
p(b,2,3) --> b.  
p(b,3,4) --> b.
The reader easily verifies that indeed this grammar generates (a isomorphism of) the single parse tree of this example, assuming of course that the start symbol for this parse-forest grammar is p(s,0,4). In the parse-forest grammar, complex symbols are non-terminals, atomic symbols are terminals.

Figure 1: A parse-tree extracted from the parse forest grammar
\begin{figure}
\begin{center}
\leavevmode
\unitlength1pt
\beginpicture
\setplot...
...3 102.00
\put{\hbox{}} [Bl] at 0.00 165.00
\endpicture
\end{center}\end{figure}

Next consider the definite clause specification of a FSA. We define the transition relation using the relation trans/3. For start states, the relation start/1 should hold, and for final states the relation final/1 should hold. Thus the following FSA, defining the regular language L = (aa)*b+ (i.e. an even number of a's followed by at least one b) is given as:

\begin{pspicture}(-1,-1)(5.0,4.0)
\psset{unit=0.025cm}\rput(0,60.0){\circlenode[...
...e{x}{a}}
\nccircle[angle=-90]{<-}{2}{0.5cm}
\trput{\rnode{x}{b}}
\end{pspicture}

Interestingly, nothing needs to be changed to use the same parser for the computation of the intersection of a FSA and a CFG. If our input `sentence' now is the definition of trans/3 as given above, we obtain the following parse forest grammar (where the start symbol is p(s,q0,q2)):

p(s,q0,q0) --> []. 
p(s,q1,q1) --> [].  
p(s,q1,q2) --> 
    [p(-a,q1,q0),p(+s,q0,q0),p(-b,q0,q2)].  
p(s,q0,q2) --> 
    [p(-a,q0,q1),p(+s,q1,q2),p(-b,q2,q2)].  
p(s,q1,q2) --> 
    [p(-a,q1,q0),p(+s,q0,q2),p(-b,q2,q2)].  
p(a,q0,q1) --> a.  
p(a,q1,q0) --> a.  
p(b,q0,q2) --> b.  
p(b,q2,q2) --> b.
Thus, even though we now use the same parser for an infinite set of input sentences (represented by the FSA) the parser still is able to come up with a parse forest grammar. A possible derivation for this grammar constructs the following (abbreviated) parse tree in figure 1. Note that the construction of Bar Hillel would have yielded a grammar with 88 rules.


next up previous
Next: The intersection of a Up: The intersection of Finite Previous: Introduction
Noord G.J.M. van
1998-09-29