The calculation of the intersection of a CFG and a FSA is very simple
[1]. The (context-free) grammar defining this
intersection is simply constructed by keeping track of the state names
in the non-terminal category symbols. For each rule
X0
X1...Xn there are rules
X0q0q
X1q0q1
X2q1q2
...
Xnqn - 1q
, for all
q0...qn. Furthermore for each
transition
(qi,
) = qk we have a rule
qiqk
. Thus the intersection of a FSA and a
CFG is a CFG that exactly derives all parse-trees. Such a grammar
might be called the parse-forest grammar.
Although this construction shows that the intersection of a FSA and a
CFG is itself a CFG, it is not of practical interest. The reason is
that this construction typically yields an enormous amount of rules
that are `useless'. In fact the (possibly enormously large) parse
forest grammar might define an empty language (if the intersection was
empty). Luckily `ordinary' recognizers/parsers for CFG can be easily
generalized to construct this intersection yielding (in typical cases)
a much smaller grammar. Checking whether the intersection is empty or
not is then usually very simple as well: only in the latter case will
the parser terminate succesfully.
To illustrate how a parser can be generalized to accept a FSA as input we present a simple top-down parser.
A context-free grammar is represented as a definite-clause specification as follows. We do not wish to define the sets of terminal and non-terminal symbols explicitly, these can be understood from the rules that are defined using the relation rule/2, and where symbols of the rhs are prefixed with `-' in the case of terminals and `+' in the case of non-terminals. The relation top/1 defines the start symbol. The language L' = anbn is defined as:
top(s). rule(s,[-a,+s,-b]). rule(s,[]).
In order to illustrate how ordinary parsers can be used to compute the intersection of a FSA and a CFG consider first the definite-clause specification of a top-down parser. This parser runs in polynomial time if implemented using Earley deduction or XOLDT resolution [14]. It is assumed that the input string is represented by the trans/3 predicate.
parse(P0,P) :- top(Cat), parse(+Cat,P0,P). parse(-Cat,P0,P) :- trans(P0,Cat,P), side_effect(p(Cat,P0,P) --> Cat). parse(+Cat,P0,P) :- rule(Cat,Ds), parse_ds(Ds,P0,P,His), side_effect(p(Cat,P0,P) --> His). parse_ds([],P,P,[]). parse_ds([H|T],P0,P,[p(H,P0,P1)|His]) :- parse(H,P0,P1), parse_ds(T,P1,P,His).
The predicate side_effect is used to construct the parse forest grammar. The predicate always succeeds, and as a side-effect asserts that its argument is a rule of the parse forest grammar. For the sentence `a a b b' we obtain the parse forest grammar:
p(s,2,2) --> []. p(s,1,3) --> [p(-a,1,2),p(+s,2,2),p(-b,2,3)]. p(s,0,4) --> [p(-a,0,1),p(+s,1,3),p(-b,3,4)]. p(a,1,2) --> a. p(a,0,1) --> a. p(b,2,3) --> b. p(b,3,4) --> b.The reader easily verifies that indeed this grammar generates (a isomorphism of) the single parse tree of this example, assuming of course that the start symbol for this parse-forest grammar is p(s,0,4). In the parse-forest grammar, complex symbols are non-terminals, atomic symbols are terminals.
Next consider the definite clause specification of a FSA.
We define the transition relation using
the relation trans/3. For start states, the relation start/1
should hold, and for final states the relation final/1 should hold.
Thus the following FSA, defining the regular language
L = (aa)*b+
(i.e. an even number of a's followed by at least one b) is given as:
Interestingly, nothing needs to be changed to use the same parser for the computation of the intersection of a FSA and a CFG. If our input `sentence' now is the definition of trans/3 as given above, we obtain the following parse forest grammar (where the start symbol is p(s,q0,q2)):
p(s,q0,q0) --> []. p(s,q1,q1) --> []. p(s,q1,q2) --> [p(-a,q1,q0),p(+s,q0,q0),p(-b,q0,q2)]. p(s,q0,q2) --> [p(-a,q0,q1),p(+s,q1,q2),p(-b,q2,q2)]. p(s,q1,q2) --> [p(-a,q1,q0),p(+s,q0,q2),p(-b,q2,q2)]. p(a,q0,q1) --> a. p(a,q1,q0) --> a. p(b,q0,q2) --> b. p(b,q2,q2) --> b.Thus, even though we now use the same parser for an infinite set of input sentences (represented by the FSA) the parser still is able to come up with a parse forest grammar. A possible derivation for this grammar constructs the following (abbreviated) parse tree in figure 1. Note that the construction of Bar Hillel would have yielded a grammar with 88 rules.