Next: About this document ... Up: Head Corner Parsing for Previous: Bibliography

Head-corner parser for LTAGs

I provide a Prolog implementation of the head-corner parser for Headed, Lexicalized and Feature-based TAGs. The parser returns derivation trees. This parser is the most efficient parser that I was able to implement on the basis of the specifications given above.

This version of the head-corner parser proceeds in three phases. In the first phase the candidate elementary trees are extracted from the lexicon (dictionary look-up). In the second phase the sentence is recognized and items are recorded from which derivation trees (and hence derived trees) can be recovered. In the third phase derivation trees can be recovered (it is also easy to recover derived trees -- if so desired). The third phase is not described here.

The second phase implements the definite clause specification given in the text. It uses memo-ization for the goal predicate only. Memo-izing other relations turns out to be much too expensive. Furthermore extra argument positions are used for administration concerning derivation trees. An extra argument in the hc and hc_na predicates is used to indicate what words are already `reserved' by an auxiliary tree, but not yet processed. This is necessary for termination (note that this would not be a problem in a chart-based implementation), because the parser may otherwise keep on applying adjunctions whose words are already to be consumed by previous adjunctions.

parse(TopCat,String) :-
    first_phase(String,0,Max),              % asserts elementary trees 
    memo(goal(s(Cat,[]),0,Max,0,Max,top)).  % recognition

In the first phase, the predicates ini and aux are asserted. These predicates represent appropriate elementary trees from the lexicon. ini(Cat,Chain,Name,Q0,Q) is true if there is an initial tree Name of which the anchor connects Q0 to Q in the current sentence, of which the root node's top features match with Cat, and of which the chain from the anchor up to the root is represented as Chain.

Similarly, aux(Cat,Chain,Name,Q0,Q) is true if there is an auxiliary tree Name of which the anchor connects Q0 to Q in the current sentence, and which is adjoined at node Cat, and of which Chain is the chain of nodes from the foot node to the root node. Note that the feature unifications of bottom and top between the target node Cat and the foot and root nodes of the auxiliary tree should be taken care of here. Note that we do not explicitly have to represent the foot node as we do not allow adjunctions at foot nodes.

The recognizer asserts items for computations that are successful. On the basis of these items it is possible to recover the derivation trees associated with these computations. In combination with the use of memo-ization this implies that the parser uses packing to reduce the search space during recognition in the case of ambiguities. The resulting set of items of the recognition phase thus provides a compact representation of all derivation trees.

During recognition items are asserted of the form d(Name,Address,SubName), that represent a successful substitution/adjunction of the elementary SubName at node Address of Name. A possible root node is represented as d(top,0,Name).

In the predicates ini and aux semantic information (but not feature-structure information in general) is suppressed. The idea is that ambiguities also lead to different semantic constraints. `Packing' would hardly ever be useful if semantic information was used immediately. In this parser the semantic information is only added during recovery of derivation trees. Note that the recognizer therefore may be a bit too liberal in cases where such semantic constraints are used (non-compositionally) to filter out certain derivations.

The recognizer uses the meta-logical predicate memo(Goal) which is equivalent to call(Goal) except that it uses memo-ization to compute the goal.

% goal(+Tree,?P0,?P,+E0,+E,+ElemTreeName)
% if Tree can be recognized from P0 to P, where E0=<P0, P=<E. 
goal(s(Cat,Chain),P0,P,E0,E,Up):-
    address(Cat,Add), 
    ini(Cat,Chain0,Name,Q0,Q), 
    between(Q0,Q,E0,E),
    hc_na(Chain0,_,Mid,Q0,Q,R0,R,E0,E,[],[],Name),
    hc_na(Chain,Mid,_,R0,R,P0,P,E0,E,[],[d(Up,Add,Name)],Up).

goal(t(Word/Q0,Chain),P0,P,E0,E,Up):-
    connects(Word,Q0,Q), 
    between(Q0,Q,E0,E), 
    hc_na(Chain,_,_,Q0,Q,P0,P,E0,E,[],[],Up).

goal(e(Cat,Chain),P0,P,E0,E,Up):-
    between(Q,Q,E0,E), 
    hc(Chain,Cat,_,Q,Q,P0,P,E0,E,[],[],Up).

%    hc(+Chain,?Head,?Goal,+P0,+P,?Q0,?Q,+E0,+E,+Used,+Items,+ElemName)
% hc_na(+Chain,?Head,?Goal,+P0,+P,?Q0,?Q,+E0,+E,+Used,+Items,+ElemName)
% if from Head a chain can be recognized ending in Goal, where P0-P is
% already spanned, complete chain spans Q0-Q, E0=<Q0 Q=<E. Used represents 
% words that are already consumed by previous adjunctios. Items are the
% items that are found previously - these are asserted once the chain 
% indeed can be recognized. ElemName is the name of the elementary tree
% Chain is taken from. The first predicate allows adjunction at Head, the
% second does not.
hc(Chain,Cat0,Cat,P0,P,Q0,Q,E0,E,U,D,Up) :-
    unify_node(Cat0), 
    hc_na(Chain,Cat0,Cat,P0,P,Q0,Q,E0,E,U,D,Up).
hc(Chain,Cat,Goal,Q1,Q2,P0,P,E0,E,U,Ds,Up) :-
    address(Cat,Add), 
    aux(Cat,Chain0,Name,L0,L), 
    check_pos(L0,L,E0,QL,QR,E,U),
    hc_na(Chain0,_,Mid,Q1,Q2,Q0,Q,E0,E,[L0|U],[],Name),
    hc_na(Chain,Mid,Goal,Q0,Q,P0,P,E0,E,U,[d(Up,Add,Name)|Ds],Up).

hc_na([],Cat,Cat,P0,P,P0,P,_,_,_,Deriv,_) :- assert_deriv(Deriv).
hc_na([c(Mid,L,R)|T],_,Goal,QL,QR,P0,P,E0,E,U,D,Up) :-
    goal_l(L,Q0,QL,E0,Up), 
    goal_r(R,QR,Q,E,Up),
    hc(T,Mid,Goal,Q0,Q,P0,P,E0,E,U,D,Up).

% goal_l(+Trees,?P0,+P,+E0,+ElemTreeName)
% recognizes a reversed list of Trees from P0 to P, where E0 =< P0.
% ElemName is the name of the elementary tree Trees is taken from. 
goal_l([],Q,Q,_,_).
goal_l([H|T],Q0,Q,E0,Up):- 
    memo(goal(H,Q1,Q,E0,Q,Up)), 
    goal_l(T,Q0,Q1,E0,Up).

% goal_r(+Trees,+P0,?P,+E,+ElemName)
% recognizes a list of trees from P0 to P, P=<E, ElemName is name of
% the elementary tree that Trees is taken from. 
goal_r([],Q,Q,_,_).
goal_r([H|T],Q0,Q,E,Up):- 
    memo(goal(H,Q0,Q1,Q0,E,Up)), 
    goal_r(T,Q1,Q,E,Up).

% assert_deriv(+ItemsToBeAsserted)
assert_deriv([]). 
assert_deriv([H|T]) :- 
    (   \+ H 
    ->  assertz(H)
    ;   true
    ), 
    assert_deriv(T).

% check_pos(+P0,+P,+L0,+L,+R0,+R,+Used)
% the interval P0-P is either in L0-L or R0-R and
% P0 is not in Used
check_pos(P0,P,L0,L,R0,R,U):- 
    \+ member(P0,U), 
    (  between(P0,P,L0,L) 
    ;  between(P0,P,R0,R) 
    ).

Next: About this document ... Up: Head Corner Parsing for Previous: Bibliography

Noord G.J.M. van
1998-09-29