To make the description more explicit, we will develop a Prolog implementation of the algorithm for DCGs, along the way introducing some niceties of the algorithm previously glossed over.
In the implementation, a term of the form node(Cat, P0, P) represents a phrase with the syntactic and semantic information given by Cat starting at position P0 and ending at position P in the string being generated. As usual for DCGs, a string position is represented by the list of string elements after the position. The generation process starts with a goal category and attempts to generate an appropriate node, in the process instantiating the generated string.
To generate from a node, we nondeterministically choose a non-chain rule whose left-hand side will serve as the pivot. For each right-hand-side element, we recursively generate, and then connect the pivot to the root.
The processing within generate_rhs
is a simple
iteration.
The connection of a pivot to the root, as noted before, requires
choice of a chain rule whose semantic head matches the pivot, and the
recursive generation of the remaining right-hand-side. We assume a
predicate
applicable_
chain_
rule(SemHead,
LHS, Root, RHS)
that holds if there is a chain rule admitting a node
LHS as the left-hand-side, SemHead as its semantic head, and
RHS as the remaining right-hand-side nodes, such that the
left-hand-side node and the root node Root can be themselves
connected.
The base case occurs when the root and the pivot are the same. One must be careful that all identity checks like this one be implemented correctly in the generator by using a sound unification algorithm with the occurs check. (The default unification in most Prolog systems is unsound in this respect.) For example, a grammar with a gap-threading treatment of wh-movement [14][13] might include the rule
stating that an NP with agreement Agr and semantics Sem can be empty provided that the list of gaps in the NP can be represented as the difference list [np(Agr)/Sem|X]-X, that is the list containing an NP gap with the same agreement features Agr [page 128]pereira-shieber. The above rule is a non-chain rule and so it will be considered when trying to generate any nongap NP, such as the proper noun np(3-sing,G-G)/john. The base case of connect will try to unify that term with the head of the rule above, leading to the attempted unification of X with [np(Agr)/Sem|X], an occurs-check failure. The base case, incorporating the explicit call to a sound unification algorithm is thus as follows:
Now, we need only define the notion of an applicable chain or
non-chain rule. A non-chain rule is applicable if the semantics of
the left-hand-side of the rule (which is to become the pivot) matches
that of the root. Further, we require a top-down check that
syntactically the pivot can serve as the semantic head of the root.
For this latter purpose, we assume a predicate chained_nodes
that codifies the transitive closure of the semantic head relation
over categories. This is the correlate of the link relation used in
left-corner parsers with top-down filtering; we direct the reader to
the discussion by Matsumoto et al. [10] or Pereira and
Shieber [page 182]pereira-shieber for further
information.
A chain rule is applicable to connect a pivot to a root if the pivot can serve as the semantic head of the rule and the left-hand-side of the rule is appropriate for linking to the root.
: Grammar Fragment
: Analysis Tree Traversal Example
The information needed to guide the generation (given as the
predicates chain_rule
, non_chain_
rule
, and
chained_
nodes
) can be computed automatically from the
grammar; a program to compile a DCG into these tables has in fact been
implemented. The details of the process will not be discussed
further. The careful reader will have noticed, however, that no
attention has been given to the issue of terminal symbols on the
right-hand sides of rules. This is because during the compilation
process, the right-hand side of a rule is converted from a list of
categories and terminal strings to a list of nodes connected together
by the difference-list threading technique used for standard DCG
compilation. At that point, terminal strings can be introduced into
the string threading and need never be considered further.