Syntactic Generation

Gertjan van Noord & Günter Neumann
Alfa-informatica RUG, The Netherlands
Deutsches Forschungszentrum für Künstliche Intelligenz, Saarbrücken, Germany

In a natural language generation module, two components are often distinguished. First, it must be decided what should be said. This task is delegated to a planning component, which might produce an expression representing the content of the proposed utterance. On the basis of this representation, the syntactic generation component then produces the actual output sentence(s). Although the distinction between planning and syntactic generation is not uncontroversial, we will nonetheless assume such an architecture here, in order to explain some of the issues that arise in syntactic generation.

A (natural language) grammar is a formal device that defines a relation between (natural language) utterances and their corresponding meanings. In practice this usually means that a grammar defines a relation between strings and logical forms. During natural language understanding, the task is to arrive at a logical form that corresponds to the input string. Syntactic generation can be described as the problem of finding the corresponding string for an input logical form.
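To make the relational view concrete, consider the following sketch. The grammar fragment is an invented toy (not taken from any of the cited systems): a single declarative rule set relates strings and logical forms, and both understanding and generation are obtained by enumerating the same relation in different directions.

```python
# Toy illustration of a grammar as a declarative relation between strings
# and logical forms; all rules and names are invented for this sketch.
# Rule format: (lhs, rhs, semantics). Lexical rules have a lowercase rhs.
RULES = [
    ("S",  ["NP", "VP"], lambda np, vp: (vp, np)),  # sem(S) = sem(VP) applied to sem(NP)
    ("NP", ["john"],     "john"),
    ("NP", ["mary"],     "mary"),
    ("VP", ["sleeps"],   "sleep"),
]

def derive(cat):
    """Enumerate all (word list, logical form) pairs for category `cat`."""
    for lhs, rhs, sem in RULES:
        if lhs != cat:
            continue
        if all(sym.islower() for sym in rhs):        # lexical rule
            yield rhs, sem
        else:                                        # binary phrasal rule
            for w1, s1 in derive(rhs[0]):
                for w2, s2 in derive(rhs[1]):
                    yield w1 + w2, sem(s1, s2)

def understand(words):      # string -> logical forms
    return [lf for w, lf in derive("S") if w == words]

def generate(lf):           # logical form -> strings
    return [w for w, l in derive("S") if l == lf]
```

Here `understand(["john", "sleeps"])` yields the logical form `("sleep", "john")`, and `generate(("sleep", "john"))` recovers the string; real systems of course compute the relation without exhaustive enumeration.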

We are thus making a distinction between the grammar, which defines this relation, and the procedure that computes the relation on the basis of such a grammar. In the current state of the art, unification-based (or, more generally, constraint-based) formalisms are used to express such grammars, e.g., Lexical Functional Grammar (LFG) [Bre82], Head-Driven Phrase Structure Grammar (HPSG) [PS87] and constraint-based categorial frameworks (cf. [Usz86] and [ZKC87]).

Almost all modern linguistic theories assume that a natural language grammar not only describes the correct sentences of a language, but also the corresponding semantic structures of the grammatical sentences. Given that a grammar specifies the relation between phonology and semantics, it seems obvious that the generator is supposed to use this specification. For example, Generalized Phrase Structure Grammar (GPSG) [GKPS85] provides a detailed description of the semantic interpretation of the sentences licensed by the grammar. Thus one might assume that a generator based on GPSG constructs a sentence for a given semantic structure, according to the semantic interpretation rules of GPSG. Alternatively, [Bus90] presents a generator, based on GPSG, which does not take a logical form as its input, but rather some kind of control expression which merely instructs the grammatical component which rules of the grammar to apply. Similarly, in the conception of [GP90], a generator is provided with some kind of deep structure which can be interpreted as a control expression instructing the grammar which rules to apply. Such approaches clearly solve some of the problems encountered in generation---simply by pushing them into the conceptual component (i.e., the planning component). In this overview we restrict our attention to the more ambitious approach sketched above.

The success of the currently developed constraint-based theories is due to the fact that they are purely declarative. Hence, it is an interesting objective---theoretically and practically---to use one and the same grammar for natural language understanding and generation. In fact, the potential for reversibility was a primary motivation for the introduction of Martin Kay's Functional Unification Grammar (FUG). In recent years, interest in such a reversible architecture has led to a number of publications.

4.2.1 State of the Art

The different approaches towards the syntactic generation problem can be classified along a number of dimensions. It is helpful to distinguish between the order in which the elements of a rule are processed, the way in which derivation trees are constructed, and the search strategy that is employed.

A generator proceeds from left to right if the elements of the right-hand side of a rule are processed in a left-to-right order. This order is very common for parsing, but turns out to be unsuitable for generation. For example, [Shi88] presents an Earley-based generation algorithm that follows a left-to-right scheduling. It has been shown that such a strategy leads to very inefficient behavior when applied to generation. The reason is that the important information that guides the generation process, namely the logical forms, is usually percolated in a different manner. Therefore, semantic-head-driven generation approaches have become popular, most notably the algorithm described in [SPvNM90,Van90,Van93], but see also [CRZ89,GH90,Ger91,Neu94]. Such approaches aim at an order of processing in which an element of the right-hand side of a rule is only processed once its corresponding logical form has been determined.
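The scheduling contrast can be made concrete with a deliberately small, invented grammar fragment (S -> NP VP, where NPs may stack adjectives; none of this is taken from the cited algorithms). In the left-to-right sketch, the NP is expanded before any semantic constraint reaches it, so every NP candidate is proposed; in the head-driven sketch, the verb (the semantic head) is selected first from the predicate of the logical form, which immediately fixes the subject's semantics.

```python
from itertools import product

# Invented mini-grammar: sem(S) = (sem(VP), sem(NP)); NP semantics is a
# (noun, set-of-modifiers) pair.
NOUNS = {"dog": "dog", "cat": "cat"}
ADJS  = {"big": "big", "old": "old"}
VERBS = {"sleeps": "sleep", "snores": "snore"}

def all_nps(max_adjs=2):
    """Enumerate NP candidates (bounded here; in general unbounded)."""
    for k in range(max_adjs + 1):
        for adjs in product(ADJS, repeat=k):
            for n_word, n_sem in NOUNS.items():
                yield list(adjs) + [n_word], (n_sem, frozenset(ADJS[a] for a in adjs))

def gen_left_to_right(lf):
    """Left-to-right: the NP is expanded before anything constrains its
    semantics, so every NP candidate must be proposed blindly."""
    candidates = list(all_nps())               # all NPs, built before filtering
    for np_words, np_sem in candidates:
        for v_word, v_sem in VERBS.items():
            if (v_sem, np_sem) == lf:
                return np_words + [v_word], len(candidates)

def gen_head_driven(lf):
    """Semantic-head-driven: choose the verb first from the predicate; the
    subject's semantics is then fixed, so NPs are filtered immediately."""
    pred, arg = lf
    v_word = next(w for w, s in VERBS.items() if s == pred)
    tried = 0
    for np_words, np_sem in all_nps():
        tried += 1
        if np_sem == arg:
            return np_words + [v_word], tried
```

With the target logical form `("sleep", ("dog", frozenset()))`, the left-to-right sketch materializes all 14 bounded NP candidates, while the head-driven sketch succeeds after the first; with genuinely recursive NPs the blind enumeration does not terminate at all.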

As in parsing theory, generation techniques can be classified according to the way they construct the derivation trees. Bottom-up and top-down traversals have been proposed as well as mixed strategies. For example, bottom-up generation strategies are described in [Shi88,Van93], top-down approaches are described in [Wed88,DIP90], and mixed strategies are described in [SPvNM90,Ger91,Neu94].

As in parsing, bottom-up approaches solve some non-termination problems that are encountered in certain top-down procedures.

The two dimensions mentioned above characterize the way in which derivation trees are constructed. A particular choice of these parameters defines a non-deterministic generation scheme, giving rise to a search space that is to be investigated by an actual generation algorithm. Hence, generation algorithms can be further classified with respect to the search strategy they employ. For example, a generation algorithm might employ a depth-first backtrack strategy. Potentially more efficient algorithms might use a chart to represent successful branches of the search space, optionally combined with breadth-first search (see, for example, [Ger91,CRZ89]). Moreover, there also exist chart-based, agenda-driven strategies which allow the modeling of preference-based best-first strategies (e.g., [Den94,Neu94]).
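An agenda-driven best-first strategy can be sketched schematically as follows (the slot structure and preference scores are invented for illustration, not taken from the cited systems): partial derivations live on a priority queue, and the most preferred partial derivation is always the next one to be extended.

```python
import heapq

# Each surface slot offers scored alternatives (higher score = preferred).
# The words and scores are invented for this sketch.
CHOICES = [
    [("the", 1.0), ("a", 0.5)],
    [("dog", 1.0), ("hound", 0.3)],
    [("sleeps", 1.0), ("slumbers", 0.2)],
]

def best_first(choices):
    """Agenda-driven best-first search: pop the highest-scoring partial
    string, extend it by one slot, and push the extensions back on the
    agenda. Returns the first complete sentence popped, with its score."""
    agenda = [(-0.0, 0, [])]          # (negated score, next slot, words so far)
    while agenda:
        neg, i, words = heapq.heappop(agenda)
        if i == len(choices):         # all slots filled: a complete sentence
            return words, -neg
        for word, score in choices[i]:
            heapq.heappush(agenda, (neg - score, i + 1, words + [word]))
```

For these scores the preferred realization `the dog sleeps` is produced without ever completing the dispreferred alternatives; a chart would additionally let the extended partial results be shared between competing derivations.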

4.2.2 Future Directions

Syntactic generation is one of the most elaborated and investigated fields in the area of natural language generation. In particular, due to the growing research in the Computational Linguistics area, syntactic generation has now achieved a methodological status comparable to that of natural language parsing. However, strong limitations remain which weaken the general applicability of current systems to arbitrary applications. Probably the most basic problems are:

None of the current syntactic generators processes grammars whose size and coverage go beyond those of a laboratory system. The newly proposed approaches in Computational Linguistics are in principle capable of processing declaratively specified grammars, and hence are potentially open to grammars which can be incrementally extended. However, as long as the grammars do not achieve a critical mass, the usability of these approaches for very large grammars remains speculative. The same is true for the status of the lexicons: currently, generators only use small lexicons. Consequently, most systems trivialize the problem of lexical choice, treating it as a simple look-up. However, if very large lexicons were to be used, the lexical choice problem would require more sophisticated strategies.

Of course, there exist some generators whose grammatical coverage is of interest, most notably those from the Systemic Linguistics camp (see section 4.1). However, these generation grammars have a less transparent declarative status, and hence are limited with respect to re-usability and adaptation to other systems.

All known syntactic generators have a limited degree of functionality. Although some approaches have been proposed for solving specific problems, such as the generation of ellipsis (e.g., [JW82]), the generation of paraphrases (e.g., [MS88,Neu94]), the generation of referential expressions [Dal90], or incremental generation (e.g., [DK87]), there currently exists no theoretical and practical framework that could serve as a platform for combining all these specific operational issues.

Taking these limitations as a basis, important key research problems specific to syntactic generation are:

Large Grammars and Lexicons:

These are needed to obtain reasonable linguistic competence. As a prerequisite, grammatical knowledge must be specified declaratively in order to support re-usability, not only across systems, but also for the integration of different generation-specific performance methods.


Reversibility:

If we want to obtain realistic generation systems, then interleaving natural language generation and understanding will be important, e.g., for text revision. It is reasonable to assume that, in the case of grammatical processing, reversible grammars as well as uniform processing methods are needed. Such a uniform framework might also serve as a platform for integrating generation- and understanding-specific performance methods.

Incremental Processing:

Rather than generating on the basis of a single complete logical form, some researchers have investigated the possibility of generating incrementally. In such a model, small pieces of semantic information are provided to the tactical generator one at a time. Such a model might better explain certain psycholinguistic observations concerning human language production (cf. for example [DK87]).

Producing an Unambiguous Utterance:

The generation procedures sketched above all come up with a possible utterance for a given meaning representation. However, given that natural language is highly ambiguous, the chances are that this proposed utterance itself is ambiguous, and therefore might lead to undesired side effects. Some preliminary techniques to prevent the production of ambiguous utterances are discussed in [NvN94,Neu94].
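The monitoring idea behind such techniques can be sketched as follows (with a toy lexicon invented for illustration): each candidate realization is "parsed back" with the same lexicon, and a candidate is accepted only if it has exactly one reading.

```python
# Invented toy lexicon: one word form may carry several meanings.
REALIZATIONS = {                 # meaning -> candidate word forms
    "finance_inst": ["bank", "financial_institution"],
    "riverside":    ["bank", "shore"],
}
MEANINGS = {}                    # inverted relation: word form -> meanings
for meaning, words in REALIZATIONS.items():
    for word in words:
        MEANINGS.setdefault(word, []).append(meaning)

def monitored_generate(meaning):
    """Self-monitoring generation: propose a realization, re-understand it,
    and keep only candidates with exactly one reading."""
    for word in REALIZATIONS[meaning]:
        if len(MEANINGS[word]) == 1:    # unambiguous when parsed back
            return word
    return None                         # no unambiguous realization exists
```

For the meaning `finance_inst`, the shorter candidate `bank` is rejected because re-understanding it yields two readings, and `financial_institution` is produced instead; in a reversible architecture the monitoring step can reuse the understanding component directly.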

Integration of Template- and Grammar-based Generation:

This will be important in order to obtain efficient but flexible systems. It would allow the competence grammar to be used in those cases where prototypical constructions (i.e., the templates) are not appropriate or not even available.

Logical Equivalence:

An important theoretical and practical problem for natural language generation is the problem of logical form equivalence. For a discussion of this problem we refer to [Shi93].


Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy, March 1992.

D[ouglas]. E. Appelt. Planning English Sentences. Cambridge University Press, 1985.

D. E. Appelt. Bidirectional grammars and the design of natural language generation systems. In Y. Wilks, editor, Theoretical Issues in Natural Language Processing-3, pages 185--191. Erlbaum, Hillsdale, New Jersey, 1987.

John A. Bateman. Uncovering textual meanings: a case study involving systemic-functional resources for the generation of Japanese texts. In Paris et al. [PSM91].

J. A. Bateman and E. H. Hovy. An overview of computational text generation. In C. Butler, editor, Computers and Texts: An Applied Perspective, pages 53--74. Basil Blackwell, Oxford, England, 1992.

Hans-Ulrich Block. Compiling trace & unification grammar. In Reversible Grammar in Natural Language Processing [Str94], pages 155--174.

J.A. Bateman, E.A. Maier, E. Teich, and L. Wanner. Towards an architecture for situated text generation. In Proceedings of the ICCICL, Penang, Malaysia, 1991.

J. A. Bateman, J. D. Moore, and R. A. Whitney. Upper modeling: A level of semantics for natural language processing. In IWNLG, editor, Proceedings of the Fifth International Workshop on Natural Language Generation, Pittsburgh, Pennsylvania, April 1990. Springer-Verlag.

J. Bresnan, editor. The Mental Representation of Grammatical Relations. MIT Press, Cambridge, Massachusetts, 1982.

S. Busemann. Generierung natürlicher Sprache mit Generalisierten Phrasenstruktur-Grammatiken. PhD thesis, University of Saarland (Saarbrücken), 1990.

A. Cawsey. Generating Explanatory Discourse: A Plan-Based, Interactive Approach. PhD thesis, University of Edinburgh, 1989.

Proceedings of the 12th International Conference on Computational Linguistics, Budapest, 1988.

J. Calder, M. Reape, and H. Zeevat. An algorithm for generation in unification categorial grammar. In Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, pages 233--240, Manchester, 1989. European Chapter of the Association for Computational Linguistics.

R. Dale. Generating recipes: An overview of Epicure. In Robert Dale, Chris S. Mellish, and Michael Zock, editors, Current Research in Natural Language Generation, pages 229--255. Academic Press, London, 1990.

Y. Den. Generalized chart algorithm: An efficient procedure for cost-based abduction. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, 1994. Association for Computational Linguistics.

R. Dale, E. H. Hovy, D. Rösner, and O. Stock, editors. Aspects of Automated Natural Language Generation. Number 587 in Lecture Notes in AI. Springer-Verlag, Heidelberg, 1992.

M. Dymetman and P. Isabelle. Reversible logic grammars for machine translation. In Proceedings of the Second International Conference on Theoretical and Methodological issues in Machine Translation of Natural Languages, Pittsburgh, Pennsylvania, 1988.

M. Dymetman, P. Isabelle, and F. Perrault. A symmetrical approach to parsing and generation. In Karlgren [Kar90], pages 90--96.

K. DeSmedt and G. Kempen. Incremental sentence production, self--correction and coordination. In G. Kempen, editor, Natural Language Generation, pages 365--376. Martinus Nijhoff, Dordrecht, 1987.

Robert Dale, Chris S. Mellish, and Michael Zock, editors. Current Research in Natural Language Generation. Academic Press, London, 1990.

M. Elhadad. Using Argumentation to Control Lexical Choice: A Functional Unification-Based Approach. PhD thesis, Computer Science Department, Columbia University, 1992.

M. Elhadad and J. Robin. Controlling content realization with functional unification grammars. In R. Dale, E. H. Hovy, D. Rösner, and O. Stock, editors, Aspects of Automated Natural Language Generation, pages 89--104. Springer, Heidelberg, 1992.

R. P. Fawcett. The state of the craft in computational linguistics: A generationist's viewpoint. Technical Report COMMUNAL Working Papers No. 2, Cardiff Computational Linguistics Unit, University of Wales, 1992.

S. Feiner and K. R. McKeown. Coordinating text and graphics in explanation generation. In Proceedings of the AAAI-90, pages 442--449, Boston, 1990. American Association for Artificial Intelligence.

D. D. Gerdemann. Parsing and Generation of Unification Grammars. PhD thesis, University of Illinois, 1991. Also Cognitive Science Technical Report CS-91-06.

D. Gerdemann and E. W. Hinrichs. Functor-driven generation with categorial-unification grammars. In Karlgren [Kar90], pages 145--150.

G. Gazdar, E. Klein, G. Pullum, and I. Sag. Generalized Phrase Structure Grammar. Blackwell, 1985.

N. Goldman. Conceptual generation. In R.C. Schank, editor, Conceptual Information Processing. North-Holland, Amsterdam, 1975.

C. Gardent and A. Plainfossé. Generating from a deep structure. In Karlgren [Kar90], pages 127--132.

E. H. Hovy and Y. Arens. Automatic generation of formatted text. In Proceedings of the 8th AAAI Conference, Anaheim, California, 1991. American Association for Artificial Intelligence.

E. H. Hovy and K. Knight. Motivation for shared ontologies: An example from the Pangloss collaboration. In Proceedings of the Workshop on Knowledge Sharing and Information Interchange, Chambery, France, 1993.

Eduard Hovy, Julia Lavid, Elisabeth Maier, Vibhu Mittal, and Cecile Paris. Employing knowledge resources in a new text planner architecture. In Aspects of automated natural language generation, pages 57--72. Springer-Verlag, Berlin, 1992.

Helmut Horacek. The architecture of a generation component in a complete natural language dialogue system. In Dale et al. [DMZ90], pages 193--227.

E. H. Hovy. Planning coherent multisentential text. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics, SUNY, Buffalo, New York, June 1988. Association for Computational Linguistics.

Eduard H. Hovy. Generating Natural Language under Pragmatic Constraints. Lawrence Erlbaum, Hillsdale, New Jersey, 1988.

Eduard H. Hovy. Automated discourse generation using discourse relations. Artificial Intelligence, 63:341--385, 1993.

P. S. Jacobs. Achieving bidirectionality. In COLING [COL88], pages 267--274.

A. Jameson. How to appear to be conforming to the `maxims' even if you prefer to violate them. In Kempen [Kem87], pages 19--42.

A. Jameson and W. Wahlster. User modelling in anaphora generation: Ellipsis and definite description. In Proceedings of the 1982 European Conference on Artificial Intelligence, pages 222--227, Orsay, France, 1982.

H. Karlgren, editor. Proceedings of the 13th International Conference on Computational Linguistics, Helsinki, 1990. ACL.

Gerard Kempen, editor. Natural Language Generation: Recent Advances in Artificial Intelligence, Psychology, and Linguistics. Kluwer Academic, Boston, Dordrecht, 1987.

Richard Kittredge, Tanya Korelsky, and Owen Rambow. On the need for domain communication knowledge. Computational Intelligence, 7(4):305--314, 1991.

K. Kukich. Knowledge-Based Report Generation: A Knowledge-Engineering Approach. PhD thesis, University of Pittsburgh, 1983.

Alex Lascarides and Jon Oberlander. Abducing temporal discourse. In Proceedings of the Sixth International Workshop on Natural Language Generation, pages 167--182, Trento, Italy, April 1992. Springer-Verlag. Also in [DHRS92].

James R. Martin. English text: systems and structure. Benjamins, Amsterdam, 1992.

C. M. I. M. Matthiessen. Systemic grammar in computation: The Nigel case. In Proceedings of the First Conference of the European Chapter of the Association for Computational Linguistics, Pisa, Italy, 1983. European Chapter of the Association for Computational Linguistics.

Christian M. I. M. Matthiessen. Notes on the organization of the environment of a text generation grammar. In Kempen [Kem87].

W. C. Mann, M. Bates, B. J. Grosz, D. D. McDonald, K. R. McKeown, and W. R. Swartout. Text generation: The state of the art and the literature. Technical Report RR-81-101, USC/Information Sciences Institute, 1981.

Kathleen F. McCoy and Jeanette Cheng. Focus of attention: constraining what can be said next. In Paris et al. [PSM91].

Kathleen F. McCoy. The ROMPER system: Responding to object-related misconceptions using perspective. In Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, Columbia University, New York, June 1986. Association for Computational Linguistics.

David D. McDonald. Natural Language Production as a Process of Decision Making Under Constraint. PhD thesis, Department of Computer Science and Electrical Engineering, Massachusetts Institute of Technology, 1980.

David D. McDonald. Type-driven suppression of redundancy in the generation of inference-rich reports. In Aspects of automated natural language generation, pages 72--88. Springer-Verlag, Berlin, 1992.

David D. McDonald. Issues in the choice of a source for natural language generation. Computational Linguistics, 19(1):191--197, March 1993.

David D. McDonald. Reversible NLP by linking the grammar to the knowledge base. In Reversible Grammar in Natural Language Processing [Str94], pages 257--292.

Kathleen R. McKeown. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Studies in Natural Language Processing. Cambridge University Press, 1985.

M. Meteer. The ``Generation Gap'': The Problem of Expressibility in Text Planning. PhD thesis, University of Massachusetts at Amherst, 1990.

Marie W. Meteer. Bridging the generation gap between text planning and linguistic realization. Computational Intelligence, 7(4):296--304, 1991.

G. A. Miller. WordNet: A dictionary browser. In Information in Data: Proceedings of the 1st Conference of the UW Centre for the New Oxford Dictionary. University of Waterloo, Canada, 1985.

W. C. Mann and C. M. I. M. Matthiessen. Nigel: A systemic grammar for text generation. In R. Benson and J. Greaves, editors, Systemic Perspectives on Discourse: Selected Papers from the Ninth International Systemics Workshop. Ablex, London, 1985.

M. Meteer, D. D. McDonald, S. Anderson, D. Foster, L. Gay, A. Huettner, and P. Sibun. Mumble-86: Design and implementation. Technical Report COINS-87-87, University of Massachusetts at Amherst, 1987.

J. D. Moore. A Reactive Approach to Explanation in Expert and Advice-Giving Systems. PhD thesis, University of California at Los Angeles, 1989.

Johanna D. Moore and Cécile L. Paris. Planning texts for advisory dialogues: Capturing intentional and rhetorical information. Computational Linguistics, 19(4):651--694, December 1993.

K. R. McKeown, J. Robin, and M. Tanenblatt. Tailoring lexical choice to the user's vocabulary in multimedia explanation generation. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 226--234, Ohio State University, 1993. Association for Computational Linguistics.

K. R. McKeown and W. R. Swartout. Language generation and explanation. Annual Reviews of Computer Science, 2:401--449, 1987.

M. M. Meteer and V. Shaked. Strategies for effective paraphrasing. In COLING [COL88].

William C. Mann and Sandra A. Thompson. Rhetorical structure theory: description and construction of text structures. In Kempen [Kem87], pages 85--96.

Günter Neumann. A Uniform Computational Model for Natural Language Parsing and Generation. PhD thesis, Universität des Saarlandes, Germany, November 1994.

Hans-Joachim Novak. Integrating a generation component into a natural language understanding system. In O. Herzog and C.-R. Rollinger, editors, Text understanding in LILOG: integrating computational linguistics and artificial intelligence, Final report on the IBM Germany LILOG-Project, pages 659--669. Springer-Verlag, Berlin, Heidelberg, New York, 1991. Lecture notes in artificial intelligence, 546.

G. Neumann and G. van Noord. Reversibility and self-monitoring in natural language generation. In Reversible Grammar in Natural Language Processing [Str94], pages 59--96.

Cécile L. Paris. The Use of Explicit Models in Text Generation. Francis Pinter, London, 1993.

Cécile L. Paris. User modelling in text generation. Francis Pinter, London, 1993.

Cécile L. Paris and Elisabeth A. Maier. Knowledge sources of decisions? In Proceedings of the IJCAI-91 Workshop on Decision Making Throughout the Generation Process, pages 11--17, Sydney, Australia, August 1991.

Carl Pollard and Ivan A. Sag. An Information-Based Approach to Syntax and Semantics: Fundamentals. Number 13 in Center for the Study of Language and Information (CSLI) Lecture Notes. Stanford University Press and Chicago University Press, 1987.

Cécile L. Paris, William R. Swartout, and William C. Mann, editors. Natural Language Generation in Artificial Intelligence and Computational Linguistics. Kluwer Academic, Boston, 1990.

C. L. Paris, W. R. Swartout, and William C. Mann, editors. Natural Language Generation in Artificial Intelligence and Computational Linguistics. Kluwer Academic, July 1991.

D. Rösner. Ein System zur Generierung von Deutschen Texten aus Semantischen Repräsentationen. PhD thesis, University of Stuttgart, 1986.

N. Reithinger. Eine Parallele Architektur zur Inkrementellen Generierung Multimodaler Dialogbeiträge. PhD thesis, University of the Saarland, 1991.

Owen Rambow and Tanya Korelsky. Applied text generation. In ANLP [ANL92], pages 40--47.

Ehud Reiter, Chris S. Mellish, and John Levine. Automatic generation of on-line documentation in the IDAS project. In ANLP [ANL92], pages 64--71.

Tomek Strzalkowski, J. P. Carballo, and M. Marinescu. Natural language information retrieval: TREC-3 report. In National Institute of Standards and Technology Special Publication on the The Third Text REtrieval Conference (TREC-3), Washington, DC, 1995. National Institute of Standards and Technology, U.S. Department of Commerce, U.S. Government Printing Office.

Donia Scott and Clarisse Sieckenius de Souza. Getting the message across in RST-based generation. In Dale et al. [DMZ90], pages 47--73.

S. M. Shieber. A uniform architecture for parsing and generation. In COLING [COL88].

Stuart M. Shieber. The problem of logical-form equivalence. Computational Linguistics, 19(1):179--190, March 1993.

S. M. Shieber, F. C. N. Pereira, G. van Noord, and R. C. Moore. Semantic-head-driven generation. Computational Linguistics, 16:30--42, 1990.

T. Strzalkowski. Automated inversion of a unification parser into a unification generator. Technical Report 465, Courant Institute of Mathematical Sciences, New York University, 1989.

T. Strzalkowski. Reversible Grammar in Natural Language Processing. Kluwer Academic Publishers, 1994.

Daniel D. Suthers. Task-appropriate hybrid architectures for explanation. Computational Intelligence, 7(4):315--333, 1991.

D. Suthers. An Analysis of Explanation and its Implications for the Design of Explanation Planners. PhD thesis, University of Massachusetts at Amherst, 1993.

H. Uszkoreit. Categorial unification grammars. In Proceedings of the 11th International Conference on Computational Linguistics, Bonn, 1986. ACL.

G. van Noord. An overview of head-driven bottom-up generation. In Robert Dale, Chris S. Mellish, and Michael Zock, editors, Current Research in Natural Language Generation, pages 141--165. Academic Press, London, 1990.

G. J. M. van Noord. Reversibility in Natural Language Processing. PhD thesis, University of Utrecht, The Netherlands, 1993.

W. von Hahn, W. Höppner, A. Jameson, and W. Wahlster. The anatomy of the natural language dialogue system HAM-RPM. In L. Bolc, editor, Natural Language Based Computer Systems. Macmillan, Munich, 1980.

W. Wahlster, E. André, W. Graf, and T. Rist. Designing illustrated texts: How language production is influenced by graphics production. In Proceedings of the Fifth Conference of the European Chapter of the Association for Computational Linguistics, pages 8--14, Berlin, 1991. European Chapter of the Association for Computational Linguistics.

J. Wedekind. Generation as structure driven derivation. In COLING [COL88].

R. Michael Young, Johanna D. Moore, and Martha E. Pollack. Towards a principled representation of discourse plans. In Proceedings of the Sixteenth Conference of the Cognitive Science Society, Atlanta, 1994.

H. Zeevat, E. Klein, and J. Calder. An introduction to unification categorial grammar. In J. Nicholas Haddock, Ewan Klein, and Glyn Morrill, editors, Edinburgh Working Papers in Cognitive Science, volume 1: Categorial Grammar, Unification Grammar, and Parsing, volume 1 of Working Papers in Cognitive Science. Centre for Cognitive Science, University of Edinburgh, 1987.