The TST grammar fragment has been developed within the NWO Priority Programme Language and Speech Technology. The immediate goal of the Programme is the development of a demonstrator of a public transport information system that operates over ordinary telephone lines. This demonstrator is called OVIS, Openbaar Vervoer Informatie Systeem (Public Transport Information System). The language of the system is Dutch.
The TST grammar is based on Head-driven Phrase-structure Grammar (HPSG). HPSG is a linguistic theory that is an ideal basis for a computational grammar, because it combines a sound linguistic base with a clear formalisation. Its linguistic orientation is evident from the large number of publications treating many of the world's languages and covering many of the phenomena that competing linguistic theories have addressed over the last few decades. An impression of the existing body of work can be obtained by browsing the electronic HPSG bibliography available from http://www.dfki.de/lt/HPSG/hpsg_bib.html. A further advantage of HPSG for the current proposal is that Germanic languages, in particular German and Dutch, have been treated extensively.
HPSG combines this theoretical linguistic base with a clear formalisation. Although this does not make HPSG a computational theory of language, nor does it mean that HPSG grammars can be used computationally as they stand, it does offer many computational advantages. Moreover, HPSG has served as the starting point for a number of other computational grammars.
The TST grammar uses a high-level formalism (with feature structures, types and inheritance) but is compiled into a Definite-Clause Grammar (DCG). There are a number of benefits to choosing DCG as the basic formalism. Firstly, DCGs strike a balance between computational efficiency on the one hand and linguistic expressiveness on the other. Secondly, DCGs are a (simple) member of the class of declarative, constraint-based grammar formalisms, which are widely used in linguistic descriptions for NLP. Finally, DCGs are straightforwardly related to context-free grammars (CFGs). Almost all parsing technology has been developed for CFGs; porting this technology to DCGs is usually possible, although it also raises many non-trivial problems.
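To make the relation between DCGs and CFGs concrete, the following Python sketch treats each rule as a context-free rule whose categories carry arguments, and unifies those arguments during parsing; here a number feature enforces subject-verb agreement. The toy Dutch grammar, the term representation and all function names (`unify`, `solve`, `parses`) are hypothetical illustrations, not part of the TST grammar or its compiler. The search is depth-first and left-to-right, mirroring how Prolog executes a DCG.

```python
from itertools import count


class Var:
    """A logic variable; equality and hashing are by identity (hypothetical representation)."""
    _ids = count()

    def __init__(self, name):
        self.name = f"{name}_{next(Var._ids)}"

    def __repr__(self):
        return self.name


def walk(t, s):
    """Follow variable bindings in substitution s until reaching a non-bound term."""
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Unify terms a and b under substitution s; return the extended substitution or None."""
    a, b = walk(a, s), walk(b, s)
    if a is b or a == b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


# Each rule is a function returning (head, body), so every use of a rule gets fresh
# variables (like renaming a Prolog clause apart). Body items: tuples are
# nonterminals, strings are terminal words.
def _s():   # s --> np(Num), vp(Num).
    n = Var("Num")
    return ("s",), [("np", n), ("vp", n)]

def _np():  # np(Num) --> det, n(Num).
    n = Var("Num")
    return ("np", n), [("det",), ("n", n)]

def _vp():  # vp(Num) --> v(Num).
    n = Var("Num")
    return ("vp", n), [("v", n)]

GRAMMAR = [
    _s, _np, _vp,
    lambda: (("det",), ["de"]),
    lambda: (("n", "sg"), ["trein"]),
    lambda: (("n", "pl"), ["treinen"]),
    lambda: (("v", "sg"), ["vertrekt"]),
    lambda: (("v", "pl"), ["vertrekken"]),
]


def solve(goals, tokens, s):
    """Yield (substitution, remaining tokens) for each derivation of goals from a prefix of tokens."""
    if not goals:
        yield s, tokens
        return
    first, rest = goals[0], goals[1:]
    if isinstance(first, str):  # terminal: must match the next input word
        if tokens and tokens[0] == first:
            yield from solve(rest, tokens[1:], s)
        return
    for make_rule in GRAMMAR:
        head, body = make_rule()
        s2 = unify(first, head, s)
        if s2 is not None:
            yield from solve(body + rest, tokens, s2)


def parses(sentence, start=("s",)):
    """Return the substitutions of all complete parses of sentence from category start."""
    tokens = sentence.split()
    return [s for s, rest in solve([start], tokens, {}) if not rest]


if __name__ == "__main__":
    print(bool(parses("de trein vertrekt")))     # agreement holds
    print(bool(parses("de trein vertrekken")))   # number clash, rejected
    num = Var("Num")
    print([walk(num, s) for s in parses("de treinen", ("np", num))])
```

The underlying context-free skeleton (s → np vp, np → det n, ...) would accept "de trein vertrekken"; only the unification of the shared `Num` argument rules it out, which is the added expressiveness the paragraph refers to. Like a naive Prolog DCG, this sketch would loop on left-recursive rules, one of the problems that arise when porting parsing technology from CFGs.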
Taken together, the TST grammar is a successful combination of HPSG and DCG. Although certain aspects of the grammar have been tuned to the TST application domain, it is fair to say that the basic architecture of the grammar is fully general. Moreover, the grammar was applied successfully in a formal evaluation on the TST task, achieving 95% concept accuracy on previously unseen user utterances (see section 2.4).
Developing a general grammar for Dutch on the basis of this fragment will involve three kinds of work: re-implementing certain parts of the grammar (e.g. the account of unbounded dependencies, PP-attachment and verb clustering) to make them compatible with existing linguistic accounts; adding a number of syntactic construction types that are currently not in the fragment (such as passives and relative clauses); and expanding the lexicon to cover the basic vocabulary of Dutch.
Below, we describe the current fragment in some detail, and then go on to describe the activities we foresee in developing a general grammar.