Current Status

The design and organisation of the TST grammar, as well as many aspects of the particular grammatical analyses, are based on Head-driven Phrase Structure Grammar [86]. The grammar currently covers the majority of verbal subcategorisation types (intransitives, transitives, verbs selecting a PP, and modal and auxiliary verbs), NP-syntax (including pre- and post-nominal modification, with the exception of relative clauses), PP-syntax, the distribution of VP-modifiers, and various clausal types (declaratives, yes/no and WH-questions, and subordinate clauses).

Most, if not all, linguistic theories used in computational linguistics (including GPSG [38], HPSG [86], LFG [14], and unification-based versions of TAG [117], Dependency Grammar [46], and Categorial Grammar [122,109]) employ feature-structures to represent linguistic information, and use unification as the sole operation for combining feature-structures. Feature-structures may be under-specified, and, depending on the computational formalism used, arbitrarily complex constraints may be used in the definition of such under-specified feature-structures. In these unification-based or constraint-based grammar formalisms, each word or phrase in the grammar is associated with a (possibly under-specified) feature-structure, in which phonological (or orthographic), syntactic, and semantic information may be bundled. Within Head-driven Phrase Structure Grammar (HPSG), such feature-structures are called signs.
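As an illustration of unification over under-specified feature-structures, the following Python sketch represents a feature-structure as a nested dictionary with atomic string values, where an absent feature counts as unspecified. It is a simplification under stated assumptions: it omits structure sharing (reentrancy) and typing, both of which HPSG relies on, and the function name `unify` and the features `NUM` and `GEND` are hypothetical, not taken from the TST implementation.

```python
from typing import Optional, Union

# A feature-structure is an atomic value (a string) or a dict mapping
# feature names to feature-structures; absent features are unspecified.
FS = Union[str, dict]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return a structure carrying the information of both a and b,
    or None if they conflict. No structure sharing (reentrancy)."""
    if isinstance(a, str) or isinstance(b, str):
        # Atomic values unify only with an identical atomic value.
        return a if a == b else None
    result = dict(a)  # copy, so that neither input is modified
    for feature, b_value in b.items():
        if feature in result:
            sub = unify(result[feature], b_value)
            if sub is None:
                return None  # clash somewhere below this feature
            result[feature] = sub
        else:
            result[feature] = b_value  # unspecified in a: take b's value
    return result


det = {"AGR": {"NUM": "sg"}}
noun = {"AGR": {"NUM": "sg", "GEND": "neut"}, "CASE": "nom"}
print(unify(det, noun))  # {'AGR': {'NUM': 'sg', 'GEND': 'neut'}, 'CASE': 'nom'}
print(unify(det, {"AGR": {"NUM": "pl"}}))  # None
```

Unifying a structure that specifies only number with one that also specifies gender and case yields the combined information, while conflicting atomic values (`sg` against `pl`) make unification fail.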

The TST grammar makes use of 15 different types of sign, each of which roughly corresponds to a category in traditional linguistic terminology. For each type of sign, a number of features are defined. For example, the features AGR, NFORM, CASE, and SEM are defined for the type NP. These features encode the agreement properties, (morphological) form, case, and semantics of an NP, respectively.
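The idea of typed signs, each with a fixed set of declared features, can be sketched as an appropriateness table. Only the NP row below follows the text; the `pp` row, the `TYPE` key, the feature values, and the helper `make_sign` are hypothetical, and the 15 actual TST sign types are not reproduced.

```python
# Hypothetical appropriateness table: the features declared for each sign type.
# Only the "np" row follows the text; the "pp" row is illustrative.
APPROPRIATE = {
    "np": {"AGR", "NFORM", "CASE", "SEM"},
    "pp": {"PFORM", "SEM"},
}


def make_sign(sign_type: str, **features) -> dict:
    """Build a (possibly under-specified) sign of the given type,
    rejecting features that are not declared for that type."""
    if sign_type not in APPROPRIATE:
        raise ValueError(f"unknown sign type: {sign_type}")
    undeclared = set(features) - APPROPRIATE[sign_type]
    if undeclared:
        raise ValueError(f"{sorted(undeclared)} not declared for type {sign_type}")
    # Features left out stay unspecified: they are simply absent.
    return {"TYPE": sign_type, **features}


subject = make_sign("np", CASE="nom", AGR={"PER": "3", "NUM": "sg"})
print(subject)  # {'TYPE': 'np', 'CASE': 'nom', 'AGR': {'PER': '3', 'NUM': 'sg'}}
```

A call such as `make_sign("np", PFORM="op")` raises `ValueError`, since PFORM is not declared for the NP type in this table.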

Typical of lexicalist linguistic theories, such as HPSG and Categorial Grammar, is that subcategorisation is defined lexically: each word carries features representing the list of elements for which it subcategorises. Such an encoding of valence makes it possible to capture significant generalisations at the level of phrase structure, and thus, in principle, leads to a drastic reduction in the number of phrase structure rules that have to be postulated. A property which HPSG shares with GPSG is that it accounts for long-distance dependencies by means of feature-passing. The current implementation uses a (restricted) version of the account of long-distance dependencies proposed in Pollard and Sag [86] and Sag [98]. The account of verb-initial and verb-second clauses follows the transformational grammar tradition [60]: it assumes that verb-initial clauses are structurally similar to verb-final clauses, and that verbs in main clauses are linked to an empty element (in this case, a phonologically empty verbal sign) occupying a clause-final position.
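Feature-passing for long-distance dependencies can be illustrated with a much simplified version of the HPSG SLASH mechanism: a gap introduces its category into SLASH, each phrase inherits the union of its daughters' SLASH values, and a head-filler structure removes the category it binds. This is a toy sketch, not the restricted account actually implemented in TST; the node layout and the functions `word`, `gap`, `phrase`, and `head_filler` are hypothetical, and verb placement is ignored.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Node:
    cat: str
    daughters: List["Node"] = field(default_factory=list)
    slash: FrozenSet[str] = frozenset()  # categories of gaps not yet bound
    form: Optional[str] = None


def word(cat: str, form: str) -> Node:
    return Node(cat, form=form)


def gap(cat: str) -> Node:
    """Phonologically empty trace: it puts its own category into SLASH."""
    return Node(cat, slash=frozenset({cat}))


def phrase(cat: str, *daughters: Node) -> Node:
    """Feature passing: the mother's SLASH is the union of its daughters'."""
    slash = frozenset().union(*(d.slash for d in daughters))
    return Node(cat, list(daughters), slash)


def head_filler(filler: Node, head: Node) -> Node:
    """Bind a gap: the filler's category must be in the head's SLASH."""
    if filler.cat not in head.slash:
        raise ValueError(f"no {filler.cat} gap to bind")
    return Node(head.cat, [filler, head], head.slash - {filler.cat})


# Toy tree with the object NP extracted; verb placement is ignored.
vp = phrase("vp", gap("np"), word("v", "las"))
s = phrase("s", word("np", "Jan"), vp)
print(s.slash)  # frozenset({'np'})
top = head_filler(word("np", "welk boek"), s)
print(top.slash)  # frozenset()
```

The gap's category percolates from the VP to the clause, and only the head-filler structure discharges it, so no special rule is needed for each intermediate level of the tree.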

A restriction imposed by the current grammar-parser interface is that each rule must specify the categories of its mother and daughters. A rule specifying that a head daughter may combine with a complement daughter if this complement unifies with the first element on the SC list of the head (i.e. a version of the categorial rule for functor-argument application) cannot be implemented directly, as it leaves the categories of the daughters and mother unspecified. Nevertheless, generalisations of this type do play a role in the grammar. We have adopted an architecture for grammar rules similar to that of HPSG, in which individual rules are classified in a hierarchy of structures (e.g. head-complement and head-modifier structures), which are in turn defined in terms of general principles (such as the HEAD FEATURE PRINCIPLE and the VALENCE PRINCIPLE).
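The following sketch shows how a general head-complement structure, defined through simplified versions of the HEAD FEATURE PRINCIPLE and the VALENCE PRINCIPLE, might be instantiated as rules that name their mother and daughter categories explicitly, as the parser interface requires. It assumes that matching against the first SC element is plain category identity rather than full unification; the functions `head_complement` and `make_rule`, the dictionary keys, and the example entries are all hypothetical.

```python
from typing import Any, Callable, Dict, Optional

Sign = Dict[str, Any]  # keys: "cat", "head", "sc" (list of categories)
Rule = Callable[[Sign, Sign], Optional[Sign]]


def head_complement(head: Sign, comp: Sign) -> Optional[Sign]:
    """General head-complement structure; the mother's category is left open."""
    sc = head["sc"]
    # VALENCE PRINCIPLE (simplified): the complement must match the first
    # element of the head's SC list; the mother keeps the remaining elements.
    if not sc or sc[0] != comp["cat"]:
        return None
    # HEAD FEATURE PRINCIPLE: the mother shares the head daughter's HEAD value.
    return {"head": head["head"], "sc": sc[1:]}


def make_rule(mother: str, head_cat: str, comp_cat: str) -> Rule:
    """Instantiate the general structure as a rule with explicit categories."""
    def rule(head: Sign, comp: Sign) -> Optional[Sign]:
        if head["cat"] != head_cat or comp["cat"] != comp_cat:
            return None
        core = head_complement(head, comp)
        return None if core is None else {"cat": mother, **core}
    return rule


vp_v_np = make_rule("vp", "v", "np")
vp_v_pp = make_rule("vp", "v", "pp")

leest = {"cat": "v", "head": {"VFORM": "fin"}, "sc": ["np"]}  # 'reads'
het_boek = {"cat": "np", "head": {}, "sc": []}                 # 'the book'
print(vp_v_np(leest, het_boek))  # {'cat': 'vp', 'head': {'VFORM': 'fin'}, 'sc': []}
print(vp_v_pp(leest, het_boek))  # None
```

Each concrete rule states its categories, satisfying the interface restriction, while the constraints shared by all head-complement rules are stated once in `head_complement`.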

The TST lexicon is a list of clauses, each associating a word (or sequence of words) with a specific sign. Constraint-based grammars in general, and lexicalist constraint-based grammars in particular, tend to store a great deal of grammatical information in the lexicon. This is also true of the TST grammar. A lexical entry for a transitive verb, for instance, not only contains information about the morphological form of the verb, but also contains the features SC and SUBJ, for which quite detailed constraints may be defined. Furthermore, the semantics of every lexical sign is represented by a feature-structure, which can also be quite complex. To avoid massive duplication of identical information in the lexicon, multiple inheritance has been used extensively. This architecture should make it possible to construct a much larger lexicon.
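Multiple inheritance in the lexicon can be approximated with Python classes, where each lexical type contributes its own constraints and an entry collects them along the method resolution order. This is a hypothetical sketch: the type names and constraint keys are invented, and Python's inheritance lets more specific types override inherited values, whereas a constraint-based lexicon would combine them by unification and fail on conflicting information.

```python
class LexType:
    """Root of a hypothetical lexical type hierarchy."""
    constraints: dict = {}

    @classmethod
    def entry(cls) -> dict:
        merged: dict = {}
        # Walk the MRO from most general to most specific, so that each type
        # adds its own constraints and more specific types can override.
        for klass in reversed(cls.__mro__):
            merged.update(vars(klass).get("constraints", {}))
        return merged


class Verb(LexType):
    constraints = {"cat": "v", "subj": "np"}


class Transitive(Verb):
    constraints = {"sc": ("np",)}


class PPSelecting(Verb):
    constraints = {"sc": ("pp",)}


class Finite(LexType):
    constraints = {"vform": "fin"}


# Multiple inheritance: each entry type combines a valence type and a form type.
class FiniteTransitive(Transitive, Finite):
    pass


class FinitePPSelecting(PPSelecting, Finite):
    pass


# The lexicon associates word forms with lexical types.
LEXICON = {
    "leest": FiniteTransitive,   # 'reads'
    "wacht": FinitePPSelecting,  # 'waits' (as in 'wachten op', 'wait for')
}

print(LEXICON["leest"].entry())
# {'vform': 'fin', 'cat': 'v', 'subj': 'np', 'sc': ('np',)}
```

The information shared by all verbs (`cat`, `subj`) and by all finite forms (`vform`) is stated once, and each word only needs to be linked to the appropriate combined type.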

2000-07-10