CLIN 29 in Groningen

Comparing two methods for adding Enhanced Dependencies to UD treebanks
Gosse Bouma

While Universal Dependencies (UD) has proven to be a convenient level of syntactic annotation for many applications, such as language typology, stylistics, (cross-lingual) parser comparison, relation extraction, and the construction of word embeddings, there is also concern that it does not capture all relevant syntactic distinctions and that it may be suboptimal for the construction of semantic representations. Enhanced Universal Dependencies aim to remedy these shortcomings by providing an improved treatment of control, coordination, and ellipsis.

When adding enhanced dependencies to an existing UD treebank, one can opt for heuristics that predict the enhanced dependencies on the basis of the UD annotation only. If the treebank is the result of conversion from an underlying treebank with language-specific annotation, an alternative is to produce the enhanced dependencies directly on the basis of this underlying annotation. Here we present a rule-based method for doing the latter for the Dutch UD treebanks. We compare our method with the language-independent, UD-based approach of Schuster (2018). There are a number of systematic differences between the outputs of the two methods: they differ in the kinds of dependents that are distributed over conjuncts, in the kinds of xcomp-dependents that receive an explicit controller, and in whether auxiliaries are reconstructed in ellipsis. These differences appear to result from insufficient detail in the annotation guidelines. Resolving these gaps in the guidelines should lead to compatible and equally accurate annotation results for both conversion methods.
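To make the notion of a UD-based heuristic concrete, the following is a minimal sketch of distributing dependents over conjuncts. It is not the method presented here, nor that of Schuster (2018). The `Token` type, the function `propagate_conjuncts`, and the `shared` parameter are hypothetical names introduced for illustration. The sketch assumes the basic UD convention that every non-first conjunct attaches to the first conjunct with the relation `conj`.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    head: int      # id of the governor; 0 marks the root
    deprel: str    # basic UD relation to the governor


def propagate_conjuncts(tokens, shared=("nsubj",)):
    """Return a set of (head, dependent, relation) edges: the basic UD
    edges plus edges obtained by distributing over conjuncts.

    `shared` lists the relations whose dependents are copied from the
    first conjunct to later conjuncts. Which relations belong in this
    list is an assumption of the sketch, and is exactly the kind of
    choice on which conversion methods can disagree.
    """
    by_id = {t.id: t for t in tokens}
    edges = {(t.head, t.id, t.deprel) for t in tokens}
    for conj in tokens:
        if conj.deprel != "conj":
            continue
        first = by_id[conj.head]
        # A later conjunct inherits the incoming relation of the first conjunct.
        edges.add((first.head, conj.id, first.deprel))
        # Copy shared dependents unless the conjunct has its own dependent
        # with the same relation.
        own = {t.deprel for t in tokens if t.head == conj.id}
        for dep in tokens:
            if dep.head == first.id and dep.deprel in shared and dep.deprel not in own:
                edges.add((conj.id, dep.id, dep.deprel))
    return edges


# "Mary buys and reads books" in basic UD.
SENTENCE = [
    Token(1, "Mary", 2, "nsubj"),
    Token(2, "buys", 0, "root"),
    Token(3, "and", 4, "cc"),
    Token(4, "reads", 2, "conj"),
    Token(5, "books", 2, "obj"),
]

ENHANCED = propagate_conjuncts(SENTENCE)
```

With the default setting, "Mary" becomes a subject of "reads" as well as of "buys", while "books" remains an object of "buys" only. Calling `propagate_conjuncts(SENTENCE, shared=("nsubj", "obj"))` also attaches "books" to "reads", which shows how a single choice about which dependents to distribute changes the resulting graph.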