Next: Bibliography Up: The Alpino Dependency Treebank Previous: Evaluation and training


Conclusions

A treebank is important for both the evaluation and the training of a grammar. No suitable treebank existed for the Alpino parser. For that reason, we have started to develop the Alpino Dependency Treebank by annotating part of the Eindhoven corpus with dependency structures. As the treebank grows, it also becomes increasingly attractive for linguistic exploration, and we have developed an XML format that supports a range of linguistic queries.
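
As an illustration of the kind of query an XML encoding of dependency structures makes possible, the following Python sketch finds all words that bear a given dependency relation. The element and attribute names (`alpino_ds`, `node`, `rel`, `cat`, `pos`, `word`), the sample sentence, and the helper `words_with_relation` are assumptions made for this example; they are not a specification of the treebank's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical dependency structure for the sentence "Jan slaapt".
# All element and attribute names here are assumptions for illustration.
SAMPLE = """
<alpino_ds>
  <node rel="top" cat="smain">
    <node rel="su" pos="noun" word="Jan"/>
    <node rel="hd" pos="verb" word="slaapt"/>
  </node>
</alpino_ds>
"""


def words_with_relation(xml_text, rel):
    """Return the words of all leaf nodes whose dependency relation is `rel`."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree supports a limited XPath subset, including attribute tests.
    matches = root.iterfind(f".//node[@rel='{rel}']")
    # Phrasal nodes carry no word attribute in this sketch, so skip them.
    return [n.get("word") for n in matches if n.get("word") is not None]


if __name__ == "__main__":
    print(words_with_relation(SAMPLE, "su"))
```

A query of this kind treats the treebank as a searchable collection of labelled nodes, which is what makes it usable for linguistic exploration as well as for evaluation and training.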

To facilitate the time-consuming annotation process, we have developed several tools. Interactive lexical analysis and constituent marking reduce the set of parses generated by the Alpino parser; the tool for adding lexical information makes the parsing of unknown words more efficient; and the parse selection tool facilitates the selection of the best parse from a set of parses. In the future, constituent marking could be made more user-friendly. We could also look into ways of further reducing the set of maximal discriminants generated by the parse selection tool.

The treebank currently contains over 6,000 sentences. Much effort will be put into extending the treebank to cover at least the complete cdbl newspaper part of the Eindhoven corpus, which contains 7,150 sentences.


Noord G.J.M. van
2002-06-13