
Welcome to the Alpino Treebank website. The Alpino Treebank contains syntactically annotated Dutch sentences. The treebank (more than 150,000 words) includes the full cdbl (newspaper) part of the Eindhoven corpus. This site also provides a number of tools for browsing and searching the treebank.

Original CDROM Version

The contents of the CDROM, which appeared in November 2002.

Treebanks: hand-corrected

Treebanks: not corrected

Formats

Utilities

A number of scripts and programs are supplied for browsing and searching the treebanks. For more recent versions and various new tools, please download the Alpino distribution from http://www.let.rug.nl/~vannoord/alp/Alpino/

Documentation

This document attempts to describe a number of differences between the CGN annotation practice and ours, but it is badly out of date. The good news is that the number of differences has recently been reduced considerably.

Publications

Alpino Demo

In the context of the treebanking efforts, we are constructing a natural language understanding system for Dutch: Alpino. This ever-growing system is built on top of Hdrug; it contains a wide-coverage HPSG for Dutch, a large-scale lexicon, a parser, a disambiguation component using a log-linear (maximum entropy) model, etc. There is an experimental web demo.

Who to blame

Further information: Algorithms for Linguistic Processing Homepage.

Feedback

Based on the number of errors that we have found ourselves during the last few months, it is certain that there are still many errors in the treebank. We appreciate your feedback if you find errors. Please send a polite email to: vannoord@let.rug.nl

Hugo Brandt Corstius was the keynote speaker at the 13th CLIN meeting (29 November 2002 in Groningen). After his presentation, the first CDROM was officially handed to him.

van Noord and Brandt Corstius