Welcome to the Alpino Treebank website. The Alpino treebank contains syntactically annotated Dutch sentences. The treebank (more than 150,000 words) includes the full cdbl (newspaper) part of the Eindhoven corpus.

The Alpino Treebank was released in 2002. In the meantime, our treebanking efforts have led to various corrections of the annotations, improvements to the tools we use, and changes to the XML format that we use for the annotations.

For our more recent treebanking efforts, please refer to the LASSY project. You will find documentation of the most recent formats, tools and treebanks there.

This page provides a link to the content of the original CDROM, up-to-date versions of the original Alpino treebanks, a list of publications up to 2002 describing these treebanks, and the names of the researchers involved in creating the Alpino Treebank CDROM.

The Alpino Treebank CDROM was created in the context of the NWO Pionier project Algorithms for Linguistic Processing.

Original CDROM Version

The content of the CDROM, which appeared in November 2002.

Treebanks: hand-corrected

The following treebanks are up-to-date and extended versions of the treebanks on the original Alpino Treebank CDROM.

Publications

Who to blame

Feedback

We appreciate your feedback if you find errors. Please send a polite email to: vannoord@let.rug.nl

Hugo Brandt Corstius was the keynote speaker at the 13th CLIN meeting (29 November 2002 in Groningen). After his presentation, the first CDROM was officially handed to him.

van Noord and Brandt Corstius