Welcome to the Alpino Treebank website. The Alpino treebank contains
syntactically annotated Dutch sentences. The treebank (more than
150,000 words) includes the full cdbl (newspaper) part of the
The Alpino Treebank was released in 2002. In the meantime, our
treebanking efforts have led to various corrections of the
annotations, improvements to the tools we use, and changes
to the XML format that we use for the annotations.
For our more recent treebanking efforts, please refer to the LASSY project. You will find
documentation of the most recent formats, tools and treebanks there.
On this page you will find a link to the content of the
original CDROM, up-to-date versions of the original Alpino treebanks,
a list of publications until 2002 describing these treebanks, and the
researchers involved in creating the Alpino Treebank CDROM.
The Alpino Treebank CDROM was created in the context of the
NWO Pionier project
Algorithms for Linguistic Processing.
Original CDROM Version
The content of the CDROM which appeared in November
The following treebanks are up-to-date and extended versions of the treebanks
on the original Alpino Treebank CDROM.
- Robert Malouf, Gertjan van Noord. Wide Coverage Parsing with
Stochastic Attribute Value Grammars. In: IJCNLP-04 Workshop Beyond
Shallow Analyses - Formalisms and statistical modeling for deep
analyses. [pdf,web page]
- Chapter 5. The Alpino Dependency Treebank. In: Leonoor van der
Beek, Gosse Bouma, Jan Daciuk, Tanja Gaustad, Robert Malouf,
Gertjan van Noord, Robbert Prins, Begoña Villada,
Algorithms for Linguistic Processing NWO PIONIER Progress
Report. Groningen 2002.
- Leonoor van der Beek, Gosse Bouma, Robert Malouf, Gertjan van
Noord. The Alpino Dependency Treebank. In:
Computational Linguistics in the Netherlands CLIN 2001. Rodopi 2002.
- Leonoor van der Beek, Gosse Bouma, and Gertjan van Noord.
Een brede computationele grammatica voor het Nederlands.
Nederlandse Taalkunde, 2002.
- Gosse Bouma and Geert Kloosterman. Querying dependency treebanks in XML.
In Proceedings of the Third International Conference on Language
Resources and Evaluation (LREC), Gran Canaria, 2002.
- Gosse Bouma, Gertjan van Noord, Robert Malouf. Alpino: Wide
Coverage Computational Analysis of Dutch. In: Computational
Linguistics in the Netherlands CLIN 2000. Rodopi 2001.
Who to blame
- Leonoor van der Beek (annotation)
- Gosse Bouma (annotation)
- Jan Daciuk (tools)
- Geert Kloosterman (tools)
- Robert Malouf (tools)
- Gertjan van Noord (annotation, tools)
- Robbert Prins (art work, tools)
We appreciate your feedback if you find errors. Please
send a polite email to:
Hugo Brandt Corstius was the keynote speaker at the 13th CLIN meeting
(29 November 2002 in Groningen). After his presentation, the first
CDROM was officially handed to him.