Alpino aims to provide an accurate, wide-coverage computational grammar for Dutch. The linguistic component of the system consists of a lexicalist, feature-based grammar for Dutch, a detailed wide-coverage lexicon, and a method for constructing dependency treebanks. The parser combines a lexical analysis module with a method that uses beam search to reconstruct parses from a parse forest; together, these allow the linguistic knowledge to be applied efficiently and robustly to unrestricted text. Finally, we have presented preliminary experiments on accurate disambiguation.
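To make the idea of beam search over a parse forest concrete, the following is a minimal sketch, not Alpino's implementation. It assumes a hypothetical forest representation in which each node maps to a list of alternative packings, each a `(rule_score, daughter_ids)` pair, and it assumes scores simply add up. Working bottom-up, each node keeps only its `beam` highest-scoring analyses, built from the retained analyses of its daughters. With the purely local, additive scores assumed here this pruning loses nothing; it becomes an approximation once scores depend on non-local features.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical packed-forest format (an assumption for illustration):
# node id -> list of (rule_score, [daughter node ids]); leaves have no daughters.
Forest = Dict[str, List[Tuple[float, List[str]]]]
Tree = tuple  # (node id, tuple of daughter trees)


def k_best(forest: Forest, root: str, beam: int) -> List[Tuple[float, Tree]]:
    """Return at most `beam` (score, tree) pairs for `root`, best first."""
    memo: Dict[str, List[Tuple[float, Tree]]] = {}

    def best(node: str) -> List[Tuple[float, Tree]]:
        if node in memo:
            return memo[node]
        candidates = []
        for rule_score, daughters in forest[node]:
            # Combine the retained analyses of all daughters; for a leaf,
            # product() over zero lists yields a single empty combination.
            for combo in product(*(best(d) for d in daughters)):
                score = rule_score + sum(s for s, _ in combo)
                candidates.append((score, (node, tuple(t for _, t in combo))))
        # Prune: keep only the `beam` best analyses of this node.
        candidates.sort(key=lambda c: c[0], reverse=True)
        memo[node] = candidates[:beam]
        return memo[node]

    return best(root)


# Toy forest for "ik zag de man met ..." with a PP-attachment ambiguity:
# the VP-attachment packing is (hypothetically) scored higher than NP-attachment.
forest: Forest = {
    "S": [(0.0, ["NP1", "VP"])],
    "NP1": [(0.0, ["w_ik"])],
    "VP": [(0.5, ["V", "NP2", "PP"]), (0.2, ["V", "NP3"])],
    "NP3": [(0.0, ["NP2", "PP"])],
    "NP2": [(0.0, ["w_man"])],
    "PP": [(0.0, ["w_met"])],
    "V": [(0.0, ["w_zag"])],
    "w_ik": [(0.0, [])],
    "w_zag": [(0.0, [])],
    "w_man": [(0.0, [])],
    "w_met": [(0.0, [])],
}

for score, tree in k_best(forest, "S", beam=2):
    print(score, tree)
```

Because each node's analyses are memoized, shared subtrees such as `NP2` and `PP` are scored only once, which is what makes working on the packed forest cheaper than enumerating every parse.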
In the near future, we hope to address a number of additional issues. First, the valency information in the lexicon is incomplete in many respects; we hope to obtain a more complete lexicon by acquiring dependency frames from corpora. Second, lexical analysis currently relies on hand-written filter rules to reduce the number of tags assigned to lexical items. An obvious alternative is to derive these filters from a corpus-based part-of-speech tagger. Finally, the work on disambiguation would benefit from more annotated material, so our efforts to create a dependency treebank may also lead to improved disambiguation results.
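One way a tagger could stand in for hand-written filter rules is sketched below. This is a hypothetical design, not a description of Alpino: it assumes a tagger that returns a probability for each candidate tag of each token, and it keeps only the tags whose probability is at least a fixed fraction (`ratio`, a hypothetical parameter) of the best tag's probability for that token. The function name `filter_tags` and the tag labels are illustrative only.

```python
from typing import Dict, List


def filter_tags(tag_probs: List[Dict[str, float]], ratio: float) -> List[List[str]]:
    """For each token, keep the tags whose (hypothetical) tagger probability
    is at least `ratio` times that of the token's most probable tag."""
    kept = []
    for probs in tag_probs:
        best = max(probs.values())
        kept.append(sorted(t for t, p in probs.items() if p >= ratio * best))
    return kept


# Invented tagger output for "de loop": "loop" is ambiguous between
# noun and verb readings, with a marginal adjective reading added for contrast.
tagged = [
    {"det": 1.0},
    {"noun": 0.7, "verb": 0.25, "adj": 0.05},
]
print(filter_tags(tagged, ratio=0.1))
```

A relative threshold like this keeps genuinely ambiguous readings available to the parser while discarding unlikely ones; lowering `ratio` trades parsing speed for robustness.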