Wablieft: A News Corpus of Easy-To-Read Text
Vincent Vandeghinste and Bram Bulté


We present the Wablieft corpus, a news corpus of easy-to-read text. It consists of the archive of the digital edition of Wablieft, a Flemish easy-to-read newspaper.
All texts have been annotated with Frog and Alpino and are freely available for research from the Taalmaterialen website of the Dutch Language Institute (www.ivdnt.org).
We compare some linguistic properties of the corpus with those of regular newspaper text, and present a number of NLP tools that can benefit from a dataset of this kind, such as the lexical simplification tool described in Bulté et al. (in press).


Bram Bulté, Leen Sevens and Vincent Vandeghinste (in press). Automating lexical simplification in Dutch. CLIN Journal, vol. 8.