An annotated corpus for the analysis of VP ellipsis

By Johan Bos and Jennifer Spenader

The standoff annotation (character offsets into the raw WSJ files) can be found in the src/data/vpe directory of the development version of the candc tools. Alternatively, you can download it as a tarball (version: 18 Feb 2011).

If you unpack the tarball, you will find one text file with the standoff annotation for each WSJ section. Each line in a file describes one VP ellipsis (VPE) instance and contains at least 9 columns, separated by white space. Among them are the filename, the start and end character of the VPE trigger, the start and end character of the antecedent, and the auxiliary form.
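As an illustration of how the standoff format might be read, here is a minimal Python sketch. It assumes that the six named columns come first and in the order listed above, that any further columns can be kept uninterpreted, and that offsets are 0-based with an exclusive end. The offset convention is not stated here, so check it against a few known instances before relying on it. The names `VPEAnnotation`, `parse_line` and `span` are hypothetical, and the example line and sentence are invented for demonstration, not taken from the corpus.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VPEAnnotation:
    """One line of the standoff annotation (hypothetical field names)."""
    filename: str
    trigger_start: int
    trigger_end: int
    antecedent_start: int
    antecedent_end: int
    aux_form: str
    # Any columns beyond the six named ones, kept as raw strings.
    extra: List[str] = field(default_factory=list)


def parse_line(line: str) -> VPEAnnotation:
    """Split a whitespace-separated line, ASSUMING the named columns come first."""
    cols = line.split()
    if len(cols) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(cols)}: {line!r}")
    return VPEAnnotation(
        filename=cols[0],
        trigger_start=int(cols[1]),
        trigger_end=int(cols[2]),
        antecedent_start=int(cols[3]),
        antecedent_end=int(cols[4]),
        aux_form=cols[5],
        extra=cols[6:],
    )


def span(raw_text: str, start: int, end: int) -> str:
    """Return a raw-text span, ASSUMING 0-based start and exclusive end offsets."""
    return raw_text[start:end]


if __name__ == "__main__":
    # Invented sentence and annotation line, for illustration only.
    raw = "John left early, and Mary did too."
    rec = parse_line("example.txt 26 29 5 15 did c7 c8 c9")
    print(span(raw, rec.trigger_start, rec.trigger_end))        # did
    print(span(raw, rec.antecedent_start, rec.antecedent_end))  # left early
```

In practice you would read the raw WSJ file named in the first column and pass its contents as `raw_text`. Keeping the unnamed columns in `extra` avoids guessing their meaning.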