Next: Interactive lexical analysis Up: The annotation process Previous: The annotation process


Parsing

The annotation process typically starts by parsing a sentence from the corpus with the Alpino parser. This is a sensible starting point, because building dependency trees by hand is extremely time-consuming and error-prone, and the parser usually produces a correct or almost correct parse. If the parser cannot build a structure for the complete sentence, it tries to generate as large a structure as possible (e.g. a noun phrase or a complementizer phrase). The main disadvantage of parsing is that the parser produces a large set of possible parses (see fig. 3). This is a well-known problem in grammar development: the more linguistic phenomena a grammar covers, the greater the ambiguity per sentence. Because selecting the best parse from such a large set is time-consuming, we have tried to reduce the set of generated parses. The interactive lexical analyzer and the constituent marker restrict the parsing process, which results in smaller sets of parses. A tool for online addition of lexical information makes the parsing of sentences with unknown words more accurate and efficient.
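
The effect of such restrictions can be illustrated with a small toy model. The sketch below is not the Alpino parser and makes no claim about its internals; it assumes a hypothetical setting in which every binary bracketing of a sentence is a possible parse and every word has a set of candidate lexical categories. The function names, the spans, and the category labels are all invented for the illustration. It only shows why fixing a constituent or a lexical category shrinks the search space.

```python
from functools import lru_cache
from math import prod
from typing import Optional, Sequence, Set, Tuple

Span = Tuple[int, int]  # half-open word span [start, end)


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def count_bracketings(n_words: int, marked: Optional[Span] = None) -> int:
    """Count binary bracketings of n_words words (toy stand-in for parses).

    If a span is marked, bracketings containing a constituent that
    crosses the marked span are excluded.
    """

    @lru_cache(maxsize=None)
    def count(start: int, end: int) -> int:
        if marked is not None and crosses((start, end), marked):
            return 0
        if end - start == 1:
            return 1
        # Sum over every split point of the span.
        return sum(count(start, mid) * count(mid, end)
                   for mid in range(start + 1, end))

    return count(0, n_words)


def count_tag_sequences(candidates: Sequence[Set[str]]) -> int:
    """Number of ways to pick one lexical category per word."""
    return prod(len(tags) for tags in candidates)


# Hypothetical six-word sentence: all bracketings vs. words 1..3 marked
# as one constituent.
unrestricted = count_bracketings(6)
restricted = count_bracketings(6, marked=(1, 4))

# Hypothetical candidate categories per word, before and after the
# annotator fixes the category of the second and fourth word.
before = count_tag_sequences([{"det"}, {"noun", "verb"},
                              {"noun", "verb"}, {"prep", "adv", "part"}])
after = count_tag_sequences([{"det"}, {"noun"},
                             {"noun", "verb"}, {"prep"}])

print(unrestricted, restricted, before, after)
```

In this toy model, marking one three-word constituent in a six-word sentence reduces the number of bracketings from 42 to 10, and fixing two lexical categories reduces the number of category sequences from 12 to 2. A real grammar rules out far more structures than this model does, so the absolute numbers carry no meaning for Alpino; only the direction of the effect does.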

Figure 3: Number of parses generated per sentence by the Alpino parser



Noord G.J.M. van
2002-06-13