
Addition of lexical information

Alpino is set up as a broad-coverage parser whose goal is to analyze unrestricted text. To that end, a large lexicon has been created and extensive unknown-word heuristics have been added to the grammar. Still, it is inevitable that the parser will come across words that it cannot handle yet: verbs are used with extra or missing arguments, Dutch sentences are mingled with foreign words, and spelling mistakes make common words unrecognizable. In most cases, the parser will either skip such a word or assign an inappropriate category to it. The only way to make the system use the word correctly is to add a lexical entry for it to the lexicon.

Adding new words to the lexicon takes time: one has to write the entry, save the new lexicon and reload it. It would be far more efficient to add all new words encountered during an annotation session at once, avoiding unnecessary reloads. Furthermore, not all unknown words the parser finds should be added to the lexicon permanently. Misspelled words and verbs used with an incorrect number of arguments should be made available only once, to build a parse for the sentence at hand.

For this purpose, Alpino has temporary, on-line addition of lexical information built in. Unknown words can be added to the lexicon temporarily with the command add_tag or add_lex. Like the words already in the lexicon, the new entry must be assigned a feature structure. add_tag lets the user specify the lexical type as the second argument. However, types may change, and for verbs in particular it is sometimes hard to decide which of the subcategorization frames should be used. For that reason, the command add_lex assigns to the unknown word the feature structure of a similar word that could have been used in that position. The command add_lex stoel tafel, for instance, assigns the feature structure of fig. 1 to the word stoel. The command add_lex zoen slaap assigns zoen all feature structures of slaap, including imperative and 1st person singular present for all subcategorization frames of slapen. The temporary lexical information is automatically deleted when the annotation session ends.
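
The behaviour described above can be pictured as a temporary overlay on top of the permanent lexicon. The following Python sketch is only an illustration of that idea and not Alpino's actual implementation. The class SessionLexicon, the method end_session and the lexical type strings are hypothetical; only the command names add_tag and add_lex and the example words are taken from the text.

```python
class SessionLexicon:
    """Permanent lexicon plus a temporary overlay that lasts one session."""

    def __init__(self, base):
        # base maps a surface form to its list of lexical types.
        # The type strings used in this sketch are invented placeholders.
        self.base = base
        self.temp = {}

    def lookup(self, word):
        # Permanent entries first, then entries added during this session.
        return self.base.get(word, []) + self.temp.get(word, [])

    def add_tag(self, word, lexical_type):
        # The user names the lexical type explicitly.
        self.temp.setdefault(word, []).append(lexical_type)

    def add_lex(self, word, similar_word):
        # Copy every entry of a known word that could fill the same position.
        entries = self.lookup(similar_word)
        if not entries:
            raise KeyError(f"no lexical entries for {similar_word!r}")
        self.temp.setdefault(word, []).extend(entries)

    def end_session(self):
        # Temporary entries disappear; the permanent lexicon is untouched.
        self.temp.clear()


# Hypothetical permanent lexicon with placeholder type names.
lex = SessionLexicon({
    "tafel": ["noun(de, sg)"],
    "slaap": ["verb(imperative, intransitive)", "verb(sg1, intransitive)"],
})

lex.add_lex("stoel", "tafel")   # stoel receives the entry of tafel
lex.add_lex("zoen", "slaap")    # zoen receives all entries of slaap
lex.add_tag("computer", "noun(de, sg)")
```

Because the temporary entries live in a separate table, discarding them at the end of the session requires no change to the permanent lexicon and no reload.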


Noord G.J.M. van
2002-06-13