Welcome to TermPedia

TermPedia is an automatic document enrichment tool that provides contextually relevant information for technical terms.

Graduate Student: P. Olango
Supervisor: Dr. G. Bouma
Assistant: G. Kramer
Promoter: Prof. Dr. J. Nerbonne
Employment date: 1 April 2008, expected completion date: 31 March 2012
Financed by: NUFFIC – NPT Project

Technical terms and jargon can hinder document comprehension. This project aims to provide relevant contextual information for technical terms through document enrichment. Document enrichment uses natural language processing (NLP) techniques such as automatic term recognition, information extraction, and word sense disambiguation to link technical terms to contextually relevant definitions and background knowledge.
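As an illustration of the term recognition and linking steps, the following minimal sketch assumes a small, hand-written term inventory (`TERM_INDEX`, hypothetical) that maps known terms to Wikipedia page titles. It finds terms by greedy longest-match lookup over word n-grams and pairs each with a link. This is a simple dictionary-lookup baseline, not the project's own term extraction method, and the page titles are illustrative only.

```python
import re

# Hypothetical term inventory: surface form -> encyclopedia page title.
# A real system would obtain terms through term extraction, not hard-code them.
TERM_INDEX = {
    "word sense disambiguation": "Word-sense_disambiguation",
    "information extraction": "Information_extraction",
    "natural language processing": "Natural_language_processing",
}

WIKI_BASE = "https://en.wikipedia.org/wiki/"  # illustrative link target


def recognize_terms(text, index, max_len=4):
    """Greedy longest-match lookup of known (multi-word) terms over tokens."""
    tokens = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest n-gram first so multi-word terms win over substrings.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in index:
                match_len = n
                break
        if match_len:
            found.append(" ".join(tokens[i:i + match_len]))
            i += match_len  # skip past the matched term
        else:
            i += 1
    return found


def enrich(text, index):
    """Return a (term, link) pair for every recognized term."""
    return [(t, WIKI_BASE + index[t]) for t in recognize_terms(text, index)]
```

For example, `enrich("Word sense disambiguation is a core natural language processing task.", TERM_INDEX)` yields links for "word sense disambiguation" and "natural language processing". Longest-match lookup is chosen here only because it is the simplest way to prefer multi-word terms over their parts.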

In particular, the project will provide relevant contextual information for technical terms in scholarly documents by linking them to their definitions in encyclopedias such as Wikipedia. Both supervised and unsupervised methods for term extraction will be explored. In the next step, each extracted term must be linked to its definition in an encyclopedia. Because terms may be ambiguous, it is important to determine the sense in which a term is used in the document and to link to the contextually relevant definition. The word "ontology", for instance, has slightly different meanings in philosophy and computer science. If the system encounters "ontology" in a computer science text, it should give the computer science meaning (a conceptualization of a knowledge domain), not the philosophical definition (a subdiscipline of metaphysics). The project therefore aims to answer the following research questions:

  1. How can technical terms be identified in text and how can they be linked accurately to encyclopedic resources?
  2. Does automatic document enrichment improve understanding of technical documents?
  3. Does automatic document enrichment reduce the time required to acquire knowledge and understanding?

The first question will be investigated using various NLP techniques and resources, such as Wikipedia and the Unified Medical Language System (UMLS). The second and third research questions will be investigated in user studies.
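To make the disambiguation step concrete, the sketch below applies a simplified Lesk algorithm to the "ontology" example: it picks the sense whose gloss shares the most content words with the surrounding text. Simplified Lesk is a standard baseline swapped in for illustration, not the method adopted by the project; the sense inventory (`SENSES`), its glosses, and the stopword list are hypothetical paraphrases of the two senses described above.

```python
import re

# Hypothetical sense inventory; glosses paraphrase the two senses of "ontology".
SENSES = {
    "ontology": {
        "computer science": "formal conceptualization of a knowledge domain "
                            "with classes relations and a data model",
        "philosophy": "subdiscipline of metaphysics studying being existence "
                      "and reality",
    }
}

# Small illustrative stopword list so function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "and", "with", "in", "is", "for", "to"}


def content_words(text):
    """Lowercased alphabetic tokens minus stopwords, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(term, context, senses=SENSES):
    """Simplified Lesk: return the sense whose gloss overlaps most with the context.

    Ties go to the sense listed first in the inventory.
    """
    ctx = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[term].items():
        overlap = len(ctx & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

A sentence about classes, relations, and a data model selects the computer science sense, while one mentioning metaphysics, being, and existence selects the philosophical sense. Gloss overlap is easy to compute but sensitive to gloss wording, which is one reason richer resources such as Wikipedia or UMLS are of interest.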