Language Technology

Example Exam Questions

Basic Ideas in Language Technology

Explain the difference between the "evaluation" and the "assessment" of language technology, using one or more examples that highlight how the concepts differ.
Explain the need for "validating" research tools developed in language technology when these are claimed to measure properties such as processing complexity (of sentences), familiarity (of words or constructions), or comprehensibility (of foreign pronunciations). Is validation more similar to evaluation or assessment?
Many research groups worked on projects applying NLP to database access in the 1980s, some even resulting in products, so-called "natural language interfaces" (NLIs). Few of these systems lasted longer than a couple of years in the market, and none are seen today. In class we compared the problems of constructing NLIs to the problems faced by someone who would attempt to construct a machine to pass the Turing test. Decide whether you wish to agree with this diagnosis or whether you suspect that the successful development of GUIs instead cut the interest in NLIs, and support your position. (You may also suggest a third position and support it if you wish.)
In class we suggested that "applications" should always be regarded as relations between technical developments and market needs (understood broadly). Explain how this view informs two major ways in which applications can fail.
Explain the concepts "precision" and "recall" as applied to the detection of learners' errors in CALL applications. Which is the more important figure and why?
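
As a study aid for the precision/recall question, the two figures can be computed from a set of flagged positions and a set of true error positions. This is only a minimal sketch: the function name precision_recall and the token positions are made up for illustration and do not come from any particular CALL system.

```python
def precision_recall(flagged, actual_errors):
    """Return (precision, recall) for an error detector.

    flagged       -- positions the system marked as errors
    actual_errors -- positions that really are learner errors
    """
    flagged = set(flagged)
    actual_errors = set(actual_errors)
    true_positives = len(flagged & actual_errors)
    # Precision: what share of the flagged items are real errors?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: what share of the real errors were flagged?
    recall = true_positives / len(actual_errors) if actual_errors else 0.0
    return precision, recall


# Hypothetical token positions in a learner's essay.
actual = {2, 5, 9, 14}   # real errors
flagged = {2, 5, 7}      # what the system reported
precision, recall = precision_recall(flagged, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the system raises one false alarm (position 7) and misses two real errors (positions 9 and 14), which is the trade-off the question asks you to weigh.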

Corpora

Explain what balanced corpora are, give an example of one, and explain what sorts of research they are used in.
Explain what parallel corpora are, give an example of one, and explain what sorts of research and development they are used in.
Most annotations are added to corpora automatically, but some of them are checked by an expert before the corpus is used. Explain why these manually verified annotations are often added to corpora, giving examples of at least two types of annotation that are added, and why they play a role in research and development.
In the 1950s Chomsky liberated linguistics from a serious dependence on corpora by pointing out that native speaker judgments were normally sufficient to establish what sorts of words, pronunciations and sentences and phrases existed in a language. Nonetheless some subareas of linguistics persisted in collecting corpora throughout the latter half of the twentieth century. Which areas were these, and why were they not satisfied with native speaker judgments?
Explain (i) the type/token distinction, (ii) the typical frequency distribution of words, and (iii) why this means that even 10-million-word [are these 10⁷ types or 10⁷ tokens?] corpora may be insufficient in size when one seeks examples of concrete phenomena.
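
To make the type/token distinction and the skewed frequency distribution concrete, the following sketch counts both on a toy sentence. The sentence is invented for illustration, and whitespace splitting stands in for real tokenization; a real corpus would be far larger.

```python
from collections import Counter

# Toy "corpus" (an invented sentence, not course data).
text = "the cat sat on the mat and the dog sat on the log"
tokens = text.split()          # every running word is a token
counts = Counter(tokens)       # every distinct word form is a type

n_tokens = len(tokens)
n_types = len(counts)
# Words that occur exactly once (hapax legomena).
hapaxes = [word for word, freq in counts.items() if freq == 1]
# Rank-frequency list: most frequent types first.
ranked = counts.most_common()

print(f"{n_tokens} tokens, {n_types} types, {len(hapaxes)} hapaxes")
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, word, freq)
```

Even in this tiny sample a few types account for much of the text while most types occur once. The same pattern in large corpora is why a specific word or construction may have very few attestations.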

Discourse

See the Nestor site for this course, and Dr. Spenader's suggested questions for discourse.

Simulating Acquisition

What is "linguistic nativism"? Briefly describe it in comparison with "empiricism". How is it related to claims about "Universal Grammar"?
What is Gold's theorem (Gold 1967), and what is its relevance to language acquisition?
Why is "segmentation" an interesting problem for human language acquisition and natural language engineering? Provide examples for both humans learning languages and natural language engineering applications where segmentation is necessary or useful.
Techniques from machine learning are widely used in engineering-oriented language technology applications, such as "named-entity recognition", as well as in cognitively motivated computational models of human language acquisition. The models of human language acquisition need to meet certain additional criteria (besides being able to learn correctly and effectively) to be plausible models. List and briefly explain two such criteria.

Machine Translation

Name the three main approaches to machine translation before the advent of hybrid systems. Define them briefly in your own words. Explain briefly the advantages and disadvantages of each.
Name three kinds of ambiguity that arise in machine translation and explain how researchers and engineers try to deal with them.
It has been shown that adding syntactic knowledge to statistical machine translation systems improves performance. Give a brief definition of syntax in your own words and explain, using an example, why knowledge of syntactic structure is seen as generally beneficial in machine translation.
What is meant by a statistical machine translation system being a "black box"?

Information Extraction

Explain what is meant by 'information extraction' (IE) in particular in contrast to 'information retrieval', and discuss three examples of practical applications of IE, explaining why they are commercially interesting.
Explain the concepts 'precision' and 'recall' as they are used in the task of identifying the technical terms used within a domain such as solid-state electronics.
The strategy behind identifying basic technical terminology in a given field has been described as "finding the frequent infrequent words". Explain what is meant by this and why it helps identify technical terminology.
The strategy behind identifying compound technical terminology in a given field builds on the set of basic terms that has been identified and focuses on finding word pairs and triples that appear together reliably. Pick one or more techniques aimed at identifying the words that appear together reliably, and explain informally how they work.
What else is involved in IE besides detecting terminology? Provide some examples and discuss the motivation for extracting the additional sorts of information.
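
One commonly used way to find word pairs that "appear together reliably" is pointwise mutual information (PMI), which compares how often a pair occurs with how often it would occur if the two words were independent. The sketch below is only an illustration under simple assumptions: the function name pmi_scores and the toy token list are hypothetical, probabilities are plain relative frequencies, and PMI is not necessarily the technique emphasized in class.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (w1, w2), pair_count in bigrams.items():
        p_pair = pair_count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        # PMI > 0: the pair co-occurs more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Invented toy text from the solid-state electronics domain.
tokens = ("the field effect transistor and the bipolar transistor "
          "use the field effect").split()
scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

PMI is known to favor rare pairs, so in practice it is usually combined with a minimum frequency threshold or replaced by a significance test.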

Generation

Describe two forms of sentence fusion, and give an example of each.
How can generation improve statistical machine translation?
Chart generation is algorithmically more complex than chart parsing. Describe the cause of this increased complexity and present a brief example to demonstrate it.
In fluency ranking, we can distinguish two categories of features. Describe both categories and give two example features for each category.
Jaynes, a pioneer of maximum entropy modeling, wrote:
" [...] the fact that a certain probability distribution maximizes entropy subject to certain constraints representing our incomplete information, is the fundamental property which justifies the use of that distribution for inference; it agrees with everything that is known but carefully avoids assuming anything that is not known."

Describe what constraints are imposed and under what probability distribution entropy is maximized.

Pronunciation Comparison

Why is an aggregative view of linguistic variation an improvement over investigating only individual features?
Explain informally how the Levenshtein algorithm works and give the Levenshtein distance (including alignment) between "appels" and "apples".
Why is a regular cluster map (a map divided into regions) not a good method for visualizing aggregate dialect distances?
Does the Levenshtein-based measure of pronunciation distance give a valid overview of dialectal language variation? Explain.
Explain the main scientific contribution to dialectology research arising from the use of the bi-partite spectral graph clustering approach. Name the other area of science (i.e. not linguistics) in which bi-partite spectral graph clustering was first developed, and explain why linguistics often shares problems and solutions with this area.
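
For the Levenshtein question above, a small dynamic-programming sketch may help when checking a hand-computed answer. It assumes unit costs for insertion, deletion and substitution; the course may use a different cost scheme (for example, costlier substitutions), in which case the numbers change. The function name and the "-" gap symbol are illustrative choices.

```python
def levenshtein(source, target):
    """Return (distance, alignment) under unit edit costs.

    The alignment is a list of (source_char, target_char) pairs,
    with "-" marking an insertion or deletion.
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i                      # delete i characters
    for j in range(cols):
        dist[0][j] = j                      # insert j characters
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # match/substitution

    # Trace back from the bottom-right cell to recover one optimal alignment.
    alignment = []
    i, j = len(source), len(target)
    while i > 0 or j > 0:
        sub = 1
        if i > 0 and j > 0:
            sub = 0 if source[i - 1] == target[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + sub:
            alignment.append((source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            alignment.append((source[i - 1], "-"))
            i -= 1
        else:
            alignment.append(("-", target[j - 1]))
            j -= 1
    alignment.reverse()
    return dist[len(source)][len(target)], alignment


distance, alignment = levenshtein("appels", "apples")
print(distance)
print(alignment)
```

Several alignments can share the minimal cost; the trace-back here simply prefers the diagonal step, so it returns one of them.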

Acknowledgements

Dr. Jennifer Spenader normally teaches this course and kindly made her materials available. She developed about three quarters of the materials on corpora and all of the materials on discourse.

Links

The Association for Computational Linguistics comprises over 2,000 researchers worldwide and holds several conferences in different parts of the world every year. The largest are attended by more than 1,000 researchers.

John Nerbonne
Last modified: Mon Aug 17, 2009