Language Technology
Example Exam Questions
Basic Ideas in Language Technology
- Explain the difference between the "evaluation" and
the "assessment" of language technology, using
one or more examples that highlight how the concepts differ.
- Explain the need for "validating" research tools
developed in language technology when these are claimed to
measure properties such as processing complexity (of
sentences), familiarity (of words or constructions), or
comprehensibility (of foreign pronunciations). Is validation
more similar to evaluation or assessment?
- Many research groups worked on projects applying NLP to
database access in the 1980s, some even resulting in
products, so-called "natural language interfaces"
(NLIs). Few of these systems lasted longer than a couple of
years in the market, and none are seen today. In class we
compared the problems of constructing NLIs to the problems
faced by someone who would attempt to construct a machine
to pass the Turing test. Decide whether you wish to
agree with this diagnosis or whether you suspect that
the successful development of GUIs instead cut the interest
in NLIs, and support your position. (You may also suggest
a third position and support it if you wish.)
- In class we suggested that "applications" should
always be regarded as relations between technical developments
and market needs (understood broadly). Explain how this
view informs two major ways in which applications can fail.
- Explain the concepts "precision" and "recall"
as applied to the detection of learners' errors in CALL
applications. Which is the more important figure and why?
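As a point of reference for the precision and recall question above, the two figures can be computed as in the following minimal Python sketch. The function name and the toy data (token positions in a learner's essay) are illustrative assumptions, not course material.

```python
def precision_recall(flagged, actual_errors):
    """Return (precision, recall) for flagged items measured against the true errors."""
    flagged, actual_errors = set(flagged), set(actual_errors)
    true_positives = len(flagged & actual_errors)
    # Precision: what fraction of the flagged items really are errors?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: what fraction of the real errors were flagged?
    recall = true_positives / len(actual_errors) if actual_errors else 0.0
    return precision, recall


# Hypothetical data: token positions in a learner's essay.
actual_errors = {3, 7, 12, 20}  # positions an expert marked as errors
flagged = {3, 7, 15}            # positions the CALL system flagged

p, r = precision_recall(flagged, actual_errors)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.67, recall = 0.50
```

Here the system raises one false alarm (position 15) and misses two real errors (positions 12 and 20), which is the trade-off the question asks about.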
Corpora
- Explain what balanced corpora are, give an example
of one, and explain what sorts of research they are
used in.
- Explain what parallel corpora are, give an example of one,
and explain what sorts of research and development they are
used in.
- Most annotations are added to corpora automatically, but some
of them are checked by an expert before the corpus is used.
Explain why these manually verified annotations are often
added to corpora, giving examples of at least two types of
annotation that are added, and why they play a role in
research and development.
- In the 1950s Chomsky liberated linguistics from a serious
dependence on corpora by pointing out that native speaker
judgments were normally sufficient to establish what sorts
of words, pronunciations and sentences and phrases existed
in a language. Nonetheless some subareas of linguistics
persisted in collecting corpora throughout the latter half
of the twentieth century. Which areas were these, and why
were they not satisfied with native speaker judgments?
- Explain (i) the type/token distinction, (ii) the typical
frequency distribution of words, and (iii) why this means
that even 10-million-word [are these 10^7 types or
10^7 tokens?] corpora may be insufficient in size
when one seeks examples of concrete phenomena.
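The type/token distinction in the last question can be made concrete with a small count. The sketch below is a minimal illustration in Python; the sentence is an invented toy example and whitespace splitting stands in for real tokenization.

```python
from collections import Counter


def type_token_stats(text):
    """Return (number of tokens, number of types, frequency table) for a text."""
    tokens = text.lower().split()  # naive whitespace tokenization (an assumption)
    counts = Counter(tokens)       # one entry per type, valued by its token count
    return len(tokens), len(counts), counts


text = "the cat sat on the mat and the dog sat on the log"
n_tokens, n_types, counts = type_token_stats(text)
hapaxes = [word for word, freq in counts.items() if freq == 1]

print(n_tokens, "tokens,", n_types, "types")  # 13 tokens, 8 types
print("rank-frequency list:", counts.most_common())
print("types seen only once:", len(hapaxes))  # 5
```

Even in this tiny sample one type ("the") accounts for 4 of the 13 tokens while most types occur once. In real corpora the same skew is far stronger, which is the point behind part (iii).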
Discourse
See the Nestor site for this
course, and Dr. Spenader's suggested questions for discourse.
Simulating Acquisition
- What is "linguistic nativism"? Briefly describe it
in comparison with "empiricism". How is it
related to claims about "Universal Grammar"?
- What is Gold's (Gold 1967) theorem, and what is its
relevance to language acquisition?
- Why is "segmentation" an interesting problem for
human language acquisition and natural language engineering?
Provide examples for both humans learning languages and
natural language engineering applications where segmentation
is necessary or useful.
- Techniques from machine learning are widely used in
engineering-oriented language technology applications, such
as "named-entity recognition", as well as in
cognitively motivated computational models of human language
acquisition. The models of human language acquisition need
to meet certain additional criteria (besides being able to
learn correctly and effectively) to be plausible models. List
and briefly explain two such criteria.
Machine Translation
- Name the three main approaches to machine translation before
the advent of hybrid systems. Define them briefly in your own
words. Explain briefly the advantages and disadvantages of each.
- Name three kinds of ambiguity that arise in machine
translation and explain how researchers and engineers try to
deal with them.
- It has been shown that adding syntactic knowledge to
statistical machine translation systems improves
performance. Give a brief definition of syntax in your own
words and explain, using an example, why knowledge of
syntactic structure is seen as generally beneficial in
machine translation.
- What is meant by a statistical machine translation system
being a "black box"?
Information Extraction
- Explain what is meant by 'information extraction' (IE), in particular
in contrast to 'information retrieval', and discuss three
examples of practical applications of IE, explaining why
they are commercially interesting.
- Explain the concepts 'precision' and 'recall' as they are
used in the task of identifying the technical terms used
within a domain such as solid-state electronics.
- The strategy behind identifying basic technical terminology
in a given field has been described as "finding the
frequent infrequent words". Explain what is meant
by this and why it helps identify technical terminology.
- The strategy behind identifying compound technical terminology
in a given field builds on the set of basic terms that has
been identified and focuses on finding word pairs and triples
that appear together reliably. Pick one or more techniques
aimed at identifying the words that appear together reliably,
and explain informally how they work.
- What else is involved in IE besides detecting terminology?
Provide some examples and discuss the motivation for extracting
the additional sorts of information.
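For the question above on word pairs that appear together reliably, one commonly used association measure is pointwise mutual information (PMI). The following is a minimal sketch under stated assumptions: PMI is only one of several possible techniques (chi-square and log-likelihood tests are others), the function name is hypothetical, and the token sequence is invented for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return the pointwise mutual information (in bits) of each adjacent word pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (w1, w2), pair_count in bigrams.items():
        p_pair = pair_count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        # PMI compares the observed pair probability with what independence predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Invented toy sequence, loosely in the solid-state electronics domain.
tokens = ("the field effect transistor and the bipolar transistor "
          "use the field effect").split()
scores = bigram_pmi(tokens)

for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

In this toy sample "field effect" scores higher than "the field", because "the" also occurs freely with other words. PMI is known to overrate pairs of rare words, so practical systems usually combine it with a minimum frequency threshold.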
Generation
- Describe two forms of sentence fusion, and give an
example of each.
- How can generation improve statistical machine translation?
- Chart generation is algorithmically more complex than chart parsing.
Describe the cause of this increased complexity and present
a brief example to demonstrate it.
- In fluency ranking, we can distinguish two categories of
features. Describe both categories and give two example
features for each category.
- Jaynes, a pioneer of maximum entropy modeling, wrote:
"[...] the fact that a certain probability
distribution maximizes entropy subject to certain constraints
representing our incomplete information, is the fundamental
property which justifies the use of that distribution for
inference; it agrees with everything that is known but carefully
avoids assuming anything that is not known."
Describe what constraints are imposed and under what
probability distribution entropy is maximized.
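As background for the Jaynes quotation, the standard textbook form of the maximum entropy problem can be written as below. The notation is an assumption introduced here, not taken from the course: f_i are feature functions, \tilde{p} is the empirical distribution of the training data, and \lambda_i are the resulting weights.

```latex
% Choose the distribution p with maximal entropy ...
\max_{p} \; H(p) = -\sum_{x} p(x) \log p(x)

% ... subject to matching the observed feature expectations and being normalized:
\sum_{x} p(x)\, f_i(x) = \sum_{x} \tilde{p}(x)\, f_i(x) \quad (i = 1, \dots, k),
\qquad \sum_{x} p(x) = 1

% The solution has log-linear (exponential) form:
p(x) = \frac{1}{Z(\lambda)} \exp\Big( \sum_{i=1}^{k} \lambda_i f_i(x) \Big),
\qquad Z(\lambda) = \sum_{x} \exp\Big( \sum_{i=1}^{k} \lambda_i f_i(x) \Big)
```

Fluency ranking typically uses a conditional variant of this model, in which the distribution ranges over candidate realizations of a given input.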
Pronunciation Comparison
- Why is an aggregative view of linguistic variation
an improvement over investigating only individual features?
- Explain informally how the Levenshtein algorithm works and
give the Levenshtein distance (including alignment) between
"appels" and "apples".
- Why is a regular cluster map (map divided into regions) not a
good visualization method of aggregate dialect distances?
- Does the Levenshtein-based measure of pronunciation distance
give a valid overview of dialectal language variation?
Explain.
- Explain the main scientific contribution to dialectology research
arising from the use of the bi-partite spectral graph clustering
approach.
Name the other area of science (i.e., not linguistics)
in which bi-partite spectral graph clustering was first
developed, and explain why linguistics often shares problems
and solutions with this area.
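For the Levenshtein question in this section, the dynamic-programming idea can be sketched in Python as follows. This is a minimal illustration that assumes unit costs for insertion, deletion, and substitution; dialectometric work often uses refined segment costs, which this sketch does not model. The function name is hypothetical.

```python
def levenshtein(source, target):
    """Return (distance, alignment) with unit costs for insert, delete, substitute."""
    m, n = len(source), len(target)
    # dist[i][j] holds the cheapest cost of turning source[:i] into target[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i  # delete every source symbol
    for j in range(1, n + 1):
        dist[0][j] = j  # insert every target symbol
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # match or substitution

    # Trace back from the final cell to recover one optimal alignment.
    alignment = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if source[i - 1] == target[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                alignment.append((source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            alignment.append((source[i - 1], "-"))  # symbol deleted from source
            i -= 1
        else:
            alignment.append(("-", target[j - 1]))  # symbol inserted from target
            j -= 1
    alignment.reverse()
    return dist[m][n], alignment


distance, alignment = levenshtein("appels", "apples")
print(distance)  # 2
print(" ".join(a for a, _ in alignment))
print(" ".join(b for _, b in alignment))
```

Under unit costs more than one alignment reaches the minimum: two substitutions and a deletion plus an insertion both cost 2. The traceback above returns only one of them.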
Acknowledgements
Dr. Jennifer Spenader normally teaches this course and kindly made
her materials available. About three quarters of the materials on corpora
were developed by Jennifer and all of the materials on discourse.
Links
The Association for Computational
Linguistics comprises over 2,000 researchers worldwide and holds
several conferences in different parts of the world every year. The
largest are attended by more than 1,000 researchers.
John Nerbonne
Last modified: Mon Aug 17, 2009