(This interview has appeared in Ta!, the Dutch Students' Journal on Computational Linguistics, volume 2, number 1.)
© All rights reserved for Ta!


John Nerbonne, professor of Alfa-informatica:

``We can learn from the way Computer Scientists do research: with an eye to both practical and theoretical aspects of work.''

John Nerbonne, professor of Alfa-informatica since February 1st, 1993, studied philosophy as an undergraduate, along with some math and psychology. He also started to study `a little bit' of linguistics. He went to Germany for five years, where he obtained a master's degree in Germanic Philology. He returned to the US to study at Ohio State University, where he earned a Master of Science degree in Computer Engineering and a Ph.D. in Linguistics, for which he wrote a dissertation on temporal semantics. The dissertation aroused interest in CL circles, and his interests also moved toward computational linguistics. He has worked mostly at industrial research labs, which leads him to stress practical aspects of computing, without ignoring the theoretical attainments of computer science. New, fresh, and full of ideas - and yet at the same time, experienced and cautious. A portrait.

Undergraduate

``I studied as an undergraduate at Amherst College, in the US, and I was a major in philosophy there. I did some math and psychology as well, and even then I started to study a little bit of linguistics. I took a couple of linguistics courses in my senior year at UMass (University of Massachusetts), which had a great linguistics department, and which was recommended to me then because it was well-known. I managed to take courses then with Barbara Partee and Emmon Bach.

I studied for five years in Germany, in Freiburg, where I did a master's degree in German linguistics. Again, keeping up my interests in math and philosophy, which were minor subjects. And then I went to Ohio where I did a PhD in linguistics and an MS in computer science. In linguistics, I worked with David Dowty, who had done a lot of interesting stuff on syntax and semantics. He advised my dissertation on the semantics of German temporal reference.

One of the things that pushed me towards computational linguistics was that the people who were most interested in my dissertation were the people who were just beginning to try implementations of grammars arising from the ``phrase structure revival'' in computational linguistics. They needed very exact descriptions to get some of their work running. It was very interesting to see that it was these groups, mostly from computer science, that read it most exactly. They certainly had the hardest questions.

I do not think I wanted to go on in computer science research, ever, but it was always a possible fall-back. In those days, just like now, when it is economically difficult, it was something we always kept in the backs of our minds. Suppose your dreams are not realized, then what sorts of jobs can you get? The ``bail-out'' opportunities in computer science were always more attractive than in linguistics.

Germany

Going to Germany was a nice opportunity! It was a sister university to UMass, where I had applied to study linguistics, and instead of just entering the program I went to this German university, thinking it would be a nice way to spend a year or two. I liked it quite a lot; it had a very free atmosphere in which to study. The German system, then, was incredibly free; you essentially studied for five years and took two sets of exams. I was not there because of any `big names' - nothing like that. In Freiburg there were some solid linguists, and there was a very good logic department. But I didn't know about it when I went there; that was just a surprise benefit.

Training

When I was a linguistics student, GPSG was certainly at the peak of its popularity. One of the reasons my dissertation attracted some interest was the fact that I wrote a fairly substantial fragment in GPSG. I was fairly interested in it, and my advisor had become rather interested too, and so what started as a semantics dissertation ended up having some eighty or ninety pages of syntax in it. I always had planned a syntactic fragment to go along with the semantics, so that the predictions could be exact, but I never anticipated spending that much time on syntax when I began working on it.

Other theories that were in vogue at that time were relational grammar, LFG certainly, and trace theory. Those were the very early days of binding theory. I followed it a bit from the outside, but I never really worked on it at all. It is very difficult to work with as a computational linguist, as I guess everyone knows right now... (chuckles) some of us find out more painfully than others. It is because they are asking questions at a level we almost cannot attack as computational linguists. They are asking very abstract and general questions and they are trying to generalize these over the exact form of rules and constructions - but we still need fairly exact rules in order to do much of anything computationally. Maybe someday... there are good minds attacking this problem even now.

Influences

Certainly in semantics, the work in generalized quantifier theory was important to me. You could ask deeper questions than people had been able to before. And certainly GPSG had a lasting influence on me, too, both in its call to try to describe fragments very exactly and to use very precise methods, things that weren't all that common at the time. To some extent, I have also been influenced by the work in AI. I think the collaboration between AI and computational linguistics has been very fruitful. On the one hand there is the work in formalisms, comparing the AI sorts of formalisms like KLONE or the other knowledge representation languages to the feature description languages like PATR II and its descendants that we use for writing grammars. This, I think, has also been very productive.

And on the other hand, and my interest in this goes back to my student days, there is the ``scruffier'' sort of AI. This is interesting not so much for its theories, which have not amounted to much, but because people like Charniak, Schank, Wilensky and others showed very convincingly that you cannot really model natural language understanding without a great deal of real world knowledge, including common sense knowledge, and background knowledge, knowledge of tasks, knowledge of interlocutors, very difficult things. So they've left a legacy of interesting observations and analytical problems, most significantly the problem of how we resolve ambiguous utterances. And here linguistics, as a discipline, has little to say. It is almost built into linguistic methodology that it continuously exacerbates the problem - correctly.

I do not think that we're in a position to attack those with a great deal of depth even now, twenty years later. For very limited domains we can provide models that, well, could fool people right now. But only in very limited domains... things like finding out when the next train or plane to city X is scheduled. General systems are still a long way off.

That is not what I learned when I studied computer science. Where I studied there was a chair for Artificial Intelligence, held by Chandrasekaran. He works on higher-level knowledge representations, and he has developed and demonstrated the value of specialized theories, for example, of time or devices, in knowledge representation. I was familiar with that work, but Chandra was also deeply Schankian in his attitude towards natural language processing. That is not a very interesting attitude for a linguist in NLP - for Schankians all the action is in knowledge representation. I think they are right about natural language understanding needing to be knowledge-based at some level, but they were not attacking linguistic problems in ways that seemed very rewarding, theoretically.

When I studied computer science I studied primarily formal foundations of computer science, foundations that we are using in linguistics as well, these days. Formal language theory, parsing, compiler construction... things like that.

Experience

Before I came to Groningen I was an instructor at Ohio State University for a year after finishing. Then I went to Hewlett-Packard Labs in Palo Alto, where I was for five years. That was a very fruitful place for me to work at the time. There were some good people there; the lab was dominated by applications builders, but there were some very serious scientists, too. HP Labs was a very open-minded place, interested in looking at ideas, no matter where they came from. I think that was one of the nicest things there: no one ever said, `Oh well, that's some screwy idea from psychology, to hell with it,' for example. They always approached things like, `Well, what can we do with that? Does it work to explain what it's supposed to? And if it does can we take advantage of it in an application? And if we did, what would be the practical consequences of that?'

After leaving HP labs, in 1990, I went to the DFKI, the German AI center in Saarbruecken, where I spent three years before I came here. That again was a very interesting place to work. It is mostly of the `neat' school of AI, as opposed to the `scruffier' kind of AI I learned as a graduate student. The topics at the AI center that are most popular are knowledge representation, theorem proving, constraint logic programming, and a fairly formal approach to natural language processing. I tried to contribute there by emphasizing how to apply these methods to problems that we could realistically attack, and provide useful solutions to. And we looked for applications (though with nothing like the intensity common at HP Labs) that might conceivably develop into useful products, not in the short run, but, say, in the next five years or so.

The key in looking for applications is to be useful, even if you are going to be less than one hundred percent accurate in processing. Our theories are not up to one hundred percent accuracy, and that's OK; you can do a lot with less than one hundred percent, you just have to keep it in mind. That is an awfully important point in developing applications right now.

Teaching?

My positions have been in industry, including the DFKI, which is an industrially sponsored research institute. I also taught at Stanford while I was with HP Labs, which was affiliated with CSLI at the time. I taught natural language processing, and a computational semantics course. And then in Saarbruecken I also taught, especially on the theory of the lexicon, and again some semantics and pragmatics courses, for example, a course on ellipsis and the processing of fragments. So I've maintained my interest in pure research. I also led groups of students in practical projects. That was actually a very nice aspect of the Saarbruecken program in computer science. After their first exam (about two years into the program), the students do a practical, six-month project. That was an interesting thing to become involved in. It'd be a group of four to five students who would cooperate in a project. And they were already really quite good after two years of theoretical and practical work. So I worked with one group a bit on a set of tools for computational semantics. So over the years I haven't done as much teaching as lots of new professors, but it is not completely new to me.

The challenge of Groningen

In Groningen I think that the challenge is to link up what I think is really an outstanding linguistics department - one of the best in Europe, and certainly the best locally - with the computational linguistics program that we have and also with the more general interests in humanities computing. The linguistics side is what interests you most, so let's talk about that first. Jan Koster, Frans Zwarts, Jaak Hoeksema, Sjaak de Mey, Werner Abraham and others make up a great linguistics group, and I'm looking forward to interacting and working with them. The role of computational linguistics here is on the one hand complementary: we fill in gaps in the linguistic study of language, much like, say, psycholinguistics or sociolinguistics. But on the other hand we have an opportunity to interact with all the subdisciplines in linguistics first by building computational models, and this is certainly going on all over the place. Gosse Bouma's work here is perhaps the best local example, and it is particularly interesting to me, because it is related to work I've done on the lexicon. I had followed Gosse's work for some time before coming here.

And second, as I hinted above, I think many applications of linguistics become interesting when computers are available. But let me say some more about the linguistics side, first.

I certainly hope to continue my research in computational linguistics, and I hope at least some of it will engage other Groningen researchers. There is the work I have already done in computational semantics, for example, the work done together with Joachim Laubsch and Kader Diagne on a semantic representation language called NLL. NLL comprises a reasonable set of tools; it comes with a reader, a printer, a number of tools for building interfaces, things a lot of other languages do not offer. It already has been hooked up to three different syntactic front-ends, and two different application back-ends. I would like to tie it up into a neater package and put it in the public domain some time next year. I think it could support experiments in linguistic semantics.

I think there's room for quite a bit of collaboration with the other people around here. Groningen is really very fortunate in having two of the best young computational linguists around right now. I have already mentioned Gosse, but there is also Gertjan van Noord, who has just recently received his PhD in Utrecht. With him I share an interest in declarative linguistic formalisms and their processing. He's done work on semantics, too.

But computational linguistics in Groningen is housed in humanities computing (alfa informatica), and this all by itself suggests lots of interesting possibilities and applications, especially ones connected with text processing in one way or another. Here I think again of interests in the lexicon, and also of the practical problems of accessing lexical information and of morphological processing. That is a very interesting area to look at if one wants to link up to humanities computing in general, which largely comprises text processing and, to a smaller extent, the use of other computational tools, for example, database modeling and information processing of different sorts. But I think text processing is the heart and soul of humanities computing. Linking computational linguistics up to those interests is something I would like to do.

We also have a specialist in text processing in Harry Gaylord. He is much closer to applications than either Gosse or I have been. He is involved in defining Unicode, the standard digital rendering of all alphabets, has also served on the Text Encoding Initiative, and has pioneered the use of the Standard Generalized Markup Language (SGML) for text processing. These are areas that could potentially benefit from work in computational linguistics. After all, if you are interested in text processing you are constantly faced with problems of lemmatization, and this is normally a problem of morphological analysis. Or, to take an example of immediate relevance to the SGML work, you are faced with problems of looking at frequency within text units. Those units have to be defined somehow by the experts on the text - and that is what SGML is really all about: finding those units and having easy access to them. There are real opportunities for realizing that kind of work here.
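
To make the point about frequencies within markup-defined units concrete, here is a minimal sketch in Python; the tags, the sample text and the crude suffix-stripping `lemmatizer' are all invented for illustration, and real lemmatization would of course require genuine morphological analysis.

    import re
    from collections import Counter

    sample = """
    <div type="chapter" n="1">The parsers parsed the parsed texts.</div>
    <div type="chapter" n="2">Parsing texts is what a parser does.</div>
    """

    def toy_lemmatize(word):
        # Crude suffix stripping, standing in for a real morphological analyser.
        for suffix in ("ing", "ed", "ers", "er", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    # Each <div> element is one text unit, as defined by the markup.
    for n, body in re.findall(r'<div[^>]*n="(\d+)"[^>]*>(.*?)</div>', sample, re.S):
        lemmas = [toy_lemmatize(w.lower()) for w in re.findall(r"[a-zA-Z]+", body)]
        print("unit", n, Counter(lemmas).most_common(3))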

And then our team in Groningen is rounded off by George Welling, who represents a part of humanities computing that often does not get the exposure it deserves. George is working on applications in history, and economic modeling. It is not something people would associate immediately with humanities computing, but it is a potential area of collaboration that we'll want to keep an eye on.

Immediate Plans for Teaching

In the next trimester, I will teach an HPSG course here. HPSG is something I have been following for the past nine years or so, since doing my PhD. In fact, one of the things I did in my dissertation was to show how you could use some categorial grammar methods in GPSG, and that is a familiar theme to HPSG as well. I will also teach a course on morphology and the lexicon sometime in the near future. As I said, I will try to keep up interests in computational semantics. I also am a fan of at least a reasonable foundation in traditional computer science, so some part of the course I teach here will have to do with basic methods in computer science. That may sound like a strange thing to do in a department like this, but the computer is our tool. For example, compiler technology is very useful in lots of areas. One of the most useful things you can do when you try to provide a user interface is to limit the kinds of inputs that you take, and the technology for doing that is to define a programming language - there are good tools to do that - and then a user can only say what you let him say. It is a kind of ergonomics, I guess.
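
As a small illustration of that ergonomic idea - an invented example, not a description of any actual interface - one can define a tiny command language and reject everything outside it; a real system would use proper parser-generator tools rather than the ad hoc patterns below.

    import re

    # Allowed commands and the patterns their arguments must fit; anything
    # else is rejected.  The command set here is invented for illustration.
    COMMAND_GRAMMAR = {
        "list": re.compile(r"^list (flights|trains)$"),
        "show": re.compile(r"^show (flights|trains) from ([A-Za-z]+) to ([A-Za-z]+)$"),
        "quit": re.compile(r"^quit$"),
    }

    def parse_command(line):
        # Return (command, argument groups) if the input is in the language, else None.
        word = line.split(" ", 1)[0]
        pattern = COMMAND_GRAMMAR.get(word)
        if pattern is None:
            return None
        match = pattern.match(line)
        return (word, match.groups()) if match else None

    print(parse_command("show flights from Groningen to Amsterdam"))  # accepted
    print(parse_command("please find me something nice"))             # rejected: None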

Industrial connections

I think there is a limited opportunity for making use of my industrial connections as a professor here. Certainly, there are opportunities for cooperation with the research labs I've had close contact with, on a small level. But let's guard against some misconceptions about industrial sponsorship of research. Some people think that companies just jump in and sponsor huge research projects, and that is rarely, if ever, the case. If a company takes a very serious interest in some research, they do not want it to be at a university - so far from home; they want to have it in house, in the company somewhere. What we more realistically can do at the university is to be a kind of vanguard for research that is just a little bit too risky for industrial research labs.

Once an industrial lab has existed for more than, say, five years or so, it generally has a very good idea of how long a typical project should last, what percentage of its projects should finish in usable products and should make money for the company, how those projects should be monitored and evaluated, etc.; and those are tasks we do not want to take on at the university. At the same time the cooperation with industry is an interesting thing to pursue. One of the things that I have learned from working in industry is that it really is possible, if one applies energy and intelligence to the problem of seeking appropriate applications, to come up with practical problems whose solution is on the one hand theoretically interesting, and on the other hand could also bring us closer to applications that we do not have now. What has always impressed me about good industrial research is that they spend a lot of energy thinking about the next project and the next kind of application they are going to look at. And they ask a lot of hard questions, like: `Suppose we don't solve the problem, can we still have something usable at the end? And if we can't solve this problem in any one instance, are the costs of not solving it in that instance so high that the application is simply too risky?' For instance, Hewlett-Packard is now primarily a computer manufacturer; some sixty percent of its sales are in computers. But its profits still come largely from instrumentation, especially in medical, electrical and chemical engineering. So we had a fair amount of interaction with these groups at HP Labs; they had a lot of money to sponsor research. We had one group once that wanted a natural language front-end to an intensive care unit, in a hospital. That was something, of course, we backed away from! (laughs) Not because it was an impossible application in the sense that we would have done much worse on it than in other areas - in fact, because they were asking a very limited number of questions, we would have done fairly well - but the cost of misanalysing in that domain is obviously not one that anyone will be willing to risk. Everyone knows that is not a position we are in right now.

OK, but what can we do with industry? A key here again is text processing, something that we represent in this department both through traditional text processing - the SGML work, Hypertext, and maybe the Unicode work - and through the computational linguistics work, which has more to do with the processing aspects of these, such as the morphology, the syntax and the semantics. Some of the applications that could crop up in text processing in the next several years, beyond the usual ones like better spelling checkers, include indexing tools. And, obviously, if one can index well, one can provide information retrieval fairly well.

Another possibility might be in connection with Internet services, to find documents on the Internet. Tools like Archie or Gopher are very interesting, but they are also rather primitive. You can basically find packages only with specific names. But more and more disciplines are providing FTP archival services of important journals and papers, so an interesting tool would be one that finds some of those things, even if it just improved matters to the level of searching for combinations of keywords. Of course, the interesting step would be where computational linguistics comes in - because these keywords are not simply strings, but they are in a combination that also makes grammatical sense.

Good news for CL-students

The industrial point of view can be quite valuable. It is essentially a point of view that is interested in what kinds of applications can be created now. And if it involves academics with research training, then the industrial point of view typically leads to two separate questions, namely: What kind of research questions can we put forward? and, What kind of practical work can we make possible? I think that is a valuable point of view to take, and it is one whose flavor I hope to bring across in teaching as well. Those really are very good questions to ask, because one of the things that will make our field interesting in the next ten years or so is that applications are going to arise, much more frequently than they have in the past. Speech technology, for example, has matured to the point where interesting applications exist now, in laboratories, and are being demonstrated to the `random' public convincingly. I recently took a look at the BBN ATIS speech system. I asked it approximately fifteen questions, of which thirteen were understood perfectly, and all of those were answered correctly. The ones that were not answered correctly involved on the one hand a deliberate stuttering on my part, just to see how the system would react, and on the other, my asking about a part of the database that did not exist. So I cannot really blame it for that.

This is very good news for students in computational linguistics because it means that there will be room for applying the techniques that you are learning right now. We should also make sure that we continue to be involved in these applications, because they simply are going to arise. The speech technology is good enough to support a good number of applications.

But the integrated speech work also illustrates a danger for linguistically oriented CL. The question is: will these systems be linguistically oriented or not? Some of the systems one sees in labs today are of the linguistically oriented variety - I'd put BBN in that camp - but others are of the I'll-work-real-hard-on-this-domain-until-I-get-it-to-work variety. It may seem quite amazing that the non-linguistically oriented systems are working quite well, too, but remember that the domains are very restricted. It would be a shame to see the first generation of integrated speech-nl systems arrive in the marketplace with essentially no contribution from linguistics. That's the danger. Anyway, I think it is important to stay in touch with these fields, and to keep having an effect on the development of the applications.

As for the department... Well, I have already mentioned how I hope to fit in here. The opportunities to work with a great linguistics department are very exciting. The core of the department that is already here looks like a very good group to work with. Also, one of my goals is that I hope to attract some industrial support of research in Groningen, probably in some of the areas I talked about... in morphology, or lexicon. Those are the ones I see tying in most nicely with the interests of the department as a whole. The other more ambitious applications like natural language understanding applications would be harder to locate here without an awful lot of other support, but perhaps that will be an opportunity for collaboration with other groups.

Research

My dissertation was an investigation of German temporal semantics. My proposal was to interpret Reichenbach very directly and to construe his famous trinity - speech, reference and event times - as parameters in a model theory. It started out as a typical sort of semantics dissertation with a Categorial Grammar fragment. But I dropped the CG stuff because GPSG was so interesting at the time, and I spent a lot of time developing GPSG as the syntactic carrier of the semantics. It was the first sizable fragment of German in GPSG.

More recent research concentrates on finding aspects of linguistics that one can apply profitably in computational linguistics. For example, showing how to implement GPSG was one of the main emphases of the Hewlett-Packard project. There were outstanding people working on that project, like Dan Flickinger, Carl Pollard, Mark Gawron, Bill Ladusaw and Tom Wasow. We worked on the feature system, a parser, and the semantics.

Of course this isn't a one-way street. Besides importing from linguistics, we also tried to create new linguistics where we found nothing applicable. One of the things we tried to do was to create representational systems for very complicated lexical information. Both GPSG and, even more so, HPSG make very strong demands on lexical representation. What's a measurement of ``strong demands''? Well, I try to illustrate that to the computer science folks who are not as aware of the linguistic complexities by noting that GPSG systems include some forty or fifty different features to distinguish lexical items, each of which can have up to ten values - and the number is much greater in HPSG and LFG, where features take other feature structures as values. On the face of it, then, you are looking at, well, what, 10^30 or so different combinations. From an engineering point of view, GPSG and HPSG are therefore wildly overdetermined. There is a lot more information being represented than really has to be represented (in the fragments). But that is a problem that is inherent in the stage our field is at right now. We have not pared things down to an elegant or sparse enough level. Still, the best theories require this kind of complexity in lexical representation.
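
A back-of-the-envelope version of that combinatorial point, using only the rough figures just mentioned (the exact exponent obviously depends on how many features and values one counts):

    # Back-of-the-envelope arithmetic; the figures are only the rough ones
    # from the interview, not counts from any particular grammar fragment.
    features = 45            # "some forty or fifty different features"
    values_per_feature = 10  # "each of which can have up to ten values"
    combinations = values_per_feature ** features
    print("upper bound on combinations:", combinations)   # 10**45 here
    # Feature co-occurrence restrictions rule most of these out, but the
    # space is still vastly larger than any fragment actually needs.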

A big emphasis in our work at HP was on using inheritance methods in the lexicon in order to make the management of all this information feasible, especially for the human developer of the theory. This is very similar to work we find in knowledge representation, especially in languages like KLONE, where they use structured concept hierarchies to represent world knowledge. It is also similar to the work that is done in object-oriented programming, which also uses an inheritance hierarchy. Our proposal for lexicon structure was similar, and my contribution was in trying to see how much of the grammatical information you had could fit into this sort of format. I also contributed in asking questions about how we could represent other lexical information, for example, things like inflectional and derivational relationships. We wanted somehow to refer to a class of verbs and to subclasses of transitive verbs and intransitive verbs. But what if you have a rule that maps transitive verbs into intransitive verbs, like passive or nonspecific object deletion or absolute forms, how could you represent that in a hierarchical structure? In the course of the years we developed two answers to that. On the one hand, one might view these as morphisms on the lexicon, so that entire subhierarchies are interpreted as related in ways which preserve inheritance relations. Or, on the other hand, in more recent work, we have been trying to use feature hierarchies directly to represent the other relationships. The morphism approach adds apparatus while the feature-based approach tries to exploit existing structure, so it's clear that the latter is more immediately interesting. Within this approach, there are different hypotheses about what the inflectional relationships might be. One is, for example, that a lexeme is simply an abstract specification from which all the elements of the paradigm inherit. You have a single Dutch lexeme, say, werken. It is underspecified for person and number, and its further specifications could all inherit from that. Other possibilities make use of relational constraints, and there are still others.
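
The inheritance idea can be sketched with ordinary programming-language machinery. In the following toy example, plain Python classes stand in for a real feature-description language and the feature names are invented; the paradigm member inherits everything the lexeme specifies and adds only its own person, number and form.

    # Toy inheritance-based lexicon fragment, for illustration only.
    class Verb:
        cat = "verb"

    class Werken(Verb):
        # The lexeme: underspecified for person and number.
        stem = "werk"

    class Werken3Sg(Werken):
        # One paradigm member: inherits cat and stem, adds its own specification.
        person, number, form = 3, "sg", "werkt"

    class WerkenPl(Werken):
        person, number, form = None, "pl", "werken"

    entry = Werken3Sg()
    print(entry.cat, entry.stem, entry.form)   # verb werk werkt (cat and stem inherited)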

These purely inheritance-based techniques work for inflection very well, but they do not work so readily for derivational relationships and compounding, real word formation. You can see this most readily in the possibility that these relations can be recursive. It is a rare lexical rule that applies to its own output, but the fact that it is possible at all is an indication that this method of making use of inheritance to represent inflectional relationships is not going to work. What we have proposed is to apply the same techniques that we use in HPSG-syntax to the theory of word formation. We have schematic word formation rules that are all defined in this feature description language HPSG likes to use, and it gives us a reasonable model of some of the properties of derivational relationships. Yet, I think we are really only at the beginning of this. By the way, this is an area where corpus studies will become the standard source of evidence in the very near future.
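
A toy illustration of the recursion point - an invented example, not the proposal just described - treats a word-formation rule as a rule rather than as a position in a fixed inheritance lattice, precisely because it can in principle apply to its own output.

    # A schematic word-formation rule as a function over lexical entries;
    # the 're-' rule and the dictionary format are invented for illustration.
    def prefix_re(entry):
        # V -> re+V, semantically 'do V again'.
        assert entry["cat"] == "verb"
        return {"cat": "verb",
                "form": "re" + entry["form"],
                "sem": ("again", entry["sem"])}

    read_entry = {"cat": "verb", "form": "read", "sem": "read"}
    reread = prefix_re(read_entry)
    rereread = prefix_re(reread)     # the rule applying to its own output
    print(rereread["form"], rereread["sem"])   # rereread ('again', ('again', 'read'))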

The most recent research that I've undertaken involves problems with lexical access to the structured lexicon. It is one thing to model linguistic relationships and to show what representation of the relationships is like, but then one comes right back to the problem of accessing them. So, when I have a complicated word like `derivability', with several affixes involved, and I'd like to derive the properties of those from the lexicon without specifically having to list the derived words, how can I do that? What we need for that, of course, is parsing. How can I parse morphological input in such a way that I am given a pointer to the lexical class? That problem is basically solved for the more primitive lexical models, at least for simple morphological theories, and we are trying to generalize these to the structured lexicon models. I'll make a first report on this at the ACL meeting in June in a paper with Uli Krieger and Hannes Pirker.
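
A very simplified sketch of that access problem: parse a derived word into stem plus affixes, without listing the derived word itself, and return a pointer to each piece's lexical class. The lexicon, the affix table and the crude spelling adjustment below are toy assumptions, not the analysis in the paper mentioned.

    LEXICON = {"derive": "verb"}
    SUFFIXES = [("ability", "verb", "noun"),   # verb stem -> noun
                ("able",    "verb", "adj"),
                ("ity",     "adj",  "noun")]

    def parse(word):
        # Each analysis is a list of (morph, lexical class) pairs.
        if word in LEXICON:
            return [[(word, LEXICON[word])]]
        analyses = []
        for suffix, base_cat, result_cat in SUFFIXES:
            if word.endswith(suffix):
                stem = word[: -len(suffix)]
                for candidate in (stem, stem + "e"):    # crude spelling adjustment
                    for sub in parse(candidate):
                        if sub[-1][1] == base_cat:
                            analyses.append(sub + [(suffix, result_cat)])
        return analyses

    print(parse("derivability"))
    # [[('derive', 'verb'), ('ability', 'noun')]]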

Another area that I am interested in is the computational semantics area. Earlier I mentioned this NLL language that we'd like to put in the public domain sometime in 1993. The most interesting thing about the NLL work is not so much the semantic representation language itself. It is quite respectable - a variant of a generalized quantifier language with some things added that we found interesting in database access; and there is a theory of plural reference and plural predication, and also for example variable-binding term-forming operators. I mention all that just to stress that it is a respectable theoretical citizen.

But what is most interesting about NLL is that we have tried to define it with an eye towards good computer science, and towards that which may follow after it. NLL will not be the last word in semantic representation languages. There are great new theories that we are seeing - lots of them coming out of Holland, actually - and these will be able to deal with much more difficult problems than NLL can. But, because NLL is built on fairly solid programming language technology, it will allow us to get a jump start in implementing some of those languages. We have used standard parser generating tools - a LISP version of yacc is one of the things that is at the basis of it - and we have used some tree-transformation techniques that are also well-known in compiler technology to provide interfaces. Some of these parts of NLL are worthy of special attention because they make the implementation of semantic representation so much easier. I hope to put semantic representation on the same footing that syntactic analysis is on right now. Almost anyone with some basic training in declarative syntactic formalisms can implement his own theory of syntax and test it out on the computer. You can change HPSG up to a certain point and then come up with a new theory about specifiers or adjuncts or something like that, something that would have been unthinkable five years ago. We are nowhere near that stage in semantics now and it would be foolish to predict that we will be there in five years. But the development of tools can certainly make experimentation a lot easier; it enables us to do a lot of practical work at a lower cost.
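
The flavor of a `reader' and `printer' for such a representation language can be suggested with a toy Lisp-style syntax; the surface syntax and the example formula below are invented for illustration, and NLL itself is a much richer language built with real parser-generator tools.

    def read_term(text):
        # The reader: parse a parenthesized term into nested Python lists.
        tokens = text.replace("(", " ( ").replace(")", " ) ").split()
        def parse(pos):
            if tokens[pos] == "(":
                term, pos = [], pos + 1
                while tokens[pos] != ")":
                    sub, pos = parse(pos)
                    term.append(sub)
                return term, pos + 1
            return tokens[pos], pos + 1
        return parse(0)[0]

    def show_term(term):
        # The printer: render the term back into its surface syntax.
        if isinstance(term, list):
            return "(" + " ".join(show_term(t) for t in term) + ")"
        return term

    # A generalized-quantifier style formula for "every student works".
    formula = read_term("(every x (student x) (work x))")
    print(formula)             # ['every', 'x', ['student', 'x'], ['work', 'x']]
    print(show_term(formula))  # (every x (student x) (work x))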

We can learn from the way Computer Scientists do research, namely with an eye to both practical and theoretical aspects of work. That is certainly something we should learn, at least opportunistically, in these hard economic times. It will be the tendency for the years to come.''

(Erik Oltmans & Bert Plat)

