Computational Linguistics
in the Netherlands 1996

the Seventh CLIN Meeting

Friday, November 15th, 1996
IPO, Center for Research on User-System Interaction, Eindhoven

We are pleased to be able to say that the seventh CLIN meeting was a success. We would like to thank everyone who attended the meeting for their contributions, which helped make it a fruitful and successful conference. The meeting was held last year and was organised by IPO in Eindhoven. IPO was then the Institute for Perception Research, but has recently changed its name to IPO, Center for Research on User-System Interaction.

A compilation of a selection of the papers presented at CLIN 1996 will be made. It will be issued at the next CLIN meeting in 1997.

At CLIN meetings, computational linguistics researchers in the Netherlands and Dutch-speaking Belgium gather and present their research. The meeting is also open to international participation. The default language of the conference is continental English. However, presentations with a Dutch title and abstract may be held in Dutch.

In 1996, Stephen Pulman of SRI International in Cambridge and the University of Cambridge Computer Laboratory presented a keynote lecture on Conversational Games, Belief Revision, and Commitment.

The local organisers,

Jan Landsbergen,

Jan Odijk,

Kees van Deemter and

Gert Veldhuijzen van Zanten

The proceedings are electronically available.

Abstracts of the talks

keynote: Conversational Games, Belief Revision, and Commitment.
Stephen Pulman
Cambridge
[dvi, ps]
This talk discusses the roles of these three concepts in some recent approaches to dialogue and tries to sketch a hybrid rule-based+statistical framework on which practical implementations could be based.
A Data-Oriented Approach to Lexical-Functional Grammar
Rens Bod, Ronald Kaplan, Remko Scha & Khalil Sima'an
Amsterdam, Palo Alto/Wassenaar, Utrecht
The data-oriented approach to language processing assumes that previous language experiences (rather than abstract linguistic rules) form the basis for language perception and production. A Data-Oriented Processing model (DOP model) therefore maintains a large corpus of linguistic representations of previously occurring utterances. By combining fragments from this corpus, representations for new sentences can be generated. The frequencies of these fragments are used to estimate the most probable representation of a given utterance.
A DOP model can be defined for almost every theory of linguistic representation or utterance analysis. The original DOP model corresponds to a theory in which the linguistic representation of an utterance is given by a phrase structure tree. New representations are produced by combining subtrees of previous representations. In this talk, we show how a DOP model can be developed for the more articulated representations provided by Lexical Functional Grammar (LFG). On this theory of representation, the analysis of every utterance consists of a constituent-structure (a phrase structure tree), a functional-structure (an attribute-value matrix), and a correspondence function that maps between them. We will show how the definitions for fragments and combination-operations of original DOP can be straightforwardly extended to a DOP model based on LFG representations. However, the original DOP probability calculations do not properly apply to LFG's nonmonotonic constraints on valid fragment combinations. We propose a new probability model that does generalize appropriately to the case of nonmonotonic conditions, and describe how this model applies to LFG representations.
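The probability estimate of the original, tree-based DOP model can be sketched as follows. This is only an illustration of relative-frequency estimation over fragments; it is not the new probability model for LFG representations proposed in the talk, which the abstract does not spell out. The fragment notation and the counts in `fragment_counts` are hypothetical toy data.

```python
from collections import Counter
from math import prod

# Hypothetical toy counts of corpus fragments, keyed by (root label, fragment).
fragment_counts = Counter({
    ("S", "S(NP VP)"): 2,
    ("S", "S(NP(Mary) VP)"): 1,
    ("NP", "NP(Mary)"): 3,
    ("NP", "NP(John)"): 1,
    ("VP", "VP(sleeps)"): 2,
})

def fragment_probability(root, fragment):
    """Relative frequency of a fragment among all fragments with the same root."""
    total = sum(c for (r, _), c in fragment_counts.items() if r == root)
    return fragment_counts[(root, fragment)] / total

def derivation_probability(derivation):
    """A derivation is a sequence of fragments; its probability is their product."""
    return prod(fragment_probability(r, f) for r, f in derivation)

def parse_probability(derivations):
    """A representation may have several derivations; their probabilities are summed."""
    return sum(derivation_probability(d) for d in derivations)

# Two derivations of the same tree for "Mary sleeps".
d1 = [("S", "S(NP VP)"), ("NP", "NP(Mary)"), ("VP", "VP(sleeps)")]
d2 = [("S", "S(NP(Mary) VP)"), ("VP", "VP(sleeps)")]
```

Summing over derivations matters because the same representation can be assembled from different fragment combinations, which is why the most probable derivation and the most probable representation need not coincide.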
Modularity in Machine-Learned Word Phonemisation Systems
Antal van den Bosch
Maastricht
Word phonemisation, the task of converting a word to its phonemic transcription (with word stress), is hard for two reasons. First, it involves a large amount of language-dependent knowledge hard to acquire by handcrafting; however, this task may be alleviated using inductive-learning algorithms to automatically induce the knowledge needed. Second, the task represents a non-linear classification task which is hard to (learn to) represent in a single-process system. Designing a modular system in which the task is solved in more than one step appears a good heuristic. However, modularisation may induce unwanted effects in performance: e.g., (i) many proposed orderings and separations of subtasks ignore relevant dependencies between subtasks; (ii) modular systems are relatively sensitive to cascading `snowball' errors. The paper provides empirical performance data obtained by systematically varying and optimising the number and the ordering of modules in a word phonemisation system. Individual modules are automatically induced on the basis of a large lexical data base of English, by symbolic (IGTree, IB1) or connectionist (Back-propagation) inductive learning algorithms. The results point out that both numbers and orderings of modules considerably affect generalisation performance. The results offer insight into subtask dependencies in morpho-phonology, and as a spin-off, provide indications for building accurate word phonemisation systems.
HPSG without Lexical Rules
Gosse Bouma
Groningen
[dvi, ps]
Recent work in HPSG has (1) emphasized the role of argument structure (ARG-ST) as a level of representation independent of valency, (2) demonstrated that (recursive) constraints on lexical entries may lead to accounts of unbounded dependency constructions and quantifier scope that are superior to previous proposals, and (3) used lexical rules to describe processes that were previously considered to be primarily syntactic in nature (extraction, selection of adjuncts, and cliticization).
In this talk I want to argue that a combination of (1) and (2) makes (3) superfluous. In particular, lexical rules for complement extraction, introduction of adjuncts on COMPS, and cliticization can be replaced by monotonic constraints on lexical entries which define the relationship between subcategorization and argument structure, between argument structure and valency (including SLASH), and between argument structure and phonology.
The advantages of this approach are that the `canonical' mapping between argument structure and valency holds not just for basic lexical entries but for all lexical entries, that the various forms of an entry previously derived by means of lexical rules can be seen as monotonic instantiations of a single basic entry, and that many of the problems associated with the use of lexical rules (order sensitivity, default relations between input and output, spurious derivations, recomputation of values on the output) disappear.
On selecting WH-chains
Crit Cremers & Maarten Hijzelendoorn
Leiden
[dvi, ps]
Wh-chains are known to be restricted by both local and global requirements. In order to parse wh-relations efficiently, we have to account for the interaction of the local and the global licensing mechanisms.
The parser Delilah, which handles Dutch categorially and context-sensitively, is equipped with a deterministic and incremental device for selecting wh-chains. By this device - a finite state network - the number of possible operator-variable relations which the parser has to check, is reduced as much as possible in an incremental and deterministic fashion. The network operates under the assumption of massive lexical ambiguity with respect to the local licensing of variables. It is fed by knowledge of local and global conditions on wh-chains.
For non-coordinated sentences the device may divide the number of possible operator-variable chains by twenty, and for coordinated sentences, by six - allowing in the latter case for across-the-board applications. We will present a detailed account of the grammatical and operational aspects of the network, and some figures as to its effect.
Memory-Based Prepositional Phrase Attachment
Walter Daelemans, Jakub Zavrel & Peter Berck
Tilburg
[dvi, ps]
Syntactic analysis can be seen as a cascade of classification problems of two types: segmentation (constituent boundary detection) and disambiguation (morphosyntactic disambiguation, constituent labeling, and attachment decisions). By rephrasing syntactic analysis as a series of instances of a classification problem, machine learning techniques such as decision tree learning and memory-based learning become applicable. When using annotated example corpora (treebanks) as learning material, these machine learning techniques can generalize the knowledge implicit in the annotations to unseen text. Obvious advantages of this approach include automatic learning (alleviating knowledge acquisition bottlenecks) and robustness due to the statistical nature of the learning algorithms.
In previous work, we have applied memory-based learning techniques to segmentation and disambiguation problems in phonology (syllabification, stress assignment, grapheme disambiguation), morphology (analysis and synthesis), and morphosyntax (morphosyntactic disambiguation, i.e. part of speech tagging). In this paper we show that a benchmark phrase attachment problem (PP-attachment), can be learned using memory-based learning techniques. Advantages of the approach to existing stochastic techniques include (i) smooth automatic integration of knowledge sources, (ii) non-parametricity (no parameter estimation needed). We also discuss the impact on generalization accuracy of different similarity metrics in the memory-based learning algorithm and of different input representations.
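A minimal sketch of memory-based classification for PP-attachment, assuming the common four-feature representation (verb, first noun, preposition, second noun) and a simple overlap metric. The stored instances, the feature order and the optional `weights` parameter are hypothetical; the abstract does not say which metrics or representations the authors compared.

```python
from collections import Counter

# Hypothetical memory of training instances:
# (verb, noun1, preposition, noun2) -> attachment site ("V" or "N").
MEMORY = [
    (("eat", "pizza", "with", "fork"), "V"),
    (("eat", "pizza", "with", "anchovies"), "N"),
    (("see", "man", "with", "telescope"), "V"),
    (("buy", "book", "about", "linguistics"), "N"),
]

def overlap_distance(a, b, weights=None):
    """Sum the weights of the features on which two instances disagree."""
    weights = weights or [1.0] * len(a)
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def classify(instance, k=1, weights=None):
    """Return the majority class among the k stored instances nearest to `instance`."""
    ranked = sorted(
        MEMORY, key=lambda item: overlap_distance(instance, item[0], weights)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

No parameters are estimated: learning is just storage, and all generalisation happens at classification time through the similarity metric. Passing feature `weights` is one way to let a different metric change which stored instance counts as nearest.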
An Interface between Text Structure and Linguistic Description
Thierry Declerck
Saarbrücken
The Interface between Text Structure and Linguistic Description within the ALEP platform.
Input to the ALEP system is automatically converted into SGML-marked text, which is the input to the linguistic processing. For the analysis of these tagged texts, tsls rules (Text Structure to Linguistic Structure) have to be defined. If an item is tagged as a word (tag `W'), an obligatory tsls rule should define which kind of linguistic object (described in the grammar) applies to this item. This allows a substantial modularization of the grammar by specifying which kind of linguistic rules will apply.
Besides the SGML tags, we used the system-defined tag `USR' to deal with fixed phrases and `messy details'. User-defined (multiple-)word recognizers have been integrated into the text-handling component of ALEP. The tagged output of these programs provides the input for the tsls rules. We described generic lexicon entries (e.g. `dates') corresponding to the `USR'-tagged expressions. With this technique, the running time of the parser has been significantly improved and the coverage of the grammar considerably extended.
The last step of our work consisted in extending the set of tags defined within the ALEP system. For example, a tag `CAT' has been added. This allows us to integrate information delivered by a Part of Speech tagger. We extracted the PoS information and `lifted' it to the linguistic description via the tsls rules. This again leads to a very substantial improvement in terms of the efficiency of the parser and the coverage of the grammar.
A more theoretical question also arises: can this strategy provide a practicable way of combining corpus-based and knowledge-based approaches to NLP? In any case, we will have to consider the reorganization of the (unification-based) grammar description with respect to the possibility of extracting morpho-syntactic information from PoS taggers.
Minor Categories
Frank Van Eynde
Leuven
[dvi, ps]
In GPSG and HPSG the distinction between elements with and without phrasal projection is drawn in terms of speech parts, cf. the major V, N, A, P vs. the minor Comp, Conj, etc. Contrary to this practice I claim that the major/minor distinction had better be treated as orthogonal to the speech part classification.
To substantiate this claim I will show that the distinction between full and reduced personal pronouns in Dutch (jij/je, zij/ze, ...) is an instance of the major/minor dichotomy. Next, I will spell out an HPSG style sort hierarchy for the description of minor signs and explore their syntactic peculiarities, i.e. the impossibility to be used as heads, fillers or conjuncts, and the deviance from the LP constraints which hold for their major counterparts. Criteria will be provided for identifying minor signs in other speech parts and in other languages.
Since the minor elements behave differently from the major ones, both in terms of constituency and linear order, the distinction had better be made explicit in the grammar. This argues against the GB policy to assign phrasal projections to all lexical elements (and to many affixes), as well as against a trend in HPSG to treat all lexical signs (incl. the complementizers) as heads.
Conjunction is Commutative
Bart Geurts
Osnabrück
There was a time when this would have been needless to say, but times have changed. Groenendijk & Stokhof define dynamic semantics as follows:
A semantics is dynamic if and only if its notion of conjunction is dynamic, and hence non-commutative.
In this paper I argue that dynamic semantics, thus understood, is a rather bad idea. Dynamic semantics is an admittedly elegant but nonetheless misguided implementation of an essentially pragmatic principle. It is an obvious and even important truth that utterances are processed incrementally. The central tenet of dynamic semantics is that, to some extent at least, this processing strategy is encoded in the lexical entries of certain words, and especially in the lexical meaning of 'and'. Thus formulated, it will be plain that the very notion of a dynamic semantics is quite implausible. But apart from its lack of plausibility, it gives rise to all sorts of strange quandaries. Consider, for example, a young child learning the meaning of 'and'. Are we to suppose that he learns it in two steps? The truth-conditional part first, perhaps, and the dynamic part afterwards - or would it be the other way round? Would it be possible for a child to get the truth-conditional import of 'and' right but founder on its dynamic aspects? Clearly, such questions are absurd: the lexical meaning of 'and' isn't dynamic.
In my talk I will first elaborate on this point and then turn to proposals for giving dynamic interpretations to negation and disjunction as well. I will argue that these, too, are ill-founded empirically as well as conceptually.
Two Perspectives on Reusability of Lexical Resources
Pius ten Hacken
Basel
It is generally accepted nowadays that the scarcity of lexical resources in NLP necessitates a kind of reusability. At least two approaches to reusability can be distinguished, resulting in different domains of what is reused. In one approach the lexicon is a purely declarative knowledge base, containing all information to be used by NLP-systems. Reusable information includes what is encoded in features. System-specific information includes all procedural knowledge. In the other approach, reusable information is everything that is necessary for the mapping between text words and lexemes in the dictionary. This includes both declarative and procedural knowledge on morphology. In this approach system-specific information encompasses syntax and semantics.
A typical example of the first approach is DATR. The second approach is not represented adequately by two-level morphology, which lacks the notion of lexeme. A better representative is Word Manager, a system developed in Basel. I will argue that this approach to reusability has a number of important advantages compared to the one represented by DATR.
Polynomial Machine Translation: Handling Noncompositionality Compositionally
Willem-Olaf Huijsen
Utrecht
Translation idioms and structural divergencies between languages are classical problems for machine translation. This holds in particular for compositional approaches, which require a translation-equivalence between basic expressions and between grammar rules of source-language and target-language grammar. One way to attack these problems, pursued in the Rosetta system, is to make use of grammar rules that can perform syntactically powerful operations, enabling a distinction between surface structure and compositional derivation structure.
In this talk I present a formal basis for an alternative approach in which the individual grammars can be relatively simple (e.g. context-free or DCG), but where the translation relation between the grammars is more complex. Translation-equivalence is now defined as a relation between combinations of rules and basic expressions, so-called polynomials. Special attention is paid to the issue of completeness, i.e. to the conditions under which this translation method guarantees to yield at least one translation for each analysis of all source-language expressions.
Speech output in GoalGetter
Esther Klabbers
Eindhoven
[dvi, ps]
In this talk I will give an overview of the GoalGetter system. This system generates spoken summaries of football matches on the basis of concise teletext information. The system consists of a language generation component and a speech output generation component. The language generation component will be discussed in more detail in the presentation by Mariet Theune.
The focus of this presentation will be on the speech output module. Speech output can be realised by either diphone synthesis or phrase concatenation. With diphone synthesis one can generate an unlimited set of sentences. Phrase concatenation is used in applications where the set of sentences is limited. Entire words and phrases are recorded and can be strung together to construct the spoken texts without any manipulation of the original recordings. Our approach to phrase concatenation is special in that we record variable words, like team names and player names, in several prosodic contexts. Depending on the position where the variable is to be inserted in a carrier sentence, and on information about accenting and phrasing, the right prosodic variant is selected.
An Automated Semantic Analytic Method with Application to Simple Natural Language Database Queries
Gregers Koch
Copenhagen
For application in connection with databases and in particular information systems like library systems, we shall analyze a few prototypical natural language queries. The query analysis recommended here is essentially automated and uses logic programming as a tool for analysis of natural language semantics, and it involves modelling the information content by means of a logical representation. It comprises the extensive application of induction using some homemade inductive meta systems that perform automated program synthesis through, as an intermediate step, some dataflow analysis resulting in the construction of some so-called dataflow structures (cf. Understanding & Logic Prog.2-3). The resulting synthesized programs are logic grammars, more precisely definite clause grammars (DCG). The method seems very promising.
As an illustration, we intend to examine a simple and prototypical query to a library information system

"print the title and author of a book on circus horses".

and a few prototypical database queries, for instance some queries to the HVFC (cf. J.E.Ullman):

"print the name and address of any customer with negative balance".

"print the name of any supplier who supplies each item ordered by Brooks".

Clitic Climbing without Argument Composition
Dimitra Kolliakou
Groningen/Newcastle upon Tyne
[dvi, ps]

Complement clitics in Modern Greek NPs exhibit an idiosyncratic type of climbing: they can attach to the noun head (1), prenominal adjectives (2), and a small set of left periphery elements (3). Though such clitics were taken to be affixes in previous approaches (e.g. Stavrou and Horrocks 1990), they do not satisfy several of the diagnostics that have been proposed to characterize Pronominal Affixes and distinguish them from Postlexical Clitics (see e.g. Miller 1992). Moreover, an account of their positioning in terms of Argument Composition (Hinrichs and Nakazawa 1990, 1994; Miller and Sag 1996) would encounter serious difficulties, including the contrast in (4), which indicates that an adjective with a complement of its own cannot "attract" the noun head's clitic complement. I provide an account of clitic climbing in MG NPs in terms of Domain Union (Reape 1994, Kathol 1995) that employs a notion of Attachment in the sense of Dowty (to appear) and Gunji (to appear). This approach can be straightforwardly extended to account for definite articles and NP-internal demonstratives which, like clitics, cannot stand on their own but require an appropriate host to attach to.

1. to kenurio vivlio mu-CL (lit.: the new book my)

2. to kenurio tu-CL vivlio (lit.: the new his book)

3a. ola tus-CL ta vivlia (lit.: all their the books)

3b. afto su-CL to vivlio (lit.: this your the book)

4a. i [anagnorismeni [apo olus]] iperohi tu-CL (lit.: the acknowledged by all superiority of-his)

4b. * i [anagnorismeni tu [apo olus]] iperohi

Dutch Compounds and Information Retrieval
Wessel Kraaij & Renee Pohlmann
Delft, Utrecht
[dvi, ps]
We will describe research on the treatment of Dutch compounds in the UPLIFT information retrieval project. Results of earlier experiments in the UPLIFT project indicated that splitting up compounds in the query and generating new compounds by simply combining query terms both improved retrieval performance. We subsequently experimented with adding constraints to the compound splitting and generation algorithms in order to restrict both processes and minimize over-generation. We experimented with using information about head-modifier relationships and corpus frequency information to formulate constraints. So far, we have not been able to improve on our initial strategy but the results of initial experiments have provided us with some important clues for further experimentation.
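The two unconstrained operations mentioned above, splitting compounds and generating compounds from query terms, can be sketched as follows. This is not the UPLIFT algorithm, which the abstract does not describe; the lexicon, the minimum part length `min_len` and the handling of a linking "s" are assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical mini-lexicon of Dutch words.
LEXICON = {"fiets", "band", "station", "plein", "stad", "bestuur"}

def split_compound(word, lexicon=LEXICON, min_len=3):
    """Return every split of `word` into lexicon words (optional linking 's')."""
    results = []
    for i in range(min_len, len(word) - min_len + 1):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        # The remainder may start with a linking "s" that belongs to neither part.
        candidates = [rest]
        if rest.startswith("s") and len(rest) > min_len:
            candidates.append(rest[1:])
        for tail in candidates:
            if tail in lexicon:
                results.append([head, tail])
            # The remainder may itself be a compound.
            for sub in split_compound(tail, lexicon, min_len):
                results.append([head] + sub)
    return results

def generate_compounds(terms):
    """Naively concatenate every ordered pair of query terms."""
    return [a + b for a, b in permutations(terms, 2)]
```

Naive generation from "fiets" and "band" yields both "fietsband" and the implausible "bandfiets", which illustrates the kind of over-generation that constraints based on head-modifier relationships or corpus frequencies are meant to restrict.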
Presuppositions as Anaphors, and vice versa;
Towards a Full Understanding of Partial Matches
Emiel Krahmer & Kees van Deemter
Eindhoven
[dvi, ps]
Rob van der Sandt's theory of `presuppositions as anaphors' is widely considered to be the empirically most adequate theory of presupposition projection on the market. In this talk, two weaknesses of Van der Sandt's theory are pointed out and remedied. The first weakness is the fact that a central notion of the theory, namely that of a `partial match', is not defined in a sufficiently precise way. The second weakness, in our opinion, is the fact that the theory takes only one kind of anaphora into account, in which anaphor and antecedent must always corefer. Both weaknesses are remedied in an updated version of the `presuppositions as anaphors' theory that we claim to be both more precise and more general than its predecessor.
Morphonological Aspect of Automatic Processing System of Texts (on materials of the Turkic languages)
Masud Mahmudov & Vugar Sultanov
Baku
Research on creating an automatic processing system for texts in Turkic languages shows that it is necessary to determine and take into account the morphonological regularities. The morphonological changes observed in the formal processing of Turkic texts by computer can be grouped as follows:
1. morphonological changes occurring in the word root;
2. morphonological changes at the boundary between the word root and word-building affixes;
3. morphonological changes at the boundary between the word root and word-changing affixes;
4. morphonological changes at the boundary between word-building affixes and word-changing affixes;
5. morphonological changes at the boundaries between word-changing affixes.
Modelling coordination by operations on strings, CF trees and TAG trees
Carlos Martin-Vide, Rudolf Ortega-Robert & Gheorghe Paun
Tarragona, Bucuresti
[dvi, ps]
Several string operations are introduced as models of the coordination phenomenon in natural languages. Their relationships with other string operations are investigated, yielding the closure properties of families in the Chomsky hierarchy. In particular, CF is not closed under these operations. However, if coordination is defined only between strings with a common syntactic structure (both strings have derivations described by identical trees, modulo the coordinated subwords), then coordination preserves context-freeness. The extension of this tree-based coordination operation to TAGs is also discussed.
Questions, Answers and Context in Constructive Type Theory
Paul Piwek
Eindhoven
A definition of the notion of answerhood is formalised using a proof system, i.e., Constructive Type Theory. The definition, which was proposed in the mid-eighties by Jeroen Groenendijk and Martin Stokhof, makes use of two concepts which, in the past fifteen years, have become central to the trade of formal semantics: context change and context-dependence. The formalisation using CTT is proposed as an alternative for Groenendijk and Stokhof's original formalisation in possible-world semantics. It is demonstrated that CTT, and in particular the fact that CTT is a proof system, enables a more fine-grained analysis which can be turned into a computational model. Furthermore, we contend that our formalisation of the definition of answerhood is a natural generalisation of definitions of answerhood which are phrased in terms of unification of the question and the answer.
What Lexical Approach To Unbounded Dependencies Is Good For: HPSG Analysis of Verbal Negation in Polish
Adam Przepiorkowski & Anna Kupsc
Tuebingen, Warszawa
[dvi, ps]
In this paper we develop an HPSG analysis of certain (so far unnoticed) syntactic phenomena connected to verbal negation in Polish. First of all, we show that -- contrary to the received wisdom -- verbal negation is a morphological (rather than syntactic) process and we model this observation via lexical rules. Then we move to the so-called long distance negative concord, i.e., the requirement that the verb has to be negated if any of its arguments is or contains a negative pronoun. We show that this is essentially a UDC, as this `negation requirement' can cross an arbitrary number of NP and PP boundaries. (VPs seem to be islands.) Since this `negation requirement' is discharged lexically (by negated verbs) and because of some intriguing lexical exceptions, we adapt the lexical approach to UDCs of Sag (1995) and Sag (1996). Finally, we investigate the interesting behaviour of negative concord and of the genitive of negation in the context of verb clusters, and show that this behaviour can be accounted for if arguments of the lower verbs are assumed to be raised to the nearest negated verb (if any), a la Hinrichs and Nakazawa (1989), and if case assignment and `negation percolation' are made sensitive to whether the argument has been realized from the given argument structure, or raised to higher verbs. In the latter we follow the non-configurational case assignment approach of Przepiorkowski (1996).
Information Update in Dutch Information Dialogues
Mieke Rats
Delft
[dvi, ps]
In my talk, I will give a corpus-based analysis of information update in information dialogues. The corpus used consists of 111 naturally occurring telephone conversations recorded at the information service of Schiphol Airport. The information update will be described theoretically by extending the dynamic interpretation theory (DIT) of Bunt (1995) with the information packaging notions "topic", "tail", and "focus" (Rats 1996, Vallduvi 1990). The file change semantics of Heim will be used to show how the information update can be formalized. Examples and tables from the corpus will show how the information update is realized linguistically.
References:

[1] Bunt, H. (1995) Dynamic Interpretation and Dialogue Theory. In: Taylor, M., Neel, F. & Bouwhuis, D. (eds), The Structure of Multimodal Dialogue 2, John Benjamins Publishing Company.

[2] Heim, I. (1983) File Change Semantics and the Familiarity Theory of Definiteness. In: Bauerle, R., Schwarze, C. & von Stechow, A. (eds), Meaning, Use, and Interpretation of Language, De Gruyter.

[3] Rats, M. (1996) Topic Management in Information Dialogues. PhD thesis, Tilburg University.

[4] Vallduvi, E. (1990) The Informational Component. PhD thesis, University of Pennsylvania.
Natural Language Engineering: Science meets Business
Wilco G. ter Stal
Amsterdam
In this talk I want to present (1) a summary and the main conclusions of my Ph.D. thesis on the automated syntactic and semantic analysis of nominal compounds in a technical domain, and (2) experiences concerning the practical applicability and the potential business opportunities of speech and language technologies from the perspective of a large IT supplier, Getronics Software.
The ANNO-corpus: written Dutch intended to be spoken
Bruno Tersago & Ineke Schuurman
Leuven
[dvi, ps]
The ANNO-project (An annotated public database for written Dutch; Flemish short-term programme for speech and language technology) intends to initiate the creation of a large database for the variant of Dutch used in Flanders, as there is no corpus of reasonable size available for Flemish Dutch.
BRTN-Dutch being considered to reflect the national standard, the corpus consists of news bulletins and issues of the current affairs programme Actueel (both BRTN-radio). In addition to written texts intended to be spoken, these contain transcriptions of interviews.
In this talk we want to report on the choice of the material and the consequences this had, the types of annotation we used for the whole corpus or just part of it, the way annotating was done ((semi-)automatically or by hand), and why it was done that way, as well as on our future plans.
GoalGetter: the accent IN language generation
Mariët Theune
Eindhoven
[dvi, ps]
In this talk I will discuss some aspects of the language generation component of the GoalGetter system. This system generates spoken summaries of football matches, based on teletext information.
The focus of the talk will be the accentuation of referring expressions in GoalGetter. Referring expressions play an important role in the football reports we generate, since we constantly have to refer to players and teams. First, I will briefly explain how the system generates different referring expressions depending on the context. Then I will discuss the accentuation rules we currently use: expressions referring to a 'new' object receive an accent, whereas expressions referring to a 'given' object do not. This approach is in line with many accentuation theories. However, it does not always give the correct result. I will argue that we need to add some notion of contrastive accent to our accentuation rules. A problem here is that the few existing contrast theories do not seem to be applicable to the football domain.
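The new/given rule described above can be sketched in a few lines, assuming that a referent counts as 'given' once it has been mentioned earlier in the same report. The function name and the representation of mentions as plain strings are hypothetical, and the sketch deliberately has no notion of contrastive accent.

```python
def accent_referring_expressions(referents):
    """Pair each mention with True (accented, 'new') or False (unaccented, 'given')."""
    seen = set()
    result = []
    for referent in referents:
        # A referent is 'new' on its first mention and 'given' afterwards.
        result.append((referent, referent not in seen))
        seen.add(referent)
    return result
```

For the mention sequence Ajax, Kluivert, Ajax, the second mention of Ajax is left unaccented. A rule of this kind cannot accent a given referent that is being contrasted with another one, which is the gap a notion of contrastive accent would have to fill.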
A Relational Approach to Trees and to Command Relations
Claude Del Vigna
Paris
Linguistics and Computer Science make extensive use of tree structures. We present here a formalisation of trees (in fact, of forests) within the algebraic theory of binary relations (Del Vigna & Courrége, 1994) and we show how the relational framework also expresses the theory of command relations used in Generative Grammar (Del Vigna, 1996). In fact, this may be applied to various configurations in trees. The expressiveness, the simplicity and the elegance of relational algebra are widely recognized, particularly in the relational database model. Moreover, as an algebra, it allows blind calculus and proofs based on rewriting. These qualities still hold for syntagmatic structures and, in addition, the relational approach provides a unifying frame for several definitions of trees which occur in the literature.
First, we introduce forests on a finite set N. Then, we define a gridded forest as a pair (V,H) of forests on N. The definition is symmetric, i.e. the pair (H,V) is also a gridded forest on N. Four derived forms of gridded forests are presented: primitive, which corresponds to oriented and ordered trees (Aho & Ullman, 1972), functional, which corresponds to the data structure for binary trees used in programming, DP, which corresponds to the pair (dominance, precedence) in (Partee, Ter Meulen & Wall, 1990) and, finally, total. Algebraic formulae permit transition from any form to another and constitute a basic and useful formal toolbox. Finally, we present the axioms, all expressed in relational algebra, which characterize, for a given forest, the set of its command relations.
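To give a flavour of the relational style, the sketch below represents a tree as a set of (mother, daughter) pairs and derives dominance and a command relation by purely relational operations. The abstract does not state the axioms of the talk, so the definition used here is an assumption: a commands b if every node with a chosen property that properly dominates a also dominates b, with dominance taken reflexively in the second clause. The node names and the example tree are hypothetical.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def transitive_closure(r):
    """Smallest transitive relation containing r."""
    closure = set(r)
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended

# Hypothetical tree: S -> NP VP, VP -> V NP2, given as (mother, daughter) pairs.
NODES = {"S", "NP", "VP", "V", "NP2"}
MOTHER_OF = {("S", "NP"), ("S", "VP"), ("VP", "V"), ("VP", "NP2")}
DOMINATES = transitive_closure(MOTHER_OF)  # proper dominance

def command(property_nodes):
    """Pairs (a, b) such that every property node properly dominating a
    dominates b, or is b itself (assumed definition, see lead-in)."""
    return {
        (a, b)
        for a in NODES
        for b in NODES
        if all(
            (p, b) in DOMINATES or p == b
            for p in property_nodes
            if (p, a) in DOMINATES
        )
    }
```

Choosing the branching nodes S and VP as the property set gives a relation in which the subject NP commands V, while V does not command the subject NP. Other choices of the property set give other command relations from the same definition.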
Extraction of Side Effects from Medical Texts Using Computational Linguistic Techniques
Marc Weeber & Rein Vos
Groningen
This PhD research project attempts to automatically extract the side-effect profile of a drug from the medical literature. First, a profile that is as complete as possible will be compiled. In addition, developments over time will be tracked. A start has been made by treating medical texts as a corpus of individual words. Subcorpora can be isolated from this corpus. The results of some comparisons between subcorpora will be presented.
Reducing text to individual words, however, loses a great deal of information. Other methods for discovering fixed structures in the text will be applied. Candidates include collocations, concept extraction and part-of-speech tagging.
The extraction of side effects is the basis for two lines of research. The first line builds on the results: side effects can be used to find new applications for existing drugs. The second line builds on the techniques, which may be usable for drawing up a risk profile of a drug. Computational linguistic analysis of the medical literature might signal certain trends earlier than is the case in current practice.

For more information:

CLIN 96
Jan Landsbergen
IPO, Center for Research on User-System Interaction
P.O. Box 513
5600 MB Eindhoven
fax: +31 (0)40 2773876
email: clin96@ipo.tue.nl

CLIN 96 was sponsored by:

Priority Programme
Language and Speech Technology (TST)

and

IPO

Center for Research
on User-System Interaction


Last Updated: May 11, 1998
