Linguistic Distances

When: July 23, 2006
Where: Sydney, Australia
What: Workshop Program

Submissions for this workshop are closed.


In many theoretical and applied areas of computational linguistics (CL), researchers operate with a notion of linguistic distance or, conversely, linguistic similarity, which is the focus of the present workshop. While many CL areas make frequent use of such notions, they have received little focused attention, an honorable exception being Lebart & Rajman (2000).

In information retrieval (IR), also the focus of Lebart & Rajman's work, similarity is at the heart of most techniques seeking an optimal match between query and document. Techniques in vector space models operationalize this via (weighted) cosine measures, but older tf/idf models were also arguably aiming at a notion of similarity.
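As a minimal illustration of the cosine measure mentioned above, the following sketch compares a query and a document represented as term-weight vectors. The function name, the dictionary representation, and the toy weights are assumptions made for illustration only; they are not taken from any particular IR system.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here: an empty vector matches nothing
    return dot / (norm_u * norm_v)

# Hypothetical term weights (e.g. raw or tf/idf-weighted counts).
query = {"linguistic": 1.0, "distance": 1.0}
document = {"linguistic": 2.0, "similarity": 1.0, "distance": 2.0}
score = cosine_similarity(query, document)
```

Because the measure normalizes by vector length, a long document is not favored merely for containing more words; what counts is the direction of the weight vector.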

Word sense disambiguation models often work with a notion of similarity among the contexts within which word (senses) appear, and MT identifies candidate lexical translation equivalents via a comparable measure of similarity. Many learning algorithms currently popular in CL, including not only supervised techniques such as memory-based learning (k-nn) and support-vector machines, but also unsupervised techniques such as Kohonen maps and clustering, rely essentially on measures of similarity for their processing.

Notions of similarity are often invoked in linguistic areas such as dialectology, historical linguistics, stylometry, second-language learning (as a measure of learners' proficiency), psycholinguistics (accounting for lexical "neighborhood" effects, where neighborhoods are defined by similarity) and even in theoretical linguistics (novel accounts of the phonological constraints on Semitic roots).

The workshop aims to bring together researchers employing various measures of linguistic distance or similarity, including novel proposals, especially to demonstrate the importance of the abstract properties of such measures (validity, stability over corpus size, computability, fidelity to the mathematical distance axioms), but also to exchange information on how to analyze distance information further. We assume that there is a "hidden variable" in the similarity relation, so that we should always speak of similarity with respect to some property, and we suspect that there is such a plethora of measures in part because researchers are often inexplicit on this point. It will be useful to tease the different notions apart. Finally, it would be most intriguing if we could make a start on understanding how some of the different notions might be construed as alternative realizations of a single abstract notion.
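To make "fidelity to the mathematical distance axioms" concrete, the sketch below checks non-negativity, identity, symmetry, and the triangle inequality empirically on a small sample. Levenshtein (edit) distance is used purely as a familiar example; the helper names and the toy word list are assumptions for illustration, and passing the check on a sample does not prove that a measure is a metric in general.

```python
from itertools import product

def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def satisfies_distance_axioms(d, items):
    """Check the distance axioms for d on every pair and triple in items."""
    for x, y in product(items, repeat=2):
        if d(x, y) < 0:                   # non-negativity
            return False
        if (d(x, y) == 0) != (x == y):    # zero distance only to itself
            return False
        if d(x, y) != d(y, x):            # symmetry
            return False
    for x, y, z in product(items, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):   # triangle inequality
            return False
    return True

# Toy sample of cognate-like strings (hypothetical data).
words = ["water", "wasser", "vatten", "wetter"]
edit_distance_ok = satisfies_distance_axioms(levenshtein, words)

# Squared difference fails the triangle inequality on 0, 1, 2.
squared_ok = satisfies_distance_axioms(lambda x, y: (x - y) ** 2, [0, 1, 2])
```

A check of this kind can only falsify: a single violating triple shows that a measure is not a distance in the mathematical sense, whereas agreement on a sample is merely suggestive.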

Lebart, L. & M. Rajman (2000) Computing Similarity. In R. Dale et al. (eds.) Handbook of NLP. Dekker: Basel.

Call for papers

Papers are invited on substantial, original, and unpublished research investigating linguistic distance measures, and their application, analysis and interpretation. The submission deadline is below.


Submissions should follow the two-column format of ACL proceedings and should not exceed eight (8) pages, including references. We strongly recommend the use of the LaTeX style files or Microsoft Word document template that will be made available on the conference Web site (see: here). We reserve the right to reject submissions that do not conform to these styles, including font size restrictions.

As reviewing will be blind, the paper should not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", should be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review.

Submission will be electronic. The only accepted format for submitted papers is Adobe PDF. The papers must be submitted no later than April 10, 2006. Papers submitted after that time will not be reviewed. For details of the submission procedure, please consult the submission webpage reachable via the conference website.

Questions regarding the submission procedure should be directed to the Program Co-Chairs, John Nerbonne and/or Erhard Hinrichs.

Papers that are being submitted in parallel to other conferences or workshops must indicate this on the title page, as must papers that contain significant overlap with previously published work. Please use the abstract or the title footnote for noting these complications.

For LaTeX and Word Templates, see here

Important Dates

April 10, 2006 Submission Deadline
May 10, 2006 Notification of Acceptance
June 1, 2006 Final Papers to Organizers

Program Committee

John Nerbonne (Groningen) and Erhard Hinrichs (Tübingen) (chairs), Harald Baayen (Nijmegen), Walter Daelemans (Antwerp), Ido Dagan (Technion, Haifa), Wilbert Heeringa (Groningen), Ed Hovy (ISI, Los Angeles), Grzegorz Kondrak (Alberta), Sandra Kübler (Tübingen), Rada Mihalcea (North Texas), Ted Pedersen (Minnesota), Dan Roth (Illinois), Hinrich Schütze (Stuttgart), Junichi Tsujii (Tokyo), Menno van Zaanen (Macquarie, Sydney)