Bean Counting

Lots of us deal with the "publish or perish" demand. Those working at universities and research institutes are expected to publish regularly, and larger companies are often pleased to see the output of their research laboratories measured not only in sales of new product lines but also in the number and quality of publications.

The requirement is reasonable. First, research that isn't communicated in some accessible channel hasn't added usefully to what we know: it has to be published. Second, even if the primary task of universities is education, as it was until fifty years or so ago, educators still need to be intellectually active, and the requirement of publication provides some objective reassurance to educational administrators that their staffs aren't intellectual zombies. Third, and most fundamentally, publication is part of the scientific dialectic: by publishing your findings, you expose them to the criticism that will identify flaws, gaps, inconsistencies and gratuitous assumptions -- the mistakes that the next paper, yours or mine, should remedy. Publish!

One hears the complaint that too much is published -- that it's not all worth reading -- and in some sense that's true. No one wants to argue for lower standards. But this just means that we ought to be more selective in recognizing quality work, not give up publishing.

The requirement to publish is operationalized more ambitiously, not just as a demand to publish, but as a system of evaluation that measures research quality via publication. For someone engaged in the language technology of the last ten years, it seems only fair that one should try to fix the evaluation measure somehow. First, not just any publication will do. Journals count more than books, book chapters, and conference proceedings. I have a grant from one European organization that requests reports of publications, but only those in international, refereed journals.
Book chapters and papers in COLING or ACL proceedings simply aren't worth mentioning. Second, not all journals are equal. Journals with high impact ratings as measured by the 'Science Citation Index' (SCI, see www.isinet.com) count more heavily. The SCI rates a journal by how often other journals cite its articles.

Both of these principles have some initial plausibility. We all know that publication channels differ, and the review process at journals is arguably more reliable than the process for book chapters and conference proceedings. And journals certainly differ in quality. But lots of qualifications are needed. In computational linguistics, competition for slots at the leading meetings results in acceptance rates (typically around 25-30%) lower than those at many leading journals ('Computational Linguistics' itself is only slightly more selective, at 20-25%). There's a tradition of strict selection that will be difficult to defend if research funders systematically discount, or even disregard, conference contributions. This tradition promotes the quality of conference presentations, and we'll lose something if it's weakened.

The use of citation indices likewise has an initial plausibility, but it too is subject to abuse. It's our task as scientists to find out new things and to change our colleagues' minds about how best to understand language and computation. The number of citations certainly reflects that better than other measures, say, the number of pages produced (the measure used in my faculty until recently). But there's many a slip twixt cup and lip in tracking citations. The SCI doesn't include citations in journals such as the 'Journal of Logic, Language and Information, Journal of Natural Language Engineering, Journal of Functional Programming, Computer Languages, Computer-Assisted Language Learning', or 'Traitement Automatique des Langues' (check the 'Master Journal List' accessible from www.isinet.com/isi/search/).
It now tracks citations in about 8,000 of the world's approximately 15,000 scholarly journals (the latter figure is due to Groningen's university librarian, Alex Klugkist). I noted the examples above when trying to understand how a move to SCI measurements would affect the assessment of our computational linguistics group in Groningen, but I've left more specialized and Dutch-language journals off the list.

A further source of distortion is that articles in specialized journals are generally rated as less important, since these journals attract fewer citations overall. An authorial strategy of seeking out the most general venue (ultimately 'Science' or 'Nature') is thus rewarded, even when the most expert reviewing and selection would be found elsewhere. The same frequency weighting will inevitably distort comparisons between larger disciplines and smaller ones (say, chemistry vs. computational linguistics), but this doesn't prevent the ratings from being abused to compare across disciplines.

Web publication will certainly rationalize the distribution of scientific literature, but it won't obviate the need for systems for selecting the better papers (refereeing), nor has anyone proposed how it might change the socioeconomic, political question of choosing which researchers and which sorts of research deserve funding.