Hdrug is an environment to develop grammars, parsers and generators for natural languages. The system provides a number of visualisation tools, including visualisation of feature structures, syntax trees, type hierarchies, lexical hierarchies, feature structure trees, definite clause definitions, grammar rules, lexical entries, chart data structures and graphs of statistical information, e.g. concerning the CPU time requirements of different parsers. Visualisation can be requested for various output formats, including ASCII text, the Tk Canvas widget, LaTeX, DOT, and CLiG.
Extendibility and flexibility have been major concerns in the design of Hdrug. The Hdrug system provides a small core system with a large library of auxiliary relations which can be included upon demand. Hdrug extends a given NLP system with a command interpreter, a graphical user interface and a number of visualisation tools. Applications using Hdrug typically add new features on top of the functionality provided by Hdrug. The system is easily extendible because of the use of the Tcl/Tk scripting language, and the availability of a large set of libraries. Flexibility is obtained by a large number of global flags which can be altered easily to change aspects of the system. Furthermore, a number of hook predicates can be defined to adapt the system to the needs of a particular application.
The flexibility is illustrated by the fact that Hdrug has been used both for the development of grammars and parsers for practical systems and as a tool to experiment with new theoretical notions and alternative processing strategies. Furthermore, Hdrug has been used extensively both for batch processing of large text corpora and for demonstrating particular applications to audiences of non-experts.
Hdrug is implemented in SICStus Prolog version 3, exploiting the built-in Tcl/Tk library. The Hdrug sources are available free of charge under the terms of the GNU General Public License.
Hdrug provides three ways of interacting with the underlying NLP system:
Using an extendible command interpreter.
Using Prolog queries.
Using an extendible graphical user interface (based on Tcl/Tk).
The first two approaches are mutually exclusive: if the command interpreter is listening, then you cannot give ordinary Prolog commands, and vice versa. In contrast, the graphical user interface (with mouse-driven menus and buttons) can always be used. This feature is very important and sets Hdrug apart from competing systems: it means that the full power of the Prolog prompt (including tracing) and the graphical user interface can be used at the same time. The command interpreter (with a history and alias mechanism) can be useful for experienced users, as it might be somewhat faster than using the mouse (but note that many menu options can be selected using accelerators). Furthermore, it is useful in situations in which the graphical user interface is not available (e.g. in the absence of an X workstation). The availability of a command-line interface in combination with mouse-driven menus and buttons illustrates the flexible nature of the interface.
An important and interesting property of both the command interpreter and the graphical user interface is extendibility. It is very easy to add further commands (and associated actions) to the command interpreter (using straightforward DCG syntax). The graphical user interface can be extended by writing Tcl/Tk scripts, possibly in combination with some Prolog code. A number of examples will be given in the remainder of this paper.
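To make the idea of an extendible command interpreter with history and aliases concrete, here is a minimal sketch in Python. It is not Hdrug's implementation (which is written in Prolog and uses DCG syntax for command definitions); the class `CommandInterpreter` and the `parse` command are hypothetical names chosen only for illustration.

```python
class CommandInterpreter:
    """Toy command interpreter with user-added commands, aliases and history.

    Illustration only: names and behaviour are assumptions, not Hdrug's API.
    """

    def __init__(self):
        self.commands = {}  # command name -> handler taking a list of words
        self.aliases = {}   # alias -> list of words it expands to
        self.history = []   # raw input lines, oldest first

    def register(self, name, handler):
        """Add a new command together with its associated action."""
        self.commands[name] = handler

    def alias(self, name, expansion):
        """Define `name` as shorthand for the words in `expansion`."""
        self.aliases[name] = expansion.split()

    def run(self, line):
        """Record the line, expand a leading alias once, and dispatch."""
        self.history.append(line)
        words = line.split()
        if not words:
            return None
        if words[0] in self.aliases:
            words = self.aliases[words[0]] + words[1:]
        handler = self.commands.get(words[0])
        if handler is None:
            return "unknown command: " + words[0]
        return handler(words[1:])


interp = CommandInterpreter()
# A hypothetical command that merely echoes what it would parse.
interp.register("parse", lambda args: "parsing: " + " ".join(args))
interp.alias("p", "parse")
print(interp.run("p the man sleeps"))
```

The point of the sketch is that adding a command amounts to registering a name with an action, so an application can grow the command set without touching the interpreter loop itself.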
Finally note that it is also possible to run Hdrug without the graphical user interface present (simply give the notk option at startup). This is sometimes useful if no X workstation is available (e.g. if you connect to the system over a slow serial line), but also for batch processing. At any point you can start or stop the graphical user interface by issuing a simple command.
Hdrug supports the visualisation of a large collection of data structures in a number of different formats.
These formats include ASCII text, the Tk Canvas widget, LaTeX, DOT and CLiG. At the moment, not all data structures are supported for all formats; for example, plots of two-dimensional data are only available for Tk.
The data-structures for which visualisation is provided are:
Trees. Various tree definitions can exist in parallel. For example, the system supports the printing of syntax trees, derivation trees, type hierarchy trees, lexical hierarchies etc. Actions can be defined which are executed upon clicking on a node of a tree. New tree definitions can be added to the system by simple declarations.
Feature structures. Clicking on an attribute of a feature structure implodes or explodes the value of that attribute. Such feature structures can be the feature structures associated with grammar rules, lexical entries, macro definitions and parse results.
Trees with feature structure nodes. Again, new tree definitions can be declared.
Graphs (plots of two-variable data), e.g. to display the (average) CPU time or memory requirements of different parsers.
Definite clauses with feature structure arguments. This can be used e.g. to visualise macro definitions, lexical entries, and grammar rules (possibly with associated constraints).
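The implode/explode behaviour described above for feature structures can be sketched as follows. This is an illustrative Python model under stated assumptions, not Hdrug code: a feature structure is assumed to be a nested dictionary, and `render` and `toggle` are hypothetical helper names. A "click" on an attribute is modelled by toggling its path in a set of collapsed paths.

```python
def render(fs, collapsed=frozenset(), path=(), indent=0):
    """Return ASCII lines for a nested-dict feature structure.

    `collapsed` holds attribute paths (tuples of attribute names) whose
    values are imploded, i.e. shown as "..." instead of being expanded.
    """
    lines = []
    pad = " " * indent
    for attr, value in fs.items():
        here = path + (attr,)
        if here in collapsed:
            lines.append(f"{pad}{attr}: ...")
        elif isinstance(value, dict):
            lines.append(f"{pad}{attr}:")
            lines.extend(render(value, collapsed, here, indent + 2))
        else:
            lines.append(f"{pad}{attr}: {value}")
    return lines


def toggle(collapsed, path):
    """Simulate a click: implode an exploded attribute, or explode an imploded one."""
    return collapsed ^ {path}


# A small example structure (the attribute names are illustrative).
fs = {"cat": "np", "agr": {"num": "sg", "per": 3}}
state = frozenset()
print("\n".join(render(fs, state)))
state = toggle(state, ("agr",))   # "click" on agr: implode its value
print("\n".join(render(fs, state)))
```

Keeping the collapsed paths separate from the structure itself means the same feature structure can be redrawn in any state without being modified.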
Hdrug provides an interface for the definition of parsers and generators. Hdrug manages the results of a parse or generation request. You can inspect these results later. Multiple parsers and generators can co-exist. You can compare some of these parsers with respect to speed and memory usage on a single example sentence, or on sets of pre-defined example sentences. Furthermore, actions can be defined which are executed right before parsing (generation) starts, or right after the construction of each parse result (generation result), or right after parsing is completed.
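The comparison of several parsers on a set of example sentences, with actions executed around parsing, can be pictured with the following Python sketch. It is not Hdrug's interface: `compare_parsers`, the hook parameters `before`, `after_each` and `after`, and the two toy "parsers" are all assumptions made for illustration. For simplicity the sketch collects all results of a sentence before calling `after_each`, whereas the text describes an action run right after each result is constructed.

```python
import time


def compare_parsers(parsers, sentences, before=None, after_each=None, after=None):
    """Run every parser on every sentence.

    Returns, per parser, the total CPU seconds spent parsing and the
    total number of results. Hooks are optional callables.
    """
    report = {}
    for name, parse in parsers.items():
        cpu, count = 0.0, 0
        for sentence in sentences:
            if before:
                before(name, sentence)            # right before parsing starts
            start = time.process_time()
            results = list(parse(sentence))
            cpu += time.process_time() - start    # hooks are not timed
            for result in results:
                if after_each:
                    after_each(name, sentence, result)
            if after:
                after(name, sentence)             # right after parsing is completed
            count += len(results)
        report[name] = {"cpu_seconds": cpu, "results": count}
    return report


# Two toy stand-ins for parsers: each maps a sentence to a list of results.
def split_parser(sentence):
    return [sentence.split()]


def char_parser(sentence):
    return [list(word) for word in sentence.split()]


log = []
report = compare_parsers(
    {"split": split_parser, "char": char_parser},
    ["the man sleeps", "she left"],
    before=lambda name, s: log.append((name, s)),
)
for name, stats in report.items():
    print(name, stats["results"], "results")
```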
Most of the visualisation tools are available through libraries as well. In addition, the Hdrug library contains mechanisms to translate Prolog terms into feature structures and vice versa (on the basis of a number of declarations). Furthermore, a library is provided for the creation of 'Mellish' Prolog terms on the basis of boolean expressions over finite domains. The reverse translation is provided too. Such terms can be used as values of feature structures to implement a limited form of disjunction and negation by unification.
A number of smaller utilities are provided in the library as well, such as the management of global variables and an extendible on-line help system.
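How unification alone can express disjunction and negation over a finite domain may be easier to see in a small model. The Python sketch below follows the encoding as it is commonly described: a domain of n values becomes a term with n + 1 argument positions, the first bound to 0 and the last to 1, and excluding a value links the two positions around it. It simulates the linking with a union-find structure. The class `MellishTerm` and its methods are hypothetical, and the sketch makes no claim about the actual predicates or term layout of the Hdrug library.

```python
class MellishTerm:
    """Set of still-possible values from a finite domain, Mellish-style.

    Positions 0..n stand for the n + 1 arguments of the term. Position 0
    plays the role of the constant 0 and position n of the constant 1.
    Excluding value i links positions i and i + 1. The term is
    inconsistent exactly when positions 0 and n end up linked, i.e.
    when every value has been excluded.
    """

    def __init__(self, domain, allowed):
        self.domain = list(domain)
        self.parent = list(range(len(self.domain) + 1))
        for i, value in enumerate(self.domain):
            if value not in allowed:
                self._union(i, i + 1)

    def _find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def _union(self, a, b):
        self.parent[self._find(a)] = self._find(b)

    def consistent(self):
        return self._find(0) != self._find(len(self.domain))

    def allowed(self):
        """Reverse translation: the values that are still possible."""
        return {v for i, v in enumerate(self.domain)
                if self._find(i) != self._find(i + 1)}

    def unify(self, other):
        """Simulate unification: merge the links of both terms, or return None."""
        if self.domain != other.domain:
            raise ValueError("terms range over different domains")
        result = MellishTerm(self.domain, self.domain)  # starts with no links
        for term in (self, other):
            for i in range(len(term.parent)):
                result._union(i, term._find(i))
        return result if result.consistent() else None


domain = ["1sg", "2sg", "3sg", "pl"]
not_3sg = MellishTerm(domain, {"1sg", "2sg", "pl"})    # negation: not 3sg
singular = MellishTerm(domain, {"1sg", "2sg", "3sg"})  # disjunction of three values
third_sg = MellishTerm(domain, {"3sg"})

print(sorted(not_3sg.unify(singular).allowed()))
print(not_3sg.unify(third_sg))
```

Unifying two such terms intersects their sets of possible values, and unification fails when the intersection is empty; that is the sense in which the form of disjunction and negation is "limited" to a single finite domain.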