6. Interfacing Hdrug

For an application to work with the Hdrug system, there are a number of predicates you have to supply. Furthermore, you can extend the Hdrug system with application-specific options. Finally, you can always override existing Hdrug definitions. In this chapter I discuss the various possibilities.

Parsers and Generators

In Hdrug you can define any number of parsers and generators. Each parser and each generator is identified by an atomic identifier. A parser is declared by the following directive:

:- flag(parser(Identifier),on/off).

Similarly, a generator is declared by:

:- flag(generator(Identifier),on/off).

This declares a parser or generator and also tells Hdrug whether it is active (on) or not (off). A parser is used in parser-comparison runs only if it is active. Besides declaring which parsers and generators exist, an application usually also defines the `current' parser and generator. This is done by initializing the parser and generator flags:

:- initialize_flag(parser,Identifier).
:- initialize_flag(generator,Identifier).

To summarize: there are a number of parsers, a subset of which is active, and one of which is the current parser.

If a parser (generator) is defined, then there should be a module with the same name which provides the following predicates. Note that only the first one of these predicates, parse/1 or generate/1, is obligatory. The others are not.

parse/1; generate/1. This predicate does the actual parsing (generation). At the time of calling, the argument of parse/1 (generate/1) is a term o(Obj,Str,Sem) where Obj is a term in which the top category is already instantiated. Furthermore, part of this term might be instantiated to some representation of the input string (in the case of parsing, if the predicate phonology/2 is defined) or of the input logical form (in the case of generation, if the predicate semantics/2 is defined). Note that the string and the logical form are also available (if instantiated) in the second and third argument of the o/3 term.
count/0. This optional predicate is intended to produce some statistical information, e.g. on the number of chart edges built. Note that library(hdrug_util) contains predicates to count the number of clauses for a given predicate.
count/1. Similar, but this time the argument should be bound to an integer. This argument determines the final argument of the table_entry/6 predicate in test runs.
clean/0. If the parser adds items, chart edges etc. to the database, then this predicate defines how to remove them again.
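To make this interface concrete, here is a minimal sketch of a parser module. Everything in it is illustrative: the module name toy_parser, the three-rule grammar and the dynamic edge/3 predicate are assumptions, not part of Hdrug. In an application the module would be declared with :- flag(parser(toy_parser),on). Categories have the node(Syn,Sem) form used elsewhere in this chapter, and the string is a list of atoms.

```prolog
% Hypothetical parser module; the name toy_parser is illustrative only.
:- module(toy_parser, [parse/1, count/0, count/1, clean/0]).

:- dynamic edge/3.   % edge(Cat, S0, S): Cat spans the words from S0 to S

% parse(+o(Cat, Str, Sem)): Cat is the instantiated top category and
% Str the input string; Sem is not needed for parsing.
parse(o(Cat, Str, _Sem)) :-
    cat(Cat, Str, []).

% Tiny grammar over node(Syn, Sem) categories, using difference lists.
% Every constituent found is recorded as an edge.
cat(node(s, Sem), S0, S) :-
    cat(node(np, Subj), S0, S1),
    cat(node(vp, Subj^Sem), S1, S),
    add_edge(s, S0, S).
cat(node(np, Name), [Name|S], S) :-
    proper_name(Name),
    add_edge(np, [Name|S], S).
cat(node(vp, X^kiss(X, Y)), [kisses|S1], S) :-
    cat(node(np, Y), S1, S),
    add_edge(vp, [kisses|S1], S).

proper_name(john).
proper_name(mary).

add_edge(Cat, S0, S) :-
    assertz(edge(Cat, S0, S)).

% count/0: print a statistic after parsing.
count :-
    count(N),
    format("~w chart edges~n", [N]).

% count/1: the integer that ends up in the last argument of table_entry/6.
count(N) :-
    findall(x, edge(_, _, _), Edges),
    length(Edges, N).

% clean/0: remove the edges asserted during parsing.
clean :-
    retractall(edge(_, _, _)).
```

Parsing [john,kisses,mary] with the top category node(s,_) binds the semantics to kiss(john,mary) and leaves four edges in the database, which count/1 reports and clean/0 removes.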

Top categories

Usually a grammar comes with a notion of a `start symbol' or `top category'. In Hdrug there can be any number of different top categories. These top categories are Prolog terms. Each one of them is associated with an atomic identifier for reference purposes. Each top category is defined by a clause for the predicate top/2, where the first argument is the atomic identifier and the second argument is the top-category term. For example:

top(s,node(s,_)).
top(np,node(np,_)).

The flag `top_features' is used to indicate what the current choice of top-category is. Usually an application defines a default value for this flag by the directive:

:- initialize_flag(top_features,Identifier).

The identifier relates to the first argument of a top/2 definition.
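The relation between top/2 and the top_features flag can be sketched as follows. The flag itself belongs to Hdrug; in this sketch it is replaced by a hypothetical fact current_top_features/1, and initial_object/2 is a hypothetical helper showing how the first argument of the o/3 term passed to a parser could be obtained.

```prolog
% Top categories, as in the example above.
top(s, node(s, _)).
top(np, node(np, _)).

% Hypothetical stand-in for the top_features flag, which an application
% would normally set with :- initialize_flag(top_features, s).
current_top_features(s).

% initial_object(+Words, -Obj): hypothetical helper. Each call of top/2
% yields a fresh instance of the current top category, which becomes
% the first argument of the o/3 term; Words becomes the second.
initial_object(Words, o(Cat, Words, _Sem)) :-
    current_top_features(Id),
    top(Id, Cat).
```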

Strings and Semantics

The predicate semantics/2 defines which part of an object contains the semantics (if any). For example, if the categories of an application are generally of the form node(Syn,Sem), then semantics/2 can be defined as follows:

semantics(node(_,Sem),Sem).

The predicate is mainly used for generation. By default, the predicate is defined as semantics(_,_).

In a similar way, the predicate phonology(Node,Phon) can be defined. This is only useful for `sign-based' grammars in which the string to be parsed is considered a part of the category. The default definition is phonology(_,_).
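For a sign-based grammar, both hooks simply select an argument of the category. The sketch below assumes a hypothetical category format sign(Phon,Syn,Sem); this format is an example, not one prescribed by Hdrug.

```prolog
% Assumed sign-based category format: sign(Phon, Syn, Sem).
phonology(sign(Phon, _Syn, _Sem), Phon).
semantics(sign(_Phon, _Syn, Sem), Sem).

% A top category in the same format; its Phon slot is left open.
top(s, sign(_, s, _)).
```

Because phonology/2 is called before parsing, unifying the top category sign(_,s,_) with the input string already fixes its Phon slot, so the parser starts with a partially instantiated sign.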

The predicate extern_sem/2 can be used to define a mapping between `internal' and `external' formats of the semantic representation. This predicate is used in two ways: if a semantic representation is read in, and if a semantic representation is written out. The first argument is the external representation, the second argument the internal one. The default definition is extern_sem(X,X).

Grammar compilation

Currently, the grammar menu contains four distinct options to recompile (parts of) the grammar. Hdrug assumes that the grammar is already compiled when an application is started, so these options are only needed if the grammar has to be recompiled (e.g. because part of it has been changed).

The following four predicates have to be provided by the application. If these predicates do not fulfill your needs, you can always extend the grammar menu (cf. below), or even overwrite it (as in the Ale application).

compile_grammar/0 should recompile the whole grammar.
reconsult_grammar/0 should recompile the whole grammar. If files are to be loaded, then `reconsult' is used rather than `compile'. This allows easier debugging.
compile_grammar_file/1 should recompile the grammar file that is its argument.
reconsult_grammar_file/1 is the same, but uses reconsult.

Test-suites

A test suite consists of a number of Prolog clauses for the predicate sentence/2, where the first argument is a unique identifier of the sentence and the second argument is a list of atoms, and of clauses for the predicate lf/2, where the first argument is a unique identifier and the second argument is a logical form. For example:

sentence(a,[john,kisses,mary]).
sentence(b,[john,will,kiss,mary]).
lf(1,fut(kiss(john,mary))).
lf(2,past(kiss(mary,john))).

The test suite might also contain a definition of the predicate user_max/2. This predicate is used to define an upper time limit, possibly based on the length of the test sentence (the first argument), for parsing that sentence in a test-suite run. By default, Hdrug behaves as if this predicate is defined as follows:

user_max(L,Max) :-
    Max is 10000 + (L * 300).

Statistical information for each parse is preserved by the dynamic predicate table_entry/6. The arguments of this predicate indicate:

an atom (the unique identifier of the sentence)
an integer (the length of the sentence)
an integer (the number of parses of the sentence, i.e. the degree of ambiguity)
an atom (the name of the parser)
an integer (the number of milliseconds it took to parse the sentence; in case of a time-out, the atom `time_out')
a term (often used to indicate the number of chart edges built). It is determined by count/1.
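Because table_entry/6 is an ordinary dynamic predicate, test-run results can be summarized with plain Prolog. The sketch below is hypothetical (mean_time/2 and timeouts/2 are not Hdrug predicates); it averages the parse times of one parser while skipping entries whose time argument is the atom time_out.

```prolog
:- dynamic table_entry/6.

% mean_time(+Parser, -Mean): average time in milliseconds over the
% sentences that did not time out; fails if there are none.
mean_time(Parser, Mean) :-
    findall(T, ( table_entry(_, _, _, Parser, T, _), integer(T) ), Ts),
    Ts \== [],
    sum_times(Ts, 0, Sum),
    length(Ts, N),
    Mean is Sum / N.

% sum_times(+Times, +Acc, -Sum): accumulate a list of integers.
sum_times([], Sum, Sum).
sum_times([T|Ts], Sum0, Sum) :-
    Sum1 is Sum0 + T,
    sum_times(Ts, Sum1, Sum).

% timeouts(+Parser, -Keys): identifiers of sentences that timed out.
timeouts(Parser, Keys) :-
    findall(Key, table_entry(Key, _, _, Parser, time_out, _), Keys).
```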

Extending the Graphical User Interface

It is easy to extend the Graphical User Interface for a specific application. There are two predicates that you can define. The first, gram_startup_hook_begin/0, is called before hdrug.tcl is loaded, whereas gram_startup_hook_end/0 is called at the end of loading this file.

Viewing Prolog Clauses

If you want to use Hdrug's built-in facilities to view Prolog clauses, then these clauses must be accessible via the predicate user_clause/2. The arguments of this predicate are the head and the body of the clause, respectively. Hdrug does not use the built-in clause/3 predicate because that predicate is only available for dynamic clauses.

The easiest way to obtain user_clause/2 definitions is to turn on a term_expansion definition with the appropriate effect. This is done by setting the user_clause_expansion flag to on.

6.1. use_canvas(+Mode,LeftRightTop)

Mode is a term indicating the type of data-structure to be displayed. It is one of tree(TreeMode), fs, text, chart, stat. The predicate should instantiate the second argument as one of the atoms left, right or top (for a new widget).
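A use_canvas/2 definition is typically a small case distinction on Mode. The policy below (trees on the right, feature structures on the left, everything else in a new widget) is only an example; it is written with if-then-else so that it also behaves correctly when the second argument is already bound.

```prolog
% use_canvas(+Mode, -Where): illustrative placement policy.
use_canvas(Mode, Where) :-
    (   Mode = tree(_)
    ->  Where = right          % any tree format
    ;   Mode == fs
    ->  Where = left           % feature structures
    ;   Where = top            % text, chart, stat: new widget
    ).
```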

6.2. help_hook(PredSymbol,UsageString,ExplanationString)

This predicate can be defined to provide help on a hook predicate with predicate symbol PredSymbol. UsageString is a list of character codes which briefly shows the usage of the predicate; for instance, the help_hook definition for the help_hook predicate itself has as its UsageString "help_hook(PredSymbol, UsageString, ExplanationString)". ExplanationString is a list of character codes containing further explanation.

6.3. ParserModule:parse(o(Cat,Str,Sem))

If ParserModule is the current parser, then this predicate is called to do the actual parsing. At the time of calling, the argument of parse/1 is a term o(Cat,Str,Sem) where Cat is a term in which the top category is already instantiated. Furthermore, part of the term may have been instantiated to some representation of the input string (if the hook predicate phonology/2 was defined to do so). The input string is also available in the second argument of the o/3 term. The third argument is not used for parsing.

6.4. GeneratorModule:generate(o(Cat,Str,Sem))

If GeneratorModule is the current generator, then this predicate is called to do the actual generation. At the time of calling, the argument of generate/1 is a term o(Cat,Str,Sem) where Cat is a term in which the top category is already instantiated. Furthermore, part of the term may have been instantiated to some representation of the input semantics (if the hook predicate semantics/2 was defined to do so). The input semantics is also available in the third argument of the o/3 term. The second argument is not used for generation.

6.5. Module:count

This optional predicate is intended to display some statistical information, e.g. on the number of chart edges built. The predicate Module:count is called in module Parser after parsing has been completed for parser Parser, or in module Generator after generation has been completed for generator Generator. Note that library(hdrug_util) contains predicates to count the number of clauses for a given predicate.

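6.6. Module:count(Count)

This optional predicate is similar to Module:count, but Count should be bound to an integer. Its value determines the final argument of the table_entry/6 predicate in test runs.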

6.7. Module:clean

This optional predicate is intended to remove, e.g., chart edges that were added dynamically to the database, once parsing has been completed. The predicate Module:clean is called in module Parser after parsing has been completed for parser Parser, or in module Generator after generation has been completed for generator Generator.

6.8. start_hook(parse/generate,Module,o(A,B,C),Term)

This predicate is a hook that is called before the parser or generator starts. Its first argument is either the atom parse or the atom generate; the second argument is the current parser or generator (hence the name of the module); the third argument is an object. The fourth argument can be anything. It is provided to pass on arbitrary information to the result_hook and end_hook hook predicates. For example, the predicate could pass on information concerning the current memory usage of SICStus. This information could then be used by end_hook to compute the amount of memory that the parser has consumed. The time required by the start_hook predicate is NOT considered to be part of parsing time; cf. start_hook0/4 for a similar hook predicate whose time IS considered part of parsing time.

6.9. start_hook0(parse/generate,Module,o(A,B,C),Term)

This predicate is a hook that is called before the parser or generator starts. Its first argument is either the atom parse or the atom generate; the second argument is the current parser or generator (hence the name of the module); the third argument is an object. The fourth argument can be anything. It is provided to pass on arbitrary information to the result_hook and end_hook hook predicates. For example, the predicate could pass on information concerning the current memory usage of SICStus. This information could then be used by end_hook to compute the amount of memory that the parser has consumed. The time required by the start_hook0 predicate IS considered to be part of parsing time; cf. start_hook/4 for a similar hook predicate whose time is NOT considered part of parsing time.

6.10. result_hook(parse/generate,Module,o(A,B,C),Term)

This predicate is a hook that is called each time the parser or generator succeeds. Its first argument is either the atom parse or the atom generate; the second argument is the current parser or generator (hence the name of the module); the third argument is an object. The fourth argument can be anything. It is provided to pass on arbitrary information from the start_hook hook predicate. Warning: the time taken by result_hook is always considered part of the time required for parsing. Consider using the demo flag to ensure that expensive result_hooks are not fired in parser-comparison runs.

6.11. end_hook(parse/generate,Module,o(A,B,C),Term)

This predicate is a hook that is called when the parser or generator cannot find any more results. Its first argument is either the atom parse or the atom generate; the second argument is the current parser or generator (hence the name of the module); the third argument is an object. The fourth argument can be anything. It is provided to pass on arbitrary information from the start_hook hook predicate. Note that at the moment of calling this predicate the object will typically NOT be instantiated. The time required by end_hook is NOT considered to be part of parsing time; see end_hook0.

6.12. end_hook0(parse/generate,Module,o(A,B,C),Term)

This predicate is a hook that is called when the parser or generator cannot find any more results. Its first argument is either the atom parse or the atom generate; the second argument is the current parser or generator (hence the name of the module); the third argument is an object. The fourth argument can be anything. It is provided to pass on arbitrary information from the start_hook hook predicate. Note that at the moment of calling this predicate the object will typically NOT be instantiated. The time required by end_hook0 IS considered to be part of parsing time; see end_hook.
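The start, result and end hooks are meant to cooperate through their fourth argument. A minimal sketch, assuming (as described above) that Hdrug calls start_hook/4 with a free fourth argument and passes the same term on to result_hook/4 and end_hook/4. Instead of memory usage it records CPU time via statistics(runtime, ...), and the predicates result_seen/2 and last_run/4 are hypothetical bookkeeping.

```prolog
:- dynamic result_seen/2.   % result_seen(Mode, Module): one fact per result
:- dynamic last_run/4.      % last_run(Mode, Module, Results, Millis)

% start_hook/4: reset the result counter and bind the free fourth
% argument to the current CPU time in milliseconds.
start_hook(Mode, Module, _Obj, T0) :-
    retractall(result_seen(Mode, Module)),
    statistics(runtime, [T0, _]).

% result_hook/4: called once per result; kept cheap because its time
% counts as parsing time.
result_hook(Mode, Module, _Obj, _T0) :-
    assertz(result_seen(Mode, Module)).

% end_hook/4: compute the elapsed CPU time and the number of results.
end_hook(Mode, Module, _Obj, T0) :-
    statistics(runtime, [T1, _]),
    Millis is T1 - T0,
    findall(x, result_seen(Mode, Module), Xs),
    length(Xs, Results),
    retractall(last_run(Mode, Module, _, _)),
    assertz(last_run(Mode, Module, Results, Millis)).
```

Since result_hook/4 counts as parsing time, it only asserts a marker; the more expensive bookkeeping is left to end_hook/4, whose time is not counted.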

6.13. top(Name,Cat)

Usually a grammar comes with a notion of a `start symbol' or `top category'. In Hdrug there can be any number of different top categories, of which one is the currently used top category. These top categories are Prolog terms. Each one of them is associated with an atomic identifier for reference purposes. Each top category is defined by a clause for the predicate top/2, where the first argument is the atomic identifier and the second argument is the top-category term. The latter term will be unified with the first argument of the o/3 terms passed on to parsers and generators.

top(s,node(s,_)).
top(np,node(np,_)).

The flag `top_features' is used to indicate what the current choice of top-category is. Usually an application defines a default value for this flag. The identifier relates to the first argument of a top/2 definition.

6.14. semantics(Cat,Sem)

The predicate semantics/2 defines which part of an object contains the semantics (if any). For example, if in an application categories are generally of the form node(Syn,Sem), then the following definition of semantics/2 is used:

semantics(node(_,Sem),Sem).

The predicate is mainly used for generation.

6.15. phonology(Cat,Phon)

This predicate is useful for `sign-based' grammars in which the string to be parsed is considered a part of the category. This predicate is called before parsing so that in such cases the current string Phon can be unified with some part of the object.

6.16. extern_sem(Extern,Intern)

This predicate can be defined in order to distinguish internal and external semantic representations. This predicate is used in two ways: if a semantic representation is read in, and if a semantic representation is written out. The first argument is the external representation, the second argument the internal one. The default definition is extern_sem(X,X). A typical usage of this predicate could be a situation in which an external format such as kisses(john,mary) is to be translated into a feature structure format such as [ pred=kisses, arg1=john, arg2=mary]. NB, the external format is read in as a single Prolog term.
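The kisses(john,mary) example could be implemented with =.. as below. This is a sketch under the assumption that internal semantics are lists of Key=Value pairs with the keys pred, arg1, arg2, and so on; the helper numbered_args/3 is hypothetical. The same definition works in both directions, matching the two ways in which extern_sem/2 is used.

```prolog
% extern_sem(?Extern, ?Intern): kisses(john,mary) <-> [pred=kisses,arg1=john,arg2=mary]
extern_sem(Extern, Intern) :-
    (   nonvar(Extern)
    ->  Extern =.. [Pred|Args],            % reading in: term to list
        numbered_args(Args, 1, Pairs),
        Intern = [pred=Pred|Pairs]
    ;   Intern = [pred=Pred|Pairs],        % writing out: list to term
        numbered_args(Args, 1, Pairs),
        Extern =.. [Pred|Args]
    ).

% numbered_args(?Args, +N, ?Pairs): pair each argument with key argN.
numbered_args([], _, []).
numbered_args([A|As], N, [Key=A|Ps]) :-
    number_codes(N, Codes),
    atom_codes(NAtom, Codes),
    atom_concat(arg, NAtom, Key),
    N1 is N + 1,
    numbered_args(As, N1, Ps).
```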

6.17. extern_phon(Extern,Intern)

This predicate can be defined in order to distinguish internal and external phonological representations. This predicate is used in two ways: if a phonological representation is read in, and if a phonological representation is written out. The first argument is the external representation, the second argument the internal one. The default definition is extern_phon(X,X). NB, the external format is read in as a list of Prolog terms.

6.18. sentence(Key,Sentence), sentence(Key,Max,Sentence)

Applications can define a number of test sentences by defining clauses for this predicate. For ease of reference, Key is some atomic identifier (typically an integer). Sentence is typically a list of atoms. The parser comparison predicates refer to this atomic identifier. Example sentences are also listed in the listbox available through the parse menu-button. Max can be an integer indicating the maximum number of milliseconds allowed for this sentence in parser comparison runs.

6.19. lf(Key,LF), lf(Key,Max,LF)

Applications can define a number of test logical forms by defining clauses for this predicate. For ease of reference, Key is some atomic identifier (typically an integer). LF is a term (external format of a logical form). The generator comparison predicates refer to this atomic identifier. Example logical forms are also listed in the listbox available through the generate menu-button. Max can be an integer indicating the maximum number of milliseconds allowed for this logical form in generator comparison runs.

6.20. user_max(Length,Max)

This predicate is used to define an upper time limit, possibly based on the length of the test sentence (the first argument), for parsing that sentence in a test-suite run. By default, Hdrug behaves as if this predicate is defined as follows: user_max(L,Max) :- Max is 10000 + (L * 300). If you don't want a time-out at all, define this predicate as user_max(_,0).
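Per-sentence limits from sentence/3 and the length-based default from user_max/2 can be combined as in the following sketch. The helper time_limit/2, and the precedence it gives to sentence/3, are assumptions rather than documented Hdrug behaviour. The test-suite entries are the examples from this chapter plus one hypothetical sentence/3 entry.

```prolog
% Test-suite entries; sentence c with its 5000 ms limit is hypothetical.
sentence(a, [john, kisses, mary]).
sentence(b, [john, will, kiss, mary]).
sentence(c, 5000, [mary, kisses, john]).

% The default time limit described above (0 would mean no time-out).
user_max(L, Max) :-
    Max is 10000 + (L * 300).

% time_limit(+Key, -Max): hypothetical helper; an explicit Max from
% sentence/3 wins, otherwise user_max/2 is applied to the length.
time_limit(Key, Max) :-
    (   sentence(Key, Max0, _)
    ->  Max = Max0
    ;   sentence(Key, Words),
        length(Words, L),
        user_max(L, Max)
    ).
```

With the default formula, sentence a (three words) gets 10000 + 3 * 300 = 10900 milliseconds.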

6.21. gram_startup_hook_begin

This predicate is meant to be used to extend the graphical user interface. It is called right before Hdrug's own graphical user interface definitions are loaded (i.e., right before hdrug.tcl is sourced).

6.22. gram_startup_hook_end

This predicate is meant to be used to extend the graphical user interface. It is called right after Hdrug's own graphical user interface definitions are loaded (i.e., right after hdrug.tcl is sourced). A typical use is to add application specific menu-buttons, etc.

6.23. user_clause(Head,Body)

If you want to use Hdrug's built-in facilities to view Prolog clauses, then these clauses must be accessible via the predicate user_clause/2. The arguments of this predicate are the head and the body of the clause, respectively. Note that the body of the clause should be provided as a list of goals, rather than a conjunction. Hdrug does not use the built-in clause/3 predicate because that predicate is only available for dynamic clauses. The easiest way to obtain user_clause/2 definitions is to turn on a term_expansion definition with the appropriate effect; cf. flag(user_clause_expansion).
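What the user_clause_expansion flag achieves can be approximated with a term_expansion hook. A minimal sketch, assuming a Prolog with the classic user:term_expansion/2 hook (as in SWI-Prolog; SICStus 4 uses a hook with a different arity): every rule read after the hook is defined is kept unchanged and additionally recorded as a user_clause/2 fact whose body is a list of goals. Facts are not recorded in this sketch, and the grandparent/parent clauses are only an example.

```prolog
:- use_module(library(lists)).   % append/3
:- dynamic user_clause/2.

% body_goals(+Body, -Goals): turn a conjunction into a list of goals.
body_goals(Body, [Body]) :- var(Body), !.
body_goals((A, B), Goals) :- !,
    body_goals(A, GA),
    body_goals(B, GB),
    append(GA, GB, Goals).
body_goals(true, []) :- !.
body_goals(Goal, [Goal]).

% Keep each rule and also record it as user_clause(Head, Goals).
:- multifile user:term_expansion/2.
user:term_expansion((Head :- Body), [(Head :- Body), user_clause(Head, Goals)]) :-
    body_goals(Body, Goals).

% Example clauses, read after the hook is in place.
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
parent(ann, bob).
parent(bob, cid).
```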

6.24. graphic_path(Format,Obj,Term)

One of the three hook predicates which together define tree formats. The others are graphic_label/3 and graphic_daughter/4. The Hdrug libraries contain extensive possibilities to produce output in the form of trees. Only a few declarations are needed to define what things you want to see in the tree. In effect, such declarations define a `tree format'. In Hdrug, there can be any number of tree formats. These tree formats are named by a ground identifier. A tree format consists of three parts: the path definition indicates what part of the object you want to view as a tree; the label definition indicates how you want to print the node of a tree; and the daughter definition indicates what you consider the daughters of a node. The graphic_path definition is the first part. For instance if the parser creates an object of the form node(Syn,Sem,DerivTree) where DerivTree is a derivation tree, then we can define a tree format `dt' where the graphic_path definition extracts the third argument of this term: graphic_path(dt,node(_,_,Tree),Tree).

6.25. graphic_label(Format,Node,Label)

One of the three hook predicates which together define tree formats. The others are graphic_path/3 and graphic_daughter/4. The Hdrug libraries contain extensive possibilities to produce output in the form of trees. Only a few declarations are needed to define what things you want to see in the tree. In effect, such declarations define a `tree format'. In Hdrug, there can be any number of tree formats. These tree formats are named by a ground identifier. A tree format consists of three parts: the path definition indicates what part of the object you want to view as a tree; the label definition indicates how you want to print the node of a tree; and the daughter definition indicates what you consider the daughters of a node. The graphic_label definition is the second part. For instance, if subtrees are of the form tree(Node,Ds), where Node is a term representing a syntactic object such as np(Agr,Case) or vp(Agr,Subcat,Sem), then a tree format could be defined which only displays the functor symbol: graphic_label(syn,tree(Term,_),Label) :- functor(Term,Label,_).

6.26. graphic_daughter(Format,No,Term,Daughter)

One of the three hook predicates which together define tree formats. The others are graphic_path/3 and graphic_label/3. The Hdrug libraries contain extensive possibilities to produce output in the form of trees. Only a few declarations are needed to define what things you want to see in the tree. In effect, such declarations define a `tree format'. In Hdrug, there can be any number of tree formats. These tree formats are named by a ground identifier. A tree format consists of three parts: the path definition indicates what part of the object you want to view as a tree; the label definition indicates how you want to print the node of a tree; and the daughter definition indicates what you consider the daughters of a node. The graphic_daughter definition is the third part. For instance if subtrees are of the form tree(Label,Daughters), where Daughters is a list of daughters, then you could simply define: graphic_daughter(syn,No,tree(_,Ds),D):- lists:nth(No,Ds,D).
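The three tree-format hooks can be combined into one definition. The sketch below reuses the examples from the previous entries under a single format name syn, and assumes objects of the form node(Syn,Sem,Tree) with subtrees tree(Node,Daughters). It uses nth1/3 instead of the older lists:nth/3. The predicate tree_labels/3 is a hypothetical stand-in for Hdrug's tree display: it only collects the labels in pre-order, which shows how the hooks are meant to be called.

```prolog
:- use_module(library(lists)).   % nth1/3

% Tree format syn: path, label and daughter definitions.
graphic_path(syn, node(_Syn, _Sem, Tree), Tree).

graphic_label(syn, tree(Term, _Ds), Label) :-
    functor(Term, Label, _).

graphic_daughter(syn, No, tree(_, Ds), D) :-
    nth1(No, Ds, D).

% tree_labels(+Format, +Obj, -Labels): hypothetical driver that walks
% the tree selected by graphic_path/3 and collects labels in pre-order.
tree_labels(Format, Obj, Labels) :-
    graphic_path(Format, Obj, Tree),
    subtree_labels(Format, Tree, Labels, []).

subtree_labels(Format, Tree, [Label|L0], L) :-
    graphic_label(Format, Tree, Label),
    findall(D, graphic_daughter(Format, _, Tree, D), Ds),
    daughters_labels(Ds, Format, L0, L).

daughters_labels([], _, L, L).
daughters_labels([D|Ds], Format, L0, L) :-
    subtree_labels(Format, D, L0, L1),
    daughters_labels(Ds, Format, L1, L).
```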

6.27. show_node(Format,Node)

If trees are displayed on the canvas widget, then it is possible to define an action for clicking the left-most mouse button on the node of the tree. This action is defined by this predicate. Format is the identifier of a tree format, and Node is the full sub-tree (that was used as input to the graphic_label definition).

6.28. show_node2(Format,Node)

If trees are displayed on the canvas widget, then it is possible to define an action for clicking the middle mouse button on the node of the tree. This action is defined by this predicate. Format is the identifier of a tree format, and Node is the full sub-tree (that was used as input to the graphic_label definition).

6.29. show_node3(Format,Node)

If trees are displayed on the canvas widget, then it is possible to define an action for clicking the rightmost mouse button on the node of the tree. This action is defined by this predicate. Format is the identifier of a tree format, and Node is the full sub-tree (that was used as input to the graphic_label definition).

6.30. tk_tree_user_node(Label,Frame)

If a tree-format is defined which matches user(_), then if a tree is to be displayed on the Canvas widget this predicate is responsible for creating the actual nodes of the tree. Label is the current label, and Frame is the identifier of a Tcl/Tk frame which should be further used for this label. The frame is already packed.

6.31. clig_tree_user_node(Label)

If a tree-format is defined which matches user(_), then if a tree is to be displayed using Clig output, then this predicate is responsible for creating the actual nodes of the tree. Label is the current label.

6.32. dot_tree_user_node(Label)

If a tree-format is defined which matches user(_), then if a tree is to be displayed using DOT output, then this predicate is responsible for creating the actual label of the nodes of the tree. Label is the current label.

6.33. latex_tree_user_node(Label)

If a tree-format is defined which matches user(_), then if a tree is to be displayed using LaTeX output, then this predicate is responsible for creating the actual nodes of the tree. Label is the current label.

6.34. shorten_label(Label0,Label)

This predicate can be defined for feature-structure display of tree nodes; its intended use is to reduce the information of a given node.

6.35. call_build_lab(F,Fs,L)

Used by library(hdrug_call_tree).

6.36. call_build_lab(Functor/Arity)

Used by library(hdrug_call_tree).

6.37. exceptional_sentence_length(Phon,Length)

For (internal) phonological representations this predicate can be defined to return the length of the representation. If the predicate is not defined, then the representation is assumed to be a list, and the length is assumed to be the number of elements of the list. The length of phonological representations is used by the display of the results of parser comparison runs.
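If the internal phonology is, say, a difference list P0-P rather than a plain list, exceptional_sentence_length/2 has to count the words up to the tail. The difference-list representation is an assumption made for this sketch.

```prolog
% Assumed internal phonology: a difference list P0-P.
exceptional_sentence_length(P0-P, Length) :-
    dl_length(P0, P, 0, Length).

% dl_length(+P0, +P, +N0, -N): count the elements of P0 before tail P.
dl_length(P0, P, N, N) :-
    P0 == P, !.
dl_length([_|P1], P, N0, N) :-
    N1 is N0 + 1,
    dl_length(P1, P, N1, N).
```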

6.38. exceptional_lf_length(Sem,Length)

For (internal) semantic representations this predicate can be defined to return the length of the representation. If the predicate is not defined, then the representation is assumed to be a term, and the length is assumed to be the number of characters required to print the term. The length of semantic representations is used by the display of the results of generator comparison runs.

6.39. hdrug_initialization

When Hdrug is started, three things happen. First, Hdrug processes its command-line options. After that, the predicate hdrug_initialization is called. Finally, the graphical user interface is started (if flag(tcltk) is on). This predicate can thus be used to define application-specific initialization.

6.40. hdrug_command(Name,Goal,Args)

This predicate can be used to define further commands for the command interpreter. Name is the first word of the command, Goal is the resulting Prolog goal, and Args is a possibly empty list of arguments to the command.

6.41. hdrug_command_help(Name,UsageString,ExplanationString)

This predicate can be used to provide help information on commands for the command interpreter. Name is the first word of the command. The second argument displays usage information in a short form (a list of character codes); the third argument is a list of character codes containing an explanation of the command.
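A new command and its help text could be defined together as below. The command words and its goal report_words/1 are hypothetical, as is run_command/1, a stand-in for the way the command interpreter is assumed to use hdrug_command/3. The double_quotes flag is set so that the strings are lists of character codes, as hdrug_command_help/3 expects.

```prolog
:- set_prolog_flag(double_quotes, codes).

% Hypothetical command "words": the arguments are passed to the goal.
hdrug_command(words, report_words(Args), Args).

hdrug_command_help(words, "words Arg1 ... ArgN",
    "Prints the number of arguments given to the command.").

report_words(Args) :-
    length(Args, N),
    format("~w argument(s)~n", [N]).

% run_command(+Words): hypothetical stand-in for the command interpreter,
% assumed to look up the first word and call the resulting goal.
run_command([Name|Args]) :-
    hdrug_command(Name, Goal, Args),
    call(Goal).
```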

6.42. help_flag(Flag,Help)

This predicate can be used to provide help information on global variable Flag. Help is a list of character codes containing the help info.

6.43. option(Option,ArgvIn,ArgvOut)

This predicate can be used to define application-specific command-line options for the hdrug command. Option is the option without the leading minus sign; moreover, Option relates to the first argument of a corresponding usage_option/3 definition. The second and third arguments form a difference list over the remaining command-line arguments, so that an option which takes further arguments can consume them.

6.44. usage_option(Option,UsageString,ExplanationString)

This predicate is defined to provide help information on the Option startup option (cf. option/3). The UsageString is a list of character codes presenting short usage information; ExplanationString is a list of character codes containing the explanation of the option.
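The difference-list convention of option/3 lets an option consume its own arguments. The option grammar, the fact grammar_file/1 and the driver process_options/1 below are hypothetical; the driver only illustrates how option/3 is assumed to be called with the remaining command-line arguments.

```prolog
:- set_prolog_flag(double_quotes, codes).
:- dynamic grammar_file/1.

% -grammar File: hypothetical option that consumes one further argument.
option(grammar, [File|Argv], Argv) :-
    assertz(grammar_file(File)).

usage_option(grammar, "-grammar File", "Use File as the grammar file.").

% process_options(+Argv): hypothetical driver; strips the minus sign
% and lets option/3 take what it needs from the rest of Argv.
process_options([]).
process_options([Arg|Argv0]) :-
    atom_concat('-', Option, Arg),
    option(Option, Argv0, Argv),
    process_options(Argv).
```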

6.45. tk_tree_show_node_help(TreeFormat,Atom)

If a tree according to TreeFormat is displayed on the canvas, then this predicate can be defined so that a short message appears below the widget, indicating which actions are bound to clicking on the tree nodes. Atom is the message.

6.46. show_relation(F/A)

You can define the relation show_relation/1 to define an action for pressing the first mouse button on a relation name when viewing predicate definitions in the Tk canvas. The argument is a Functor/Arity pair. For example,

show_relation(F/A) :-
    show_predicate(F/A,fs,tk).

will show the predicate definition.

6.47. display_extern_sem(+ExtSem)

Predicate to print a given external format of semantics.

6.48. display_extern_phon(+ExtPhon)

Predicate to print a given external format of phonology.

6.49. compile_test_suite(+File)

Predicate to compile the test suite in file File.

6.50. reconsult_test_suite(+File)

Predicate to reconsult the test suite in file File.

6.51. show_object_default2(+Int)

Predicate which is called if the user presses mouse button <2> on the object button number Int. A typical definition could be, for instance:

show_object_default2(No) :-
    show_object_no(No,tree(syn),clig).

6.52. show_object_default3(+Int)

Predicate which is called if the user presses mouse button <3> on the object button number Int.