To start with a simple example, such a tool could be used to search text corpora for a particular reading of a given word. For instance, the Dutch word bar is ambiguous: it can be a noun (in which case it means the same as in English), or it can be a degree adverb, as in
In the latter case, bar is a negative polarity item. In order to
collect example sentences of such negative polarity items (for
instance in order to investigate the various contexts in which such
negative polarity items can occur), a linguist now typically uses a
tool to search for a given word. The resulting set of sentences will
then need to be checked by the linguist in order to filter out all the
unwanted sentences in which bar is used as a noun. Given that in
this particular case the wrong examples are much more frequent
than the useful examples, this is a time-consuming task. If the tool
were to possess linguistic knowledge, as we propose here, it could
filter out the wrong examples itself.
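The filtering step described above can be sketched as follows. This is a minimal illustration, not the proposed tool: it assumes that some tagger or parser has already assigned a part-of-speech tag to every token, and the tag names ("noun", "adv", and so on), the function names and the two sample sentences are all invented for the purpose of the example.

```python
from typing import List, Tuple

# A token is a (word form, tag) pair. The tag names "noun" and "adv" are
# hypothetical; a real tagger or parser would use its own tag set.
Token = Tuple[str, str]
Sentence = List[Token]


def has_reading(sentence: Sentence, word: str, tag: str) -> bool:
    """Return True if `word` occurs in `sentence` with the given tag."""
    return any(form.lower() == word and t == tag for form, t in sentence)


def filter_by_reading(corpus: List[Sentence], word: str, tag: str) -> List[Sentence]:
    """Keep only the sentences in which `word` carries the requested tag."""
    return [s for s in corpus if has_reading(s, word, tag)]


# Two invented sentences, tagged by hand for illustration only.
corpus = [
    [("Hij", "pron"), ("zit", "verb"), ("in", "prep"), ("de", "det"), ("bar", "noun")],
    [("Dat", "pron"), ("is", "verb"), ("niet", "adv"), ("bar", "adv"), ("interessant", "adj")],
]

hits = filter_by_reading(corpus, "bar", "adv")
print([" ".join(form for form, _ in s) for s in hits])
```

Only the second sentence is returned, since the first contains bar as a noun. The hard part, of course, is obtaining reliable tags in the first place, which is exactly the linguistic knowledge the tool is meant to supply.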
As a much more complicated example, one could ask (in some appropriate format) for sentences in which a prepositional phrase argument has been extraposed to the right of the verbal group. Note that in order to find appropriate examples the tool should not only be capable of recognising syntactic phrases such as root sentences and prepositional phrases; its analysis should also be deep enough to distinguish prepositional phrases which function as adjuncts from those which function as arguments. The tool would then return, for example, a set of sentences such as:
Another example usage of the tool could be to identify examples of
verb raising constructions in which an adjunct takes narrow scope,
i.e. it is an adjunct modifying one of the verbs embedded in the verb
cluster (cf. [113]):
It should be clear, however, that this tool only has limited knowledge
of syntactic constructions (otherwise creating the tool would
presuppose knowledge that the use of the tool seeks to discover). We
envisage that the tool provides an extension of regular expressions
capable of recognising matching syntactic brackets, major syntactic
categories, and grammatical functions such as subject, (in)direct
object and modifier.
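To make the kind of query envisaged here more concrete, the following sketch searches a labelled bracketing for constituents with a given category and grammatical function. It is an illustration under stated assumptions only: the bracket notation, the "category:function" label format, the labels su, hd and obj1, and the names Node, parse and find are all hypothetical, and the analysis is supplied by hand rather than computed from raw text. Ordinary regular expressions cannot match nested brackets, so the sketch uses a small recursive reader instead.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    cat: str                      # major syntactic category, e.g. "np"
    func: Optional[str] = None    # grammatical function, e.g. "su" (hypothetical label)
    children: List["Node"] = field(default_factory=list)
    words: List[str] = field(default_factory=list)  # all words the node spans


# Tokens are "[", "]", or any run of characters without spaces or brackets.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def parse(text: str) -> Node:
    """Read a bracketing such as "[s [np:su de man] [v:hd leest]]" into a tree."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node() -> Node:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        cat, _, func = tokens[pos].partition(":")
        pos += 1
        n = Node(cat, func or None)
        while pos < len(tokens) and tokens[pos] != "]":
            if tokens[pos] == "[":
                child = node()
                n.children.append(child)
                n.words.extend(child.words)
            else:
                n.words.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume the matching "]"
        return n

    root = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the outermost bracket")
    return root


def find(node: Node, cat: Optional[str] = None, func: Optional[str] = None) -> Iterator[Node]:
    """Yield every node matching the requested category and/or function."""
    if (cat is None or node.cat == cat) and (func is None or node.func == func):
        yield node
    for child in node.children:
        yield from find(child, cat, func)


tree = parse("[s [np:su de man] [v:hd leest] [np:obj1 een boek]]")
objects = [" ".join(n.words) for n in find(tree, cat="np", func="obj1")]
print(objects)
```

Here the query for noun phrases functioning as direct object returns the single phrase "een boek". A query language of the kind proposed would presumably combine such category and function tests with the usual regular expression operators over words.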
The novel feature of this application (in contrast with tools such as tgrep) will be that it can search text corpora which need not be syntactically annotated. This has the obvious advantage that much more corpus material is available (especially now that large amounts of text are available through the Internet). A further possible advantage is that it might be easier to change linguistic analyses in a grammar than in an annotated corpus. Of course, the challenge is to make this application fast enough to be of any practical use. Moreover, we believe that even if only a small fraction of the described functionality can be achieved, the result could still be a useful tool for linguists working with large text corpora.