In  a full coverage grammar is applied to a given corpus of utterances. After incorrect analyses have been removed, the analysed corpus is inspected to see which grammar rules are actually used. A specialised grammar is then constructed which contains only the subset of rules used in the analysis of the corpus.
An obvious extension of this idea consists in adding probabilities to rules, depending on how often a particular rule is used in the analysis of the corpus. This is different from the inside-outside algorithm for stochastic context-free grammars  which is used to estimate probabilities from an unannotated corpus, but which has not been very successful in practice. In  a more promising technique is described which adapts the inside-outside algorithm to partially bracketed corpora.
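The two steps just described (collecting the rules that are actually used, and attaching probabilities to them according to how often they are used) can be illustrated with a minimal sketch. The tree encoding as nested tuples, the toy corpus, and the names `rules`, `estimate`, `CORPUS` and `PROBS` are assumptions made for illustration; they are not taken from the cited work. Probabilities are plain relative frequencies per left-hand side.

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) for every local tree in a nested-tuple parse tree.

    A tree is (label, child, ...); a child is either a word (str) or a tree.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def estimate(corpus):
    """Relative-frequency estimate: count(rule) / count(all rules with same lhs)."""
    rule_counts = Counter(r for tree in corpus for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

# Hypothetical analysed corpus (incorrect analyses assumed already removed).
CORPUS = [
    ("S", ("NP", "she"), ("VP", ("V", "sleeps"))),
    ("S", ("NP", "she"), ("VP", ("V", "sees"), ("NP", "him"))),
]

# The keys of PROBS form the specialised grammar; the values are the rule probabilities.
PROBS = estimate(CORPUS)
```

Any rule of the general grammar that does not occur as a key in `PROBS` is absent from the specialised grammar. Note that this supervised counting presupposes an analysed corpus, which is precisely what distinguishes it from inside-outside estimation on unannotated text.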
In  a technique is described in which the actions in an LR parsing table are augmented with probabilities. As a further step,  introduce a pruning technique which filters out very unlikely actions during the parsing process. Again, a corpus of analysed examples is used to determine what actions count as unlikely.
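The idea of pruning unlikely parsing actions can be sketched as follows. This is an illustration under simplifying assumptions, not the method of the cited work: actions are assumed to be observed as (state, lookahead, action) triples in the corpus derivations, probabilities are normalised per (state, lookahead) pair, and the threshold is a free parameter. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def action_probabilities(observed):
    """Estimate P(action | state, lookahead) from observed parser actions."""
    counts = Counter(observed)
    totals = Counter()
    for (state, lookahead, _), n in counts.items():
        totals[(state, lookahead)] += n
    table = defaultdict(dict)
    for (state, lookahead, action), n in counts.items():
        table[(state, lookahead)][action] = n / totals[(state, lookahead)]
    return dict(table)

def prune(table, threshold):
    """Drop every action whose estimated probability is below the threshold."""
    return {
        key: {a: p for a, p in actions.items() if p >= threshold}
        for key, actions in table.items()
    }

# Hypothetical observations: in state 3 with lookahead "n", the parser
# shifted nine times and reduced once in the analysed corpus.
OBSERVED = [(3, "n", "shift 5")] * 9 + [(3, "n", "reduce NP -> Det")]

TABLE = action_probabilities(OBSERVED)
PRUNED = prune(TABLE, threshold=0.2)
```

With these toy counts the reduce action receives probability 0.1 and is filtered out, so the parser would no longer pursue it in that configuration.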
A promising alternative means of specialisation consists of the application of explanation-based generalization techniques to natural language parsing [90,92,99,105]. A full coverage grammar is used to analyse a given set of examples. From the analysed corpus a specialised grammar is constructed which typically only analyses a subset of the language analysed by the general grammar. However, it does so very efficiently. Moreover, and more importantly, it often favours more likely analyses. If all goes well, the specialisation removes useless analyses, while retaining the appropriate ones.
Another approach towards disambiguation is the data-oriented approach described in  and . Preliminary experiments on the ATIS corpus of the Penn Treebank  were very promising. In this model, a stochastic tree substitution grammar is created by taking into account (almost) all possible sub-trees of the trees present in an annotated corpus.
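To give an impression of what taking all sub-trees into account involves, the following sketch enumerates the fragments of a single tree. The encoding is an assumption made for illustration: a tree is a nested tuple, words are strings, and a non-terminal at which a fragment has been cut off is written as a one-element tuple such as `("NP",)`. The function names are hypothetical, and no probabilities are assigned here.

```python
from itertools import product

def fragments_at(tree):
    """All fragments rooted at the top node of `tree`.

    For each non-terminal child we may either cut (leaving a frontier
    node) or continue with any fragment rooted at that child.
    """
    label, *children = tree
    options = []
    for c in children:
        if isinstance(c, str):
            options.append([c])                        # words are always kept
        else:
            options.append([(c[0],)] + fragments_at(c))  # cut, or expand
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Fragments rooted at every node of `tree`."""
    _, *children = tree
    out = fragments_at(tree)
    for c in children:
        if not isinstance(c, str):
            out.extend(all_fragments(c))
    return out

TREE = ("S", ("NP", "she"), ("VP", ("V", "sleeps")))
FRAGMENTS = all_fragments(TREE)
```

Even this four-node tree yields ten fragments, and the number grows exponentially with tree size, which may explain the qualification "(almost)" above.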
A very successful approach consists of the application of decision tree algorithms to parsing. For instance, very good parsing performance on the Penn Treebank Wall Street Journal corpus [71,72] has been reported for these techniques [48,69]. These methods make heavy use of lexical information. In  a different system based on statistical decision tree modelling is described which is also capable of capturing linguistic dependencies. Remarkable results are presented on the ATR/Lancaster Treebank of General English . The interesting aspect of this work is the central role played by a detailed and linguistically motivated grammar of English.
Very good results on the Penn Treebank Wall Street Journal corpus have also been reported in ; . A number of lexicalised probabilistic models are compared. These models are sensitive to the lexical head of constituents. Moreover, probabilities over subcategorisation frames are incorporated; complement/adjunct distinctions are important, and WH-movement constructions are treated separately. Somewhat similar techniques are described in  and , expressed in terms of Dependency Grammar, where it is very natural to express lexical dependencies of a statistical nature.
Disambiguation of prepositional phrase attachment is the subject of a number of other experiments, in which sequences of the form verb noun-phrase prepositional-phrase were extracted from the Penn Treebank Wall Street Journal corpus. Of these, only the verb, the head noun of the first noun phrase, the preposition, and the head noun of the noun phrase contained in the prepositional phrase were recorded. Thus, in these experiments the lexical heads are deemed important. Experiments with a variety of techniques have been reported, including a decision tree model and a maximum entropy model ; a transformation-based learning model ; a relatively simple back-off model ; and a number of models using memory-based learning techniques . The accuracy of these models is roughly between 80 and 84%.  moreover performed an experiment suggesting that humans, if given the same four pieces of evidence, get about 88% correct; given full sentences, they reach an accuracy of 93%. These facts suggest, again, that the (head) words are of extreme importance.
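A back-off approach to this task can be sketched as follows. The sketch is written in the spirit of such models but its details are assumptions: the particular back-off levels (each sub-tuple retains the preposition), the 0.5 decision threshold, the default of noun attachment for unseen prepositions, and all names and toy data are illustrative only.

```python
from collections import Counter

# Sub-tuples consulted at each back-off level, most specific first.
LEVELS = [
    [("v", "n1", "p", "n2")],
    [("v", "n1", "p"), ("v", "p", "n2"), ("n1", "p", "n2")],
    [("v", "p"), ("n1", "p"), ("p", "n2")],
    [("p",)],
]

def _keys(example):
    """Yield, per level, the keys for this example (field names included)."""
    for level in LEVELS:
        yield [tuple((f, example[f]) for f in fields) for fields in level]

def train(data):
    """Count how often each sub-tuple occurs, and how often with noun attachment."""
    total, noun = Counter(), Counter()
    for example, attaches_to_noun in data:
        for level in _keys(example):
            for key in level:
                total[key] += 1
                noun[key] += int(attaches_to_noun)
    return total, noun

def predict(model, example, default="noun"):
    """Use the most specific level with any evidence; otherwise the default."""
    total, noun = model
    for level in _keys(example):
        seen = sum(total[k] for k in level)
        if seen:
            p_noun = sum(noun[k] for k in level) / seen
            return "noun" if p_noun >= 0.5 else "verb"
    return default

# Hypothetical training 4-tuples with their attachment (True = noun).
TRAIN = [
    ({"v": "ate", "n1": "pizza", "p": "with", "n2": "fork"}, False),
    ({"v": "ate", "n1": "pizza", "p": "with", "n2": "anchovies"}, True),
    ({"v": "saw", "n1": "man", "p": "with", "n2": "telescope"}, False),
]
MODEL = train(TRAIN)
```

For the unseen tuple (ate, salad, with, fork), the full 4-tuple has no counts, so the decision falls back to the triples; the only one with evidence is (ate, with, fork), which was attached to the verb. The example shows how the decision rests entirely on the four head words.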
Finally,  report results indicating that a language model which takes into account lexical dependencies between head words improves upon N-gram language models, for the purpose of determining the most probable continuation of an utterance.