Overcoming ontology sparsity for entity linking of lexically novel disorder mentions
Pieter Fivez, Simon Suster and Walter Daelemans


Linking disorder mentions to ontologies is a key task in biomedical text processing, but current state-of-the-art methods do not go beyond approximate string matching or trained linking functions that rely heavily on the lexical characteristics of the reference terms present in the ontologies. While these methods perform well on existing benchmark test sets, those sets consist mainly of easy cases for which baselines already perform well. Because current systems depend so heavily on the lexical characteristics of ontology reference terms, they are not robust when faced with a new target domain containing disorder mentions with substantial lexical novelty. We simulate this situation by first mining a large target domain of PubMed articles for lexically idiosyncratic disorder mentions that are present in the ontologies, and then removing these reference terms from the ontologies, effectively generating thousands of test instances. Using this data set, we test the robustness of current systems against our proposed methods, which employ distributional semantics, while also controlling for performance on existing benchmark test sets. This work is meant as a step towards more generalisable and robust systems for key tasks in biomedical text mining.
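The test-set construction described above (find mentions that match an ontology term, then delete that term so only its other synonyms remain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy ontology, the mentions, the threshold `max_sim`, and the idiosyncrasy criterion (low surface similarity to every remaining synonym of the same concept) are all hypothetical, and `difflib.SequenceMatcher.ratio()` stands in for an approximate string matching baseline.

```python
from difflib import SequenceMatcher

# Hypothetical toy ontology: concept ID -> set of (lowercased) reference terms.
ONTOLOGY = {
    "C001": {"myocardial infarction", "heart attack"},
    "C002": {"cerebrovascular accident", "stroke"},
    "C003": {"hypertension", "high blood pressure"},
    "C004": {"type 2 diabetes", "type ii diabetes"},
}

# Hypothetical lowercased disorder mentions mined from target-domain text.
MENTIONS = ["heart attack", "stroke", "hypertension", "type 2 diabetes", "diabetes"]


def char_similarity(a, b):
    """Surface-form similarity in [0, 1]; a stand-in for approximate string matching."""
    return SequenceMatcher(None, a, b).ratio()


def link(mention, ontology):
    """String-matching baseline: return the concept owning the most similar reference term."""
    best_cid, best_score = None, -1.0
    for cid, terms in ontology.items():
        for term in terms:
            score = char_similarity(mention, term)
            if score > best_score:
                best_cid, best_score = cid, score
    return best_cid


def build_test_set(mentions, ontology, max_sim=0.5):
    """Return (test instances, reduced ontology).

    A mention becomes a test instance if it exactly matches a reference term
    and is lexically idiosyncratic, i.e. its similarity to every other synonym
    of the same concept is below max_sim (an assumed criterion). Its term is
    then removed, so the concept is only reachable through its other synonyms.
    """
    reduced = {cid: set(terms) for cid, terms in ontology.items()}  # copy; input untouched
    instances = []
    for mention in dict.fromkeys(mentions):  # deduplicate while keeping order
        for cid, terms in reduced.items():
            if mention not in terms:
                continue
            others = terms - {mention}
            # Keep the concept linkable, and skip mentions with a near-identical synonym.
            if others and max(char_similarity(mention, t) for t in others) < max_sim:
                terms.discard(mention)
                instances.append((mention, cid))
            break  # use the first concept that lists the mention
    return instances, reduced


if __name__ == "__main__":
    instances, reduced = build_test_set(MENTIONS, ONTOLOGY)
    print(instances)
    # Baseline predictions against the reduced ontology, where exact matches are gone.
    for mention, gold in instances:
        print(mention, gold, link(mention, reduced))
```

In this toy run, "type 2 diabetes" is not turned into a test instance because its remaining synonym "type ii diabetes" is nearly identical on the surface, and "diabetes" is skipped because it matches no reference term exactly. The mentions that survive are exactly those for which a string-matching linker has little lexical overlap to exploit once their own term is removed.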