Orthographic neighborhoods in neural networks
Stéphan Tulkens, Dominiek Sandra and Walter Daelemans


The orthographic neighborhood effect is a well-known effect in word reading, succinctly summarized by the statement that “words which look like other words are read faster”.
The neighborhood effect does not depend on the predictability of letters, or the frequency of letter combinations in a word, but is hypothesized to be the result of co-activation of internal representations.
In previous work (Tulkens et al., 2018), we analyzed featurized orthographic representations and showed that their internal coherence correlated with reaction times in lexical decision tasks.
One shortcoming of that work is that features are not generally thought of as internal representations.
To overcome this limitation, we train neural networks to recognize words and investigate whether the hidden state representations of these words also show a neighborhood effect, that is, whether the neighborhood coherence of these hidden states also correlates with lexical decision data.
We train an MLP and an RNN to predict word identity from one-hot encoded character representations.
Preliminary results show that, while the neighborhoods are consistent with those from edit distance metrics and raw features, they show no quantitative neighborhood effect.
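
As a rough illustration of the kind of input described above, the following sketch one-hot encodes words character by character and compares two ways of measuring how close words are: distance between the raw encodings and edit distance. It is not the setup used in this work. The alphabet, the fixed number of character slots, the zero padding, and the helper names (one_hot_word, levenshtein, hamming) are all assumptions made for the example.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase a-z only
MAX_LEN = 6                        # assumption: fixed number of character slots


def one_hot_word(word, max_len=MAX_LEN, alphabet=ALPHABET):
    """Concatenate one one-hot vector per character slot.

    Slots beyond the end of the word are left all-zero (assumed padding scheme).
    """
    if len(word) > max_len:
        raise ValueError(f"{word!r} is longer than {max_len} characters")
    vec = [0] * (max_len * len(alphabet))
    for pos, ch in enumerate(word):
        vec[pos * len(alphabet) + alphabet.index(ch)] = 1
    return vec


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(u, v))


words = ["cat", "cot", "coat", "dog"]
vectors = {w: one_hot_word(w) for w in words}

for w in words[1:]:
    print(w, levenshtein("cat", w), hamming(vectors["cat"], vectors[w]))
```

Under these assumptions, "cot" and "coat" are both one edit away from "cat", but the slot-based encoding places "coat" much further away, because the inserted letter shifts every later character into a different slot. Whether learned hidden states reduce this kind of discrepancy is the sort of question the comparison above addresses.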

References:

Tulkens, S., Sandra, D., & Daelemans, W. (2018). From Strings to Other Things: Linking the Neighborhood and Transposition effects in Word Reading. In Proceedings of the 22nd Conference on Computational Natural Language Learning (pp. 75-85).