Detection of Incoherent Speech in Dutch Transcripts for Classifying Schizophrenia Spectrum Disorders
Talita Anthonio and Alban Voppel


Verbal communication disturbances such as incoherent speech are a key diagnostic feature of schizophrenia spectrum disorders and can be assessed by studying the form and meaning of linguistic expressions. Speech in particular is a representative and accessible marker of symptoms, because it is used both in daily life and in interaction with healthcare personnel. To date, correctly diagnosing schizophrenia remains challenging because of heterogeneity, subtlety and subjectivity. The advent of natural language processing has made it possible to derive abstract speech measures in a quantitative and unbiased way. In previous work, these measures reliably predicted diagnosis from English transcripts of schizophrenia patients.
Here, we present the first attempt to automatically classify a Dutch speech transcript as belonging to either a group of 25 remitted schizophrenia spectrum patients or a matched group of 25 healthy controls, using extracted coherence features. A collection of Dutch audio recordings from a semi-structured, neutral-topic interview was transcribed, and the transcripts were mapped to semantic vectors using a 300-dimensional word2vec semantic space model trained on the Corpus Gesproken Nederlands. This model was used to extract the minimum, maximum and average coherence of each transcript. We trained a decision tree classifier on these features together with the amount of speech. The system reached an average accuracy of 0.9 on our data under leave-one-out cross-validation. Our results show that the amount of speech and coherence are highly predictive of schizophrenia spectrum diagnosis in Dutch speech.
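
The feature extraction step described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: coherence is taken here to be the cosine similarity between the vectors of consecutive words, the amount of speech is taken to be the token count, and the two-dimensional toy vectors stand in for the 300-dimensional word2vec model. The names cosine, coherence_features and toy_embeddings are hypothetical. The decision tree and the leave-one-out evaluation are not shown.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def coherence_features(tokens, embeddings):
    """Return min, max and mean coherence plus the amount of speech for one transcript.

    Assumption: coherence is the cosine similarity between consecutive
    in-vocabulary word vectors; out-of-vocabulary tokens are skipped.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    sims = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    if not sims:
        raise ValueError("need at least two in-vocabulary tokens")
    return {
        "min_coherence": min(sims),
        "max_coherence": max(sims),
        "mean_coherence": mean(sims),
        "n_words": len(tokens),  # assumed proxy for the amount of speech
    }


# Hypothetical toy vectors; a real run would load the trained word2vec model.
toy_embeddings = {
    "ik": (1.0, 0.0),
    "naar": (1.0, 1.0),
    "huis": (1.0, 0.0),
    "loop": (0.0, 1.0),
}

# "eh" is not in the toy vocabulary and is skipped when computing similarities.
feats = coherence_features(["ik", "eh", "naar", "huis", "loop"], toy_embeddings)
print(feats)
```

The resulting four values per transcript would form one row of the feature matrix on which a classifier such as a decision tree could be trained and evaluated, holding out one transcript at a time.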