Typoo or orthographic error? Automatic classification of typographic versus orthographic errors using keystroke logs
Rianne Conijn, Luuk van Waes and Menno van Zaanen


The automatic classification and correction of typing errors in texts have been well studied, e.g., [1], [2]. Yet relatively little work can be found on the classification of typographic errors (slips of the finger) versus orthographic errors. In writing research, these errors should be treated separately, as they reflect cognitively different actions and can have a large influence on, for example, fluency analysis and counts of revisions [3], [4]. This distinction is hard to make using the final writing product only. By analyzing typing errors during the writing process, using keystroke logging, we gain information on both the production and the correction of typing errors, including their timing [5], [6]. Several studies have used these keystroke logs to manually code typographic and orthographic errors, e.g., [7], [8]. In this project, we aim to automatically distinguish between typographic and orthographic errors. This presentation shows our first step: the characterization of typographic errors using keystroke logs from a transcription task. In a transcription task, the final text is given to the writer, so we assume every revision corrects a typographic error. Data from 2,103 Dutch transcription tasks (1,717 unique participants) were collected using Inputlog [9]. Character-level confusion matrices (as in [10]) are constructed and patterns of timings are reported. In total, 5,030 corrections were made, of which 59% were single substitutions, 5% single transpositions, 4% single insertions, and 1% single deletions. In 27% of the corrections more than one mutation was used, and in 4% nothing changed. We invite attendees to discuss our future steps.
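As an illustration of the categories reported above, the following minimal sketch labels a single correction and tallies character-level confusions for single substitutions. It is not the authors' implementation: the function names, the string-pair input format (text as first typed, text after correction), and the example words are assumptions made for this sketch. In particular, "insertion" is taken to mean that the writer typed one extra character and "deletion" that one character was left out; the abstract does not define these terms.

```python
from collections import Counter


def classify_revision(typed: str, corrected: str) -> str:
    """Label the mutation that turns the corrected text into the typed text."""
    if typed == corrected:
        return "no change"

    if len(typed) == len(corrected):
        # Positions where the two strings disagree.
        diffs = [i for i, (a, b) in enumerate(zip(typed, corrected)) if a != b]
        if len(diffs) == 1:
            return "substitution"
        if (
            len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and typed[diffs[0]] == corrected[diffs[1]]
            and typed[diffs[1]] == corrected[diffs[0]]
        ):
            # Two adjacent characters were swapped.
            return "transposition"
        return "multiple"

    typed_is_longer = len(typed) > len(corrected)
    longer, shorter = (typed, corrected) if typed_is_longer else (corrected, typed)
    if len(longer) - len(shorter) == 1:
        # A single insertion or deletion: removing one character from the
        # longer string must yield the shorter one.
        for i in range(len(longer)):
            if longer[:i] + longer[i + 1:] == shorter:
                return "insertion" if typed_is_longer else "deletion"
    return "multiple"


def substitution_confusions(pairs):
    """Count (intended character, typed character) pairs for single substitutions."""
    counts = Counter()
    for typed, corrected in pairs:
        if classify_revision(typed, corrected) == "substitution":
            i = next(k for k, (a, b) in enumerate(zip(typed, corrected)) if a != b)
            counts[(corrected[i], typed[i])] += 1
    return counts


# Hypothetical examples, not taken from the study's data.
examples = [("tge", "the"), ("hte", "the"), ("thhe", "the"), ("te", "the")]
labels = [classify_revision(t, c) for t, c in examples]
confusions = substitution_confusions(examples)
```

The counter returned by `substitution_confusions` can be read as a sparse confusion matrix: each key pairs the intended character with the character that was typed instead. Real keystroke logs would additionally carry timing information, which this sketch ignores.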