Language Identification Tools

System | Number of languages | Availability
TextCat | 69 | free
SILC/Alis | 28 | commercial
Xerox MLTT Language Identifier | 47 | commercial
SUN's language identifier | 12 | ?
Collexion | 15 | commercial
Stochastic Language Identifier | 13 | free
Another demo of TextCat, with different language models, by Beat Flepp | 11 | cf. TextCat
Natural Language Identification Tool (Giguet) | 4 | ?
Neural Network for Language Identification | 4 | ?
Rosette Language Identifier by Basis Technology | 30 | commercial
IDRIS LingWhat? | ? | ?
Language Identification program by Ted Dunning | 2 | free
Lextek Language Identifier | many | commercial/free
LangWitch by Morphologic | 7 | commercial
Language identifier by Petamem | 65 | ?
Python script by Damir Cavar | 5 | free
libtextcat | cf. TextCat | free (BSD)
Java implementation of TextCat | cf. TextCat | free (?)
Languid | 72 (including such languages as Pig Latin, Klingon, and both "Ukrainian" and "Ukranian"). The author writes: "I've been a big fan of TextCat, and wanted to see what happened if I combined the same algorithm for n-gram based identification with some intelligence about Unicode. The result is a Unicode-friendly language identifier that makes some initial guesses based on script block. It relies on proper UTF-8 input to be happy." | GPL
Mguesser | about 100 charset/language pairs, covering about 50 languages; a C implementation of TextCat | GPL
Python implementation of TextCat | |
lid | 23, in a range of encodings; for some languages it can also identify the language of text in transliterated form | commercial
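
The approach described in the Languid entry above (n-gram based identification plus an initial guess from the Unicode script) can be sketched roughly as follows. This is a minimal illustration, not the code of Languid, TextCat, or any other tool in the list. It assumes the rank-order "out-of-place" comparison of character n-gram profiles from Cavnar and Trenkle, on which TextCat is based. All names (ngram_profile, out_of_place, dominant_script, identify, SAMPLES, MODELS) are hypothetical, the training samples are toy sentences, and the script guess simply takes the first word of each character's Unicode name, which is only a rough stand-in for a real script or block lookup.

```python
import unicodedata
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Map the top_k most frequent character n-grams (n = 1..max_n) to their rank."""
    # Lowercase and mark word boundaries with "_" (an assumption of this sketch).
    padded = "_" + "_".join(text.lower().split()) + "_"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(padded) - n + 1)
    )
    # Most frequent first; ties are broken alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return {gram: rank for rank, gram in enumerate(ranked[:top_k])}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the model get a fixed penalty."""
    penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )


def dominant_script(text):
    """Rough script guess: most common first word of the letters' Unicode names."""
    scripts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()
    )
    return scripts.most_common(1)[0][0] if scripts else None


def identify(text, models):
    """Pick the closest model, considering only models in the text's script if any exist."""
    script = dominant_script(text)
    candidates = {lang: prof for lang, (s, prof) in models.items() if s == script}
    if not candidates:
        # No model for this script: fall back to comparing against every model.
        candidates = {lang: prof for lang, (s, prof) in models.items()}
    doc = ngram_profile(text)
    return min(candidates, key=lambda lang: out_of_place(doc, candidates[lang]))


# Toy training data: language -> (script label, sample text). Real tools train on far more text.
SAMPLES = {
    "english": ("LATIN", "the quick brown fox jumps over the lazy dog and runs away"),
    "german": ("LATIN", "der schnelle braune fuchs springt über den faulen hund"),
    "russian": ("CYRILLIC", "быстрая коричневая лиса прыгает через ленивую собаку"),
}
MODELS = {
    lang: (script, ngram_profile(sample)) for lang, (script, sample) in SAMPLES.items()
}

if __name__ == "__main__":
    print(identify("Где находится библиотека", MODELS))
    print(identify("the dog runs over the fox", MODELS))
```

With only one Cyrillic model, any predominantly Cyrillic input is assigned to it by the script step alone. Telling apart languages that share a script depends entirely on the n-gram profiles, so the toy samples here are far too small for reliable results.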

Gertjan van Noord