System | Number of languages | Availability / license |
TextCat | 69 | free |
SILC/Alis | 28 | commercial |
Xerox MLTT Language Identifier | 47 | commercial |
SUN's language identifier | 12 | ? |
Collexion | 15 | commercial |
Stochastic Language Identifier | 13 | free |
Another demo of TextCat, with different language models, by Beat Flepp. | 11 | cf. TextCat |
Natural Language Identification Tool (Giguet) | 4 | ? |
Neural Network for Language Identification | 4 | ? |
Rosette Language Identifier by Basis Technology | 30 | commercial |
IDRIS LingWhat? | ? | ? |
Language Identification program by Ted Dunning | 2 | free |
Lextek Language Identifier | many | commercial/free |
LangWitch by Morphologic | 7 | commercial |
Language identifier by Petamem | 65 | ? |
Python script by Damir Cavar | 5 | free |
libtextcat | cf. TextCat | free (BSD) |
Java implementation of TextCat | cf. TextCat | free (?) |
Languid | 72 (including such languages as Pig Latin, Klingon, and both Ukrainian and Ukranian). The author writes: "I've been a big fan of TextCat, and wanted to see what happened if I combined the same algorithm for n-gram based identification with some intelligence about Unicode. The result is a Unicode-friendly language identifier that makes some initial guesses based on script block. It relies on proper UTF-8 input to be happy." | GPL |
Mguesser | about 100 charset/language pairs, covering about 50 languages. C implementation of TextCat | GPL |
Python implementation of TextCat | | |
lid | 23 (in a range of encodings; notably, it can identify the language of transliterated text for some languages) | commercial |
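
Several entries above describe TextCat-style n-gram identification, and the Languid entry adds an initial guess based on the Unicode script block. The following is a minimal sketch of that general idea, not the code of any tool listed here. It assumes the rank-order ("out-of-place") n-gram comparison commonly associated with TextCat, and it uses the first word of each character's Unicode name as a rough stand-in for its script. All function names, the toy training samples, and the parameters `max_n` and `top_k` are hypothetical and chosen for illustration only.

```python
import unicodedata
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Map the top_k most frequent character n-grams (1..max_n) to their rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top_k]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the model get a fixed penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def dominant_script(text):
    """Rough script guess: first word of the Unicode name of each letter (assumption)."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts[name.split(" ")[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else None


def identify(text, models):
    """Narrow candidates by script first, then pick the closest n-gram profile."""
    script = dominant_script(text)
    candidates = {lang: prof for lang, (s, prof) in models.items() if s == script}
    if not candidates:  # unknown script: fall back to all models
        candidates = {lang: prof for lang, (_, prof) in models.items()}
    doc_profile = ngram_profile(text)
    return min(candidates, key=lambda lang: out_of_place(doc_profile, candidates[lang]))


# Toy training samples (hypothetical; real models are built from much larger corpora).
samples = {
    "english": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "german": "der schnelle braune fuchs springt über den faulen hund und die katze",
    "russian": "быстрая коричневая лиса прыгает через ленивую собаку и кошка сидит",
}
models = {lang: (dominant_script(s), ngram_profile(s)) for lang, s in samples.items()}

print(identify("привет как дела", models))  # → russian (only Cyrillic model)
```

With samples this small, the script check does most of the work; distinguishing languages that share a script depends on how much training text the profiles are built from.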