This is a demonstration of a language guesser, as proposed in
Cavnar and Trenkle, "N-Gram-Based Text Categorization".
It is implemented in Perl. You can get the programme under the GPL
here. It is free of charge, and no commercial version is available.
The competitors!
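
The idea behind the approach can be sketched in a few lines. The block below is a minimal illustration of n-gram profiles and the "out-of-place" rank distance described by Cavnar and Trenkle. It is written in Python for brevity and is not the Perl programme offered here. The function names, the tokenisation rule (runs of letters only) and the cutoff of 300 n-grams are assumptions made for the example.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n),
    most frequent first. Ties are broken alphabetically so the result is
    deterministic."""
    counts = Counter()
    # Assumption: a "word" is a run of letters; digits and punctuation split words.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + word + "_"  # underscores mark the word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum, over the document's n-grams, of how far each one's rank is from
    its rank in the language profile. N-grams missing from the language
    profile receive a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, lang_profiles):
    """Return the key of the language profile closest to the text."""
    doc_profile = ngram_profile(text)
    return min(
        lang_profiles,
        key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]),
    )


if __name__ == "__main__":
    # Toy training texts; a usable profile needs far more text per language.
    profiles = {
        "english": ngram_profile("the quick brown fox jumps over the lazy dog"),
        "dutch": ngram_profile("de snelle bruine vos springt over de luie hond"),
    }
    print(guess_language("the dog jumps over the fox", profiles))
```

With training texts this small the guess is unreliable; the sketch only shows the mechanics of building profiles and comparing ranks.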
LIST OF LANGUAGES currently supported.
Some languages are supported only in certain encodings; where that is
the case, the encoding is given after the language name.
- afrikaans
- albanian
- amharic-utf
- arabic-iso8859_6
- arabic-windows1256
- armenian
- basque
- belarus-windows1251
- bosnian
- breton
- bulgarian-iso8859_5
- catalan
- chinese-big5
- chinese-gb2312
- croatian-ascii
- czech-iso8859_2
- danish
- dutch
- english
- esperanto
- estonian
- finnish
- french
- frisian
- georgian
- german
- greek-iso8859-7
- hawaian
- hebrew-iso8859_8
- hindi
- hungarian
- icelandic
- indonesian
- irish
- italian
- japanese-euc_jp
- japanese-shift_jis
- korean
- latin
- latvian
- lithuanian
- malay
- marathi
- middle_frisian
- mingo
- nepali
- norwegian
- persian
- polish
- portuguese
- quechua
- romanian
- russian-iso8859_5
- russian-koi8_r
- russian-windows1251
- sanskrit
- scots
- scots_gaelic
- serbian-ascii
- slovak-ascii
- slovak-windows1250
- slovenian-ascii
- slovenian-iso8859_2
- spanish
- swahili
- swedish
- tagalog
- tamil
- thai
- turkish
- ukrainian-koi8_u
- vietnamese
- welsh
- yiddish-utf
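
For readers who want to process the list above, here is a small Python sketch that separates each entry into a language and an encoding tag. It assumes that everything after the first hyphen names the encoding, which is a reading of the list and not something the programme documents. The helper names are hypothetical.

```python
from collections import defaultdict


def split_model_name(name):
    """Split 'russian-koi8_r' into ('russian', 'koi8_r').
    Entries without a hyphen, such as 'english', get None as the encoding."""
    language, sep, encoding = name.partition("-")
    return language, (encoding if sep else None)


def encodings_by_language(names):
    """Group the encoding tags of all entries under their language."""
    table = defaultdict(list)
    for name in names:
        language, encoding = split_model_name(name)
        table[language].append(encoding)
    return dict(table)


if __name__ == "__main__":
    sample = ["russian-iso8859_5", "russian-koi8_r", "russian-windows1251", "english"]
    for language, encodings in encodings_by_language(sample).items():
        print(language, encodings)
```

Splitting on the first hyphen only keeps a tag such as iso8859-7 in one piece.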
Gertjan van Noord
Last modified: Tue Jan 5 15:07:42 MET 1999