
Classification of Bantu languages from Tanzania and Gabon by using the Levenshtein algorithm

Franz Manni (franz.manni@mnhn.fr); National Museum of Natural History, Paris; Informatiekunde, Rijksuniversiteit Groningen.


I will address the computational-linguistic classification of Bantu languages from western (Gabon) and eastern (Tanzania) Africa. The resulting classifications will be compared to a consensus tree drawn from Bayesian posterior distributions computed with a cognate-sharing method (Grollemund et al. 2015). The two are largely comparable, suggesting that the Levenshtein algorithm captures the historical evolution of Bantu varieties to an unexpected degree.
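To make the measure concrete, here is a minimal sketch of how a Levenshtein-based distance between two varieties could be computed from their word lists. It is not the pipeline used in the study: the function names, the normalisation by the length of the longer form, the simple averaging over shared concepts, and the toy word forms are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-symbol insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def variety_distance(words_a: dict, words_b: dict) -> float:
    """Mean length-normalised Levenshtein distance over the concepts
    recorded in both varieties (normalisation is an assumption here)."""
    shared = [concept for concept in words_a if concept in words_b]
    if not shared:
        raise ValueError("the two word lists share no concepts")
    total = 0.0
    for concept in shared:
        a, b = words_a[concept], words_b[concept]
        total += levenshtein(a, b) / max(len(a), len(b), 1)
    return total / len(shared)


# Invented placeholder forms, not real Bantu data.
variety_a = {"concept1": "taba", "concept2": "kuni"}
variety_b = {"concept1": "tapa", "concept2": "kuni"}
print(variety_distance(variety_a, variety_b))
```

Computing this value for every pair of varieties gives a distance matrix, which can then be passed to a clustering or tree-building method to obtain a classification.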

Moreover, the Levenshtein classification of the Gabon varieties will be compared to the genetic differences among the populations speaking these languages, and to their musical diversity.

Drawing on both experiments, I will address the fundamental question of which word lists are processed and how robust the derived classifications are:

  • The Tanzanian dataset consists of almost 1000 words, which have been processed as different subsets (Swadesh words only vs. a random list), showing no major difference.
  • The Gabon dataset includes varieties that are each described by two independent word lists (recorded at different times by different scholars). Nevertheless, the Levenshtein algorithm clusters the two lists of a variety together, suggesting that methodological differences in collecting linguistic material play little role in aggregate analyses.
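The abstract does not say how the classifications from different subsets were compared. One simple possibility, sketched below under that assumption, is to correlate the pairwise distances obtained from two word-list subsets. The function name `matrix_correlation`, the variety labels, and all numeric values are hypothetical and serve only to show the mechanics.

```python
from math import sqrt


def matrix_correlation(dist_a: dict, dist_b: dict) -> float:
    """Pearson correlation between two distance matrices, each stored as
    a dict mapping a pair of variety names to a distance."""
    pairs = sorted(set(dist_a) & set(dist_b))
    if len(pairs) < 2:
        raise ValueError("need at least two shared variety pairs")
    xs = [dist_a[p] for p in pairs]
    ys = [dist_b[p] for p in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("distances in one matrix are all identical")
    return cov / sqrt(var_x * var_y)


# Invented distances for three hypothetical varieties A, B and C.
swadesh_subset = {("A", "B"): 0.10, ("A", "C"): 0.40, ("B", "C"): 0.45}
random_subset = {("A", "B"): 0.12, ("A", "C"): 0.38, ("B", "C"): 0.47}
print(matrix_correlation(swadesh_subset, random_subset))
```

A correlation close to 1 would indicate that the two subsets yield nearly the same distance structure. Because the entries of a distance matrix are not independent, the significance of such a correlation is usually assessed with a permutation procedure such as the Mantel test rather than with a standard p-value.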



Useful references:

http://www.ddl.ish-lyon.cnrs.fr/fulltext/Van%20Der%20Veen/Alewijnse_2007.pdf

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0151570

http://www.pnas.org/content/112/43/13296

manni.txt · Last modified: 2019/02/06 16:03 (external edit)