Template talk:Language families

WikiProject Languages (Rated Template-class)
This template is within the scope of WikiProject Languages, a collaborative effort to improve the coverage of standardized, informative and easy-to-use resources about languages on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Template: This template does not require a rating on the project's quality scale.
WikiProject Linguistics (Rated Template-class)
This template is within the scope of WikiProject Linguistics, a collaborative effort to improve the coverage of linguistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Template: This template does not require a rating on the project's quality scale.

Size and redundancy

I think all those links should be piped to avoid repeating the word "languages" a gazillion times. That will reduce the visual size of the thing and will make it easier to find individual entries. --Latebird (talk) 01:12, 15 January 2009 (UTC)

Do we want families like Celtic, or just the top level? Do we want obsolete proposals, or just those we retain in our classification? I ask because the Brahman languages are included, but not many similar families. Brahman was part of Wurm's Madang-Adelbert Range, itself part of Trans-New Guinea; later Ross broke up the Brahman languages within Madang. kwami (talk) 00:46, 24 January 2009 (UTC)

I'd say just the top level, as the template is quite big enough already. I removed Yupik languages, because it's uncontroversial that Yupik belongs to the Eskimo-Aleut languages, which are already listed. I see that Mongolic, Tungusic, and Turkic are listed separately; I'd be in favor of removing all three in favor of one Altaic languages link. I know the Altaic hypothesis isn't universally agreed upon, but I believe it has more supporters than detractors. —Angr 11:52, 24 January 2009 (UTC)
Support Altaic. But Khoisan is not supported by anyone anymore, so it should be broken up. It looks like there are also more families in PNG and S. America. kwami (talk) 11:59, 24 January 2009 (UTC)
Actually, we might want to have broad regional templates: Sahulian families, S. American, N. American, and Eurafrasian. I doubt anyone navigating Nilo-Saharan is going to be interested in Amto-Musan. kwami (talk) 12:03, 24 January 2009 (UTC)
Sahulian? Never heard that word before. Does it mean Subsaharan African? I'd also be in favor of grouping Austronesian together with all the Papuan and Australian languages into "Australia/Oceania" or something like that. I didn't know Khoisan had lost favor. I'd recommend starting with Ethnologue's classification as a guide, and then deviating from it when there's good reason to. —Angr 12:12, 24 January 2009 (UTC)
Sahul is Oz/New Guinea. Austronesian works both there and in Asia.
Ethnologue 16 is coming out in a couple months. Let's see how they improve. kwami (talk) 12:16, 24 January 2009 (UTC)
Broke it up, deleted maybe ten, and added a bunch of Papuan families. Lots of American families are probably missing; I haven't checked. kwami (talk) 09:10, 26 January 2009 (UTC)
Okay, added the rest of the American families. I tried to avoid tentative proposals (perhaps Yuki-Wappo should be removed to a 'perhaps also' section, as in S. America), which means I also broke up traditional but poorly supported families like those in Africa. Only added 'perhaps also' families in S. America when their members would not otherwise occur, and recent scholars have thought the proposals likely to pan out. kwami (talk) 09:30, 28 January 2009 (UTC)

Auxiliary languages?

Why is the link to International auxiliary language here? That seems quite unrelated to the topic of language families. —Angr 11:58, 24 January 2009 (UTC)

Cuz they're not in normal families. Like creoles. kwami (talk) 09:11, 26 January 2009 (UTC)


Kwami, I am continuing here, as a more appropriate venue, the discussion we began on the talk page of Altaic languages.

(1) The question that emerged is whether language families that are composed of a single language should be added to the language families template, or not. Examples of such families are Nivkh and Basque.

The principle currently followed on the template is that the highest-order families that are generally accepted by linguists are included. There are, of course, many other language families; for example, Indo-European is composed of Anatolian, Tocharian, Germanic, Greek, etc. What is listed is not the world's language families but the highest level of classification on which consensus has been reached.

A listing of all the world's language families would obviously be several times larger.

It seems to me that the labeling of the template is not strictly accurate, and that it is (in its current form, at least) really a template of the highest-order classifications yet reached by historical linguists.

Now, as I have argued, there is no taxonomic difference between a family composed of one language and a family composed of many. For example, Eskimo-Aleut is composed of the Aleut language and the Eskimo language family, itself divided into the Inuit language and the Yupik language family: Aleut and Eskimo are at the same taxonomic level, as are Inuit and Yupik.

Therefore, if the principle of the template is the highest order of classification yet reached, language isolates should be included.

(2) A further interesting question is that linguists don't all agree on classifications. This is true on an intermediate level, e.g. some accept Altaic, others don't, dividing it into unrelated Turkic, Mongolic, Tungusic, Korean, and Japanese families, likewise for Cushitic, etc. It is also true on a macro-level: Alfredo Trombetti grouped all of the world's known languages into a single stemma. Some, however, would divide some of the language families that are generally accepted, e.g. every now and then a proposal surfaces to split up Indo-European.

There is also a serious question as to what constitutes a language and what constitutes a dialect. At the end of the day, no firm dividing line seems possible. E.g. the Romance languages could be grouped into a single family (with perhaps the exception of Romanian), since their local forms constitute an almost perfect dialect continuum from central Belgium down to Portugal and Sicily, the French, Italian, Spanish, and Portuguese languages being epiphenomena generated by political hegemony. Yet they are considered to form several separate languages, while Arabic, which has much the same configuration, is classed as a single language. If you think of it, Romance only began to break up a hundred and fifty years or so before Arabic (final dissolution of the Roman empire in 476 versus Hegira in 622). In other words, the categories we draw up are arbitrary and unverified.

Just as we cannot distinguish a language and a dialect, we cannot distinguish a language and a language family. E.g. we speak of one Aleut language, but some of its dialects are hardly mutually intelligible, or one Chinese language, but several of its major dialects are mutually unintelligible. On the other hand, we speak of a Scandinavian language family, but its three Continental members, Danish, Swedish, and Norwegian, are mutually intelligible.

Thus, the distinction between a language and a language family is often political rather than linguistic or a mere accident of tradition.

Furthermore, the existence of linguistic continua like that of Arabic poses severe problems for the categories of "language" and "language family".

(3) On Wikipedia, we don't need to resolve these theoretical issues. We can rely on consensus where it exists and note disagreements where it does not. It does seem to me, however, that there is no justification for excluding isolates from the category of language family, e.g. Elamite or Basque.

In any case whether they are languages or language families is often disputable, e.g. Basque is sometimes viewed as a single language, sometimes as the Vascon family. Just as Romance could be viewed as a single language, like Arabic, or multiple languages, like French and Italian. Or as Italian could be viewed as a language family (major differences between the speech of Milan and that of Palermo!).

The operative principle in the template should thus be to include only the top level of classification, not whether a given entity is a language or a language family, since these cannot be rigorously distinguished.

From another point of view, what is called a language family in the template could be called a language isolate, since it has no generally accepted relation to any other language family.

The size of the template doesn't seem important since the template will usually appear as a single bar in the "Hide" mode.

It also seems to me that the template is not a template of the world's language families but of the highest level of genetic classification on which there is consensus among linguists and should be relabeled as such, perhaps as "The world's independent language families".

It is really a portrait of the linguistic state of the art - more of these families will be shown to be related in future, some of the unities now accepted may be divided.

(4) From the POV of monogenists like Merritt Ruhlen, all the world's languages ultimately form a single family. If consensus tends in this direction (currently it doesn't), it will eventually reduce the families listed to a single family, call it Proto-World or whatever. The template could thus disappear (unless polygenesis is true, the connections between top-level families are too remote to demonstrate, or a strict Schmidtian / Boasian scrambling of languages over time is justified). The template is thus an index to the stage of classification reached, not an accurate portrait of genetic relationships - unless one assumes that all the top-level language families and isolates are genetically unrelated to any others, which is highly improbable.

In other words, I'm not quite sure what kind of wares this template is hawking. I think it bears further reflection.

VikSol 16:37, 3 February 2009 (UTC)

I think it would be best to use a single, Wikipedia-external, source to decide what gets included here and what doesn't. The most obvious choice would be families that have their own ISO 639-5 code, though that has the disadvantage that it includes many noncontroversial subfamilies of larger families (e.g. Celtic within Indo-European). It also has the disadvantage that it does not take cutting-edge research into consideration (e.g. it has Khoisan as a language family, although Kwamikagami informs us that no one still believes in it). Still, I think doing so is the only way to keep the template NPOV and OR-free (where the OR is not so much what language families exist but what language families' existence is widely accepted or not). —Angr 16:53, 3 February 2009 (UTC)
I like "The world's independent language families". I think we can go ahead and change that now. 'All families' would number in the thousands. Yes, size matters, because if people cannot navigate it, there is no point in having a template at all.
VikSol, I've made many of the same arguments about isolates. Yes, technically they should be here, but there are two problems: navigability, and deciding which to include (i.e. isolate vs. unclassified). As for which is an isolate and which a small family, I simply went on how we treated the languages here on Wikipedia, which for the most part follows Ethnologue, since after all that's where we're directing people. That is, of course, open to debate (I've changed Ainu to a small family), but many of those debates have already occurred on the articles. If we include isolates, I think perhaps it would be best to separate them off for each continent. As for which to include, take Shompen. You could argue that's a family or an isolate, but perhaps best to count it as unclassified. When you get to the Americas, the dividing line between isolates and unclassified languages becomes quite blurred, as Ishkar and I have discovered. I don't think we should include unclassified languages as "families".
Angr, I don't know of any good overall treatment. IMO we should err on the side of caution, and only include families that are widely accepted as having been demonstrated in recent literature. Altaic is borderline; New Guinea doesn't have much literature to go on. I took Dimmendaal for Africa, as thoughtful and willing to look at long-distance relationships, but also willing to say where proposals aren't yet convincing. I took Ross for New Guinea, who's about the only scholar out there right now (unless Foley's come up w something new?) [oh, and except for Left May–Kwomtari, which suffers from severe data problems which Ross is now aware of but which AFAIK have not yet been addressed]. Etc. If we take one source as authoritative, then it's likely we'll have intelligent coverage of their area of expertise, and rather random-quality coverage of everywhere else. kwami (talk) 21:41, 3 February 2009 (UTC)

(1) I admit I'm still not real comfortable with the concept of this template. "Highest-order language families on which there is currently consensus" would be the most accurate (or something like it). Although I think "independent" is a clear improvement, it has the downside of potentially implying these language families are unrelated, which implies linguistic polygenesis, a rather radical affirmation (in contrast to the late 19th c. when Haeckel was considered respectable).

Another possibility might be "the world's primary language families", used in some infoboxes. "Primary" has more or less the same meaning as "independent" here but is less of a positive affirmation, which may be an advantage. Other infoboxes have "the world's major language families", which does not really have a determinate meaning, but does have the advantage of implying that other languages may occupy more or less the same level but are not included for one reason or another.

Maybe "recognized language families" or "generally recognized language families"? Maybe "generally accepted language families"? This implies that further progress is possible (as will no doubt occur).

I think "generally accepted language families" might do the trick. It is positive, emphasizing unity and consensus, the possibility of future progress, and acknowledging the existence of disagreements. It leaves room for doubt over some (it is a good bet at least one or two of them will fall by the road) and does not commit the encyclopaedia to an artificially doctrinaire position (often found in its competitors). It takes no position for or against monogenesis or polygenesis.

Still, it implies all language families are included, a possibility discussed and ruled out above on this talk page. Also, "generally accepted" is implied by the fact it is on Wikipedia. It might therefore be sufficient to say "Highest-order language families", but, again, this implies there is no higher order, which is probably not true.

Maybe the term exists in biological classification and is just staring us in the face. "Language phyla"? Has the advantage it does not prejudge their ulterior relations. Has the disadvantage it is not a habitual expression in linguistics, though some use it.

"The world's primary language families" may on balance be the least damaging expression (at least of those on the table). It implies, a little bit, that you can go no further and, somewhat less, that maybe they are just primary pro tem, and their status might change. It will be congenial to "Diffusionists" and will not make "Geneticists" real mad (so called by Ruhlen 1994).

It remains more of a "Diffusionist" than a "Geneticist" concept. Maybe the template homepage should include two alternative versions of the template, a maximally unified one and a maximally diverse one, e.g. 3 families in the Americas for the first, several hundred for the second (by one account, over 2,000).

However, the maximally unified one would have to be constructed on different principles, since the continental unities tend to overlap in it: not only would we have Afro-Asiatic and now Dene-Yeniseian, but we would also have Eurasiatic (Eskaleut + Uralic etc), not to mention Indo-Pacific. It would probably have to be a family tree (don't know how to format this) rather than a table. This would also allow it to avoid disappearing when the last remaining independent phyla are unified, as by Trombetti.

Maybe there is just no way around using more words to say what is needed. "The highest-order generally accepted language families", "The highest-order language families that are generally accepted", "The highest-order language families that are generally accepted at the present time", or words to that effect. Maybe "The highest-level generally accepted language families".

(3) Isolates seem to be very geography-dependent. In Eurasia we would have a manageable number: Basque, Nivkh, Ainu, Korean, Kusunda, Sumerian, Elamite, maybe Hurrian/Urartian... "Language isolate" lists 12, with 1 questioned, 1 simply the last survivor of an attested family, 1 or 2 unclassified, 2 or 3 possibly part of established families, leaving about 5 to 10 by generally accepted classifications. The article sometimes treats "isolates" as living languages only, which would rule out Elamite, for example, but it includes Hattic. Go figure.

The article lists 21 isolates for Australia and Oceania, also Etruscan and Iberian for Europe. It's when we get to the Americas that we hit the paydirt: 31 for N. America (some unclassified), 38 for S. America (several unclassified); the unclassified languages are often lightly attested.

I don't see these figures as all that intimidating. If we eliminated all the unclassifieds, especially those that are weakly attested, we would still have a reasonable number. It might also be possible to eliminate isolates for which a strong case of relationship has been made, even if it is not generally accepted.

We could add a link to "Unclassified languages" in the last line to make sure all bases are covered.

Maybe we could try a "sandbox" template and see what it looks like?

It's true that with the multiplication of the isolates the table becomes hard to understand, thereby diminishing its utility. On the other hand it seems like a shame to exclude, e.g., Korean.

VikSol (talk) 01:57, 4 February 2009 (UTC)

I added several of the better accepted proposals in order to include the isolates they link together. Korean comes in under Altaic, for example.
'Primary' LF's works for me. None of the others do. 'Generally accepted': well, that's everything in an encyclopedia, isn't it? So it's meaningless verbiage. 'Demonstrated' would be better: Khoisan has not been demonstrated, and according to Dimmendaal, neither has Ubangian in NC, even though it's 'generally accepted' to belong there. We could hold isolates to a similar standard: Basque and Kalto are demonstrated isolates, Shompen is not.
As for higher-level groupings, the problem is that they're mostly bullshit. They'd be like adding UFOs in a meteorology template. (UFOs exist by definition, of course, it's just highly dubious that they have anything to do with ET.) Amerind is garbage. So is Indo-Pacific (though I like Great Andamanese–West Papuan: If Andamanese were in New Guinea, that might be an accepted family). Nostratic is a bit more promising, and I'm quite partial to Austro-Tai and Austric, but it's not really our job to make such judgements. kwami (talk) 08:42, 4 February 2009 (UTC)

Demonstrated isolates: You could argue for a few more in N.Am (though things like Beothuk are just not well enough attested to say much of anything).

Eurasia: Basque, Burushaski, Kalto, Korean, Kusunda, Nivkh, Sumerian (no, not Etruscan)

Africa: Hadza, Sandawe (not Bangime, Laal, etc., at least for now)

N. America: Chimariko, Haida, Karuk, Kutenai, Siuslaw, Takelma, Timucua, Washo, Yana, Yuchi, Zuni

Mesoamerica: Cuitlatec, Huave, Jicaque, Lenca, Purhepecha, Seri, Xinca

S. America (living langs only): Aikana?, Andoque?, Camsa, Canichana, Cofan?, Huaorani, Irantxe?, Itonama, Joti, Mapudungun, Movima, Taushiro, Tequiraca (Auishiri), Ticuna, Trumai, Warao, Yamana, Yuracare

New Guinea: Abinomn, Busa, Isirawa, Kol, Kuot, Sulka, Taiap, Yalë, Yuri

Oz: Enindhilyagwa, Gaagudju, Laragiya, Ngurmbur, Tiwi, Umbugarla

kwami (talk) 09:35, 4 February 2009 (UTC)

A reservation I still have about the title is that "demonstrated primary language families" could be understood as meaning these language families have been demonstrated to be primary, i.e. unrelated to any others, whereas it is likely that some of these families will eventually be combined into larger units. To prevent this misunderstanding, I suggest changing the word order to "primary demonstrated language families", and have tentatively carried out this edit. VikSol (talk) 10:17, 13 February 2009 (UTC)
Yes, of course. That's a clear improvement. kwami (talk) 10:34, 13 February 2009 (UTC)
Although I've changed the title in the template, the old order continues to show up on individual pages (e.g. "Altaic languages"). Does the template need to be re-attached to every article? VikSol (talk) 21:07, 13 February 2009 (UTC)
It will refresh once your cache clears. Check out a family you haven't visited before, and it should be fine. kwami (talk) 21:16, 13 February 2009 (UTC)

About translating this template

Hi people. I would be very much interested in making the Spanish version of this template. Do I need to ask for any permission for that, or may I proceed straight away? --Fadesga 23:24, 2 May 2009 (UTC)

Everything on Wikipedia is in the public domain. Translate away! kwami (talk) 02:20, 3 May 2009 (UTC)
Ahem! Everything on Wikipedia is not in the public domain! Nevertheless, it's true you do not need to get anyone's permission to translate this template for Spanish Wikipedia. —Angr 05:56, 3 May 2009 (UTC)

Inconsistent use of question marks

Question marks are used in three ways in this template:

  1. Undemonstrated macro-families, such as Macro-Je. These, fortunately, come after the main families.
  2. Isolates that might be part of a language family under some proposals (but only some of these, it seems to be arbitrary)
  3. Languages that might be extinct, such as Lenca.

I propose that we settle on using them only for the third reason, and settle on a rule for deciding which language families we include.

This also raises another issue. Though I understand the reason for the first, given that this is only used for poorly-documented languages, it gets confusing. I don't have a suggestion one way or the other, but I feel like we should have a rule of some sort. (We generally make the call when using Template:Infobox language family.) Otherwise, why not include Nostratic (which has broad but still clearly minority support) or the Arnhem Land languages (which are a plausible family, and have the advantage of grouping together several obscure languages, like Macro-Je)? --Quintucket (talk) 20:14, 26 December 2011 (UTC)

I think when a majority of researchers working on the families feel that a connection is plausible, we include it with question marks. That's true for Macro-Je and Altaic, but not for Nostratic. If Arnhemland is widely respected, we can add it; I don't know personally if that's the case. -- kwami (talk) 06:45, 27 December 2011 (UTC)
That makes sense. Overlapping consensus of demonstrated vs. plausible. I don't know about Arnhem Land except what I read on Wikipedia. Are you on the Funknet mailing list? I'm wondering if it would be acceptable to ask there what people know about things like this (i.e. who's currently working in the field.)
It still doesn't address the other problem. Though I can tell from context that a question mark in the middle of the body indicates a language family that might be extinct, and one at the end indicates a hypothetical language family, it still seems rather clumsy. What about using an asterisk for the hypothetical language families? --Quintucket (talk) 22:01, 27 December 2011 (UTC)
Or not use a question mark for possibly extinct? That's not iconic, whereas a question mark for questionable is. — kwami (talk) 17:57, 28 December 2011 (UTC)
Really? I'd have assumed the reverse. But let's do it that way then. Marking recently extinct language families is rather less important than marking the long-extinct ones. --Quintucket (talk) 15:20, 30 December 2011 (UTC)

Method of not including categories.

Is there a way to use this template without including Category:Language families? I think that it would be useful to have on proposed language families as well, and started to add it, but then I realized it was adding those automatically into the category, which is inappropriate because they're already in the subcategory Category:Proposed language families. One option is to copy this template, but that's an exceedingly inelegant option, since either the copy would have to be updated in sync, or I'd have to change this one to only include the copy plus category or somesuch. There must be a more elegant way. --Quintucket (talk) 13:29, 28 December 2011 (UTC)

Shouldn't they be separate things? I'd just remove the cat altogether and add it manually. — kwami (talk) 17:55, 28 December 2011 (UTC)
Sounds like a good idea. I'll do that sometime soon, unless you know where I could ask a bot to handle it? --Quintucket (talk) 15:18, 30 December 2011 (UTC)

Bolding the large ones

I don't get it - how is something as large as Turkic or Japonic not bolded, but tiny things like Sepik and Torricelli are bolded? What are the criteria for bolding on this template? -- Y not? 20:39, 31 January 2013 (UTC)

Japonic and Turkic are rather small families. Japonic only has a dozen languages. — kwami (talk) 17:26, 8 August 2013 (UTC)


The word "primary", here and in the info boxes, has been confusing to some people. It can suggest "most important". I can't think of a better word, though: "Independent" is incorrect, as it's expected many of these families are related. IMO it would also be nice to restore "established" or "demonstrated", though after "primary" (or whatever) per the discussion above. — kwami (talk) 23:14, 21 December 2013 (UTC)

How about "top-level"? Uanfala (talk) 10:49, 7 July 2016 (UTC)

Needs line explaining question mark use.

To Quintucket and Kwamikagami: The end of this template reads:

Families in bold are the largest. Families in italics have no living members.

I think the question mark usage should also be explained here. You both discussed its usage before. Did you come up with a decision? Would it be something like:

Families with question marks are uncertainly grouped or are possibly extinct.

Sorry all I can do is point this out to some of the people who worked on this before, but my real-life limitations are getting in the way of doing this myself and I might not make it back here. Thanks! — Geekdiva (talk) 02:51, 1 March 2016 (UTC)

Do we use a ? for possibly extinct? That would be confusing. Otherwise I'm fine with the change. -- kwami (talk) 00:35, 2 March 2016 (UTC)

Sign Language Families

While sign language families tend to be smaller than oral language families and home to more isolate families, there are two or three language families that I feel should be represented in this template, namely the Francosign (French Sign Language), BANZSL and Arab Sign languages. The Francosign family itself encompasses around 60 languages with at least eight subgroups within. While both the BANZSL and Arab Sign families are smaller, both have a significant speaker base. In addition, adding families of sign languages to this template will help with legitimacy surrounding sign languages, promoting the fact that sign languages are equal to oral languages on all grounds. (I choose to use Francosign here because amongst speakers and linguists/students, this is the [informal] term, and this is a talk page, so this is inherently more informal).

In terms of display, they could appear between the Eskimo-Aleut line and the pidgin/creole/mixed line with four sections (bringing the lowest line down to four sections), and the Francosign and "sign languages" sections could appear longer than the other two due to the weight in each section. -- Preceding unsigned comment added by Danachos (talk • contribs) 21:40, 11 March 2016 (UTC)

To include "Romance" and "Semitic" would do more good than harm

I realize that Romance is part of Indo-European. Semitic I have no idea what it's under. But to make a tool useful to people other than the small number of professional linguists, you have to mention, IMHO, the two best-known (by far) language families. deisenbe (talk) 23:39, 19 April 2018 (UTC)

But this template is for primary language families, not for well-known subgroups of such families. It's pretty big already, even if it errs on the side of clumping and omission (there are several smaller primary families that don't seem to be included). And if we tried adding a choice selection of subgroups, how would we decide which subgroups to include and which not to? There's nothing special about either Romance or Semitic: you could easily think of similar subgroups that are more prominent in the minds of readers (Germanic), containing a larger number of languages (Oceanic), or being spoken by larger populations (Sinitic). – Uanfala (talk) 12:32, 6 May 2019 (UTC)