Nature paper on the genetics of Aboriginal Australia

Somewhat belatedly, here is a link to new work by my colleagues and me on gene-language coevolution in Pama-Nyungan, the peopling of Sahul, and migration and admixture in the Pleistocene. It was recently published in Nature. There’s a lot in this paper, a Genomic History indeed. There has been some media attention, particularly Michael Erard’s piece on Pama-Nyungan phylogenetics and how important computational work has been to recent advances in Australian language history. There’s also a summary piece in The Conversation, particularly about the genetic side of the paper.

Latest Paper: Quantifying uncertainty in the phylogenetics of Australian numeral systems

Earlier this month, the Yale Pama-Nyungan Lab’s Dr. Claire Bowern and Kevin Zhou published a paper titled “Quantifying uncertainty in the phylogenetics of Australian numeral systems” in the journal Proceedings of the Royal Society B. You can read the paper here.

Using Bayesian phylogenetic methods, Dr. Bowern and Zhou analyze the numeral systems of Pama-Nyungan languages in order to reconstruct how those systems may have looked thousands of years ago. They find that the finite numeral systems of Pama-Nyungan languages change over time, losing and gaining numbers as they go. According to the authors, this demonstrates a potential for adaptability and flexibility in languages commonly stereotyped as simple, limited, and incapable of expressing new concepts. They also find that numeral systems limited at the number five behave very differently over time from those with higher limits.

Here is the paper’s abstract:

Researchers have long been interested in the evolution of culture and the ways in which change in cultural systems can be reconstructed and tracked. Within the realm of language, these questions are increasingly investigated with Bayesian phylogenetic methods. However, such work in cultural phylogenetics could be improved by more explicit quantification of reconstruction and transition probabilities. We apply such methods to numerals in the languages of Australia. As a large phylogeny with almost universal ‘low-limit’ systems, Australian languages are ideal for investigating numeral change over time. We reconstruct the most likely extent of the system at the root and use that information to explore the ways numerals evolve. We show that these systems do not increment serially, but most commonly vary their upper limits between 3 and 5. While there is evidence for rapid system elaboration beyond the lower limits, languages lose numerals as well as gain them. We investigate the ways larger numerals build on smaller bases, and show that there is a general tendency to both gain and replace 4 by combining 2 + 2 (rather than inventing a new unanalysable word ‘four’). We develop a series of methods for quantifying and visualizing the results.

Languages coded for phylogenetics

I am starting a series of posts on map data from the Pama-Nyungan project. To begin, here is a map showing the languages for which I have coded wordlists suitable for phylogenetic analysis. Note that, for some reason, viewing the page in Chrome on OS X results in a jQuery error. It can be viewed in Firefox, or in Chrome on a PC.

The points are coded by how much data are available. The least well attested languages have white points; the middling ones are marked by plain red, while the languages with the most complete datasets have red markers with a square inside.

Some of the points appear to be at sea. This is an irritating result of how Google Earth fails to account for zoom correctly; the points are close to the coast but not actually under water.
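For anyone wanting to reproduce the three-tier marker scheme described above, here is a minimal sketch in Python. The post does not say where the cut-offs between the tiers fall or how "amount of data" is measured, so the thresholds (SPARSE_MAX, MIDDLING_MAX), the item-count measure, and the function name marker_style are all assumptions made for illustration, not the values used for the actual map.

```python
# Hypothetical cut-offs: the post does not state where the tiers divide.
SPARSE_MAX = 50      # assumed upper bound for "least well attested"
MIDDLING_MAX = 150   # assumed upper bound for "middling"


def marker_style(items_coded: int) -> str:
    """Return a marker style for a language, given how many wordlist
    items have been coded for it (an assumed measure of data coverage)."""
    if items_coded < 0:
        raise ValueError("items_coded must be non-negative")
    if items_coded <= SPARSE_MAX:
        return "white"            # least well attested
    if items_coded <= MIDDLING_MAX:
        return "red"              # middling
    return "red-with-square"      # most complete datasets


if __name__ == "__main__":
    # Placeholder counts, not real figures from the database.
    for count in (12, 90, 200):
        print(count, marker_style(count))
```

Keeping the cut-offs as named constants means the tiers can be adjusted in one place once the real coverage figures are known.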


Data ‘big’ and ‘small’

A recent publication from the lab is Claire Bowern’s 2015 article in Linguistics Vanguard. ‘Data “big” and “small”’ describes recent work in the lab using the Pama-Nyungan/Australian lexical database. It also provides a brief description of the database itself; this paper could be cited when using data from the database.

The twenty-first Century has been billed the era of “big data”, and linguists are participating in this trend. We are seeing an increased reliance on statistical and quantitative arguments in most fields of linguistics, including the oldest parts of the field, such as the study of language change. The increased use of statistical methods changes the types of questions we can ask of our data, as well as how we evaluate the answers. But this all has the prerequisite of certain types of data, coded in certain ways. We cannot make powerful statistical arguments from the qualitative data that historical linguists are used to working with. In this paper I survey a few types of work based on a lexical database of Pama-Nyungan languages, the largest family in Aboriginal Australia. I highlight the flexibility with which large-scale databases can be deployed, especially when combined with traditional methods. “Big” data may require new methods, but the combination of statistical approaches and traditional methods is necessary for us to gain new insight into old problems.