Data ‘big’ and ‘small’

A recent publication from the lab is Claire Bowern’s 2015 article in Linguistics Vanguard. ‘Data “big” and “small”’ describes recent work in the lab using the Pama-Nyungan/Australian lexical database. It also provides a brief description of the database itself; this paper can be cited when using data from the database.

The twenty-first century has been billed the era of “big data”, and linguists are participating in this trend. We are seeing an increased reliance on statistical and quantitative arguments in most fields of linguistics, including the oldest parts of the field, such as the study of language change. The increased use of statistical methods changes the types of questions we can ask of our data, as well as how we evaluate the answers. But this all has the prerequisite of certain types of data, coded in certain ways. We cannot make powerful statistical arguments from the qualitative data that historical linguists are used to working with. In this paper I survey a few types of work based on a lexical database of Pama-Nyungan languages, the largest family in Aboriginal Australia. I highlight the flexibility with which large-scale databases can be deployed, especially when combined with traditional methods. “Big” data may require new methods, but the combination of statistical approaches and traditional methods is necessary for us to gain new insight into old problems.

What We’ve Been Up To: The Riddle of Tasmanian Languages

In this 2012 paper, Dr. Bowern examines Tasmanian languages using Bayesian phylogenetic methods. Her findings shed light on an area of ambiguity in linguistics. While it had previously been understood (or assumed) that the languages of Tasmania were members of a single language family, the results of this paper demonstrate that there were twelve essentially unrelated Tasmanian languages. The use of algorithms in this project to clarify what was once opaque portends a future for the field in which quantitative methods like Bayesian phylogenetics prove essential to the study of languages.

What We’ve Been Up To: Computational Phylogenetics and the Internal Structure of Pama-Nyungan

The Yale Pama-Nyungan Lab’s Dr. Claire Bowern and the University of Auckland’s Dr. Quentin Atkinson published this paper in Language in late 2012. In it, they examine the internal structure of the Pama-Nyungan language family using computational phylogenetic methods.

The Pama-Nyungan family of Australian languages reaches across Australia to cover nearly ninety percent of the continent’s mainland. While twenty-five subgroups of the Pama-Nyungan family have been identified, there remains no consensus among linguists as to the relationships between those subgroups. Some argue that a reconstruction of the internal relationships between Pama-Nyungan languages is impossible due to centuries of diffusion that might obfuscate similarities.

Bowern and Atkinson, however, propose in this paper a detailed internal subgrouping and higher-order structure of the Pama-Nyungan family. They identify four major divisions within the family, and the paper addresses the concerns raised by fellow linguists regarding the feasibility of such a structuring.

Claire has joined Yale’s Public Voices Fellowship program for the 2014-2015 year. The Public Voices Program aims to increase the role of academics in policy and public debate, for example by providing training in the writing of opinion pieces for major media outlets.

Grammar Boot Camp

I will be holding a summer ‘grammar boot camp’ from June 1 to June 26, 2015. The idea is to have up to four advanced undergraduate students work intensively on existing high-quality archival field notes and recordings with the aim of producing a publishable sketch grammar. Students will receive a stipend and travel expenses to come to Yale.

This project is funded by the National Science Foundation’s Research Experiences for Undergraduates program; as such, applicants are limited to US citizens or permanent residents. Students who have graduated in Spring 2015 will be eligible to apply. The targeted cohort is undergraduates who will have just finished either their junior or senior year.

The materials to be worked on will be from an Australian Aboriginal language from Western Australia and will include both print materials and audio files. It is probable that the ‘print’ materials will already be digitized and in Toolbox.

Students will meet twice a day as a group with me to discuss analyses and writing. They will spend the rest of the time working with the materials in the Linguistics department. They will receive regular detailed feedback on the analysis and writing. Familiarity with Australian languages is not required but I would expect that successful applicants would do some reading of grammars of related languages prior to the start of the boot camp.

Applications for the boot camp are now open. The deadline for applications is January 15, 2015, and applicants will be notified of the result in mid-February.

To apply, please send the following materials electronically:

. a letter of application, describing your experience in linguistics, including research experience, your future plans, and why you’d like to join the boot camp
. a writing sample, such as a linguistics term paper
. a course transcript (this can be an unofficial transcript)

Please send materials as file attachments to, cc’ed to. Applications will be acknowledged within 2 days; if you don’t get an acknowledgment, please let me know.

Please also arrange for one or two letters of recommendation/support from faculty to be sent to the same email addresses, also by January 15.

Students will need to show some evidence of prior research experience (e.g. through an RA-ship or by having a senior thesis in progress) and some familiarity with language documentation procedures (e.g. through having taken a field methods class or equivalent, such as having attended CoLang or an LSA Institute class). Applicants will need to show attention to detail and ability to focus on a project for a sustained period. Students will need to be able to travel to New Haven for the entire period of the boot camp and should expect to work solely on this project during that time.

Please forward to anyone you think would be interested and feel free to contact me with any questions.

About: Language as a Window on Prehistory

One of human beings’ defining characteristics is the command of language. Little is known, however, about how language changes over time or the ways that society might change alongside language. In many areas, language has preserved evidence for past contacts, migrations, and cultural change, and in many cases, language surpasses even archaeology and genetics with regard to its value in investigating the past. Languages have been shown to change in regular ways, and it is also known that languages reflect certain aspects of their speakers’ cultures. The relationship between language and society is integral to understanding why the world’s 7,000 languages look the way they do.

The Pama-Nyungan family of languages in particular is poorly understood. This family covers nearly the entirety of the Australian mainland, but little is known about the relationships among its internal subgroups. A comprehensive understanding of the history of language change within the Pama-Nyungan family, as well as other Australian language families, is hampered by a number of complicating factors. It has long been believed that the extent of language contact in Australia has muddied the “true” genetic relationships between different languages, and that the existing data are too poor to be of any use. Thanks, however, to new research, it is now known that Australian languages actually make for a good testing ground for linguistic hypotheses. Systematic comparisons require a great deal of data, and the Pama-Nyungan data are now comparable in comprehensiveness to those for Indo-European and Austronesian.

Crucial to this research project is a database of Australian lexical material hosted at Yale. This database contains more than 750,000 lexical items and data from thousands of references. Such a vast repository of information on Australian languages has been vital to Bowern’s research over the past few years, and it has also served as a resource for Aboriginal communities in Australia working on their own languages. A prominent goal of this project is to make this database more widely available to linguistic researchers, as well as to use it in the project itself.

This project will include a number of sub-projects. First among these is the task of data entry. As mentioned earlier, there is a large database of Australian lexical items from both Pama-Nyungan and non-Pama-Nyungan languages. This sub-project will involve collecting new information and eventually completing the database. A second sub-project is to reevaluate phonological generalizations about Australian languages. Other sub-projects include the reconstruction of certain Australian languages and the dissemination of the lexical database.

This project is intended to run for three years, and it will be staffed by a number of undergraduate and graduate students from Yale University, as well as exchange students from the University of Queensland. Students of linguistics participating in the venture will gain substantial experience in research techniques and statistics. The students will do much of their work independently, albeit with supervision by Bowern. This team will be engaging in ground-breaking research in the field of Australian languages. The details of how these languages have changed over time, especially in the period before European colonization, are largely unknown, but this project should provide us with some answers to those questions.

Lab manager for NSF project

I’m pleased to announce that I’ve hired Matthew Massie (Yale) as lab manager to help coordinate aspects of my new Pama-Nyungan grant. This will include more regular updates to the papers blog, coordinating consultations about release of materials associated with the comparative database, and providing more plain English/informal summaries of recent research activities.

