Data ‘big’ and ‘small’

A recent publication from the lab is Claire Bowern’s 2015 article in Linguistics Vanguard, ‘Data “big” and “small”’. It describes recent work in the lab using the Pama-Nyungan/Australian lexical database and provides a brief description of the database itself; this paper can be cited when using data from the database.

The twenty-first century has been billed as the era of “big data”, and linguists are participating in this trend. We are seeing an increased reliance on statistical and quantitative arguments in most fields of linguistics, including the oldest parts of the field, such as the study of language change. The increased use of statistical methods changes the types of questions we can ask of our data, as well as how we evaluate the answers. But this all has the prerequisite of certain types of data, coded in certain ways. We cannot make powerful statistical arguments from the qualitative data that historical linguists are used to working with. In this paper I survey a few types of work based on a lexical database of Pama-Nyungan languages, the largest family in Aboriginal Australia. I highlight the flexibility with which large-scale databases can be deployed, especially when combined with traditional methods. “Big” data may require new methods, but the combination of statistical approaches and traditional methods is necessary for us to gain new insight into old problems.
