Standard Average Australian

My slides for my recent Association for Linguistic Typology talk on “Standard Average Australian” are now available on Zenodo. The slides are self-explanatory, I think, and the Zenodo page has the long abstract that I submitted to the ALT for conference review. In brief, the talk is about the (largely unreferenced) claims that many Australianists (including me, I should add) have made about the languages of the country.

I am currently writing up the results for submission to Linguistic Typology. Thanks very much to the ALT conference participants, particularly the Australianists I talked to about this.

Filed under: Bardi, Historical, language documentation, Pama-Nyungan
Source: Anggarrgoon

HTK error list

A website used to have a really useful list of common HTK errors, what they mean, and how to solve them. It was maintained by then-graduate student Ilana Heintz. The site disappeared about a year ago, but the Internet Archive had a copy (I haven’t been able to find a live copy elsewhere). I’m posting it again here in case it’s useful to others, with ongoing thanks to Dr. Heintz for her work in collating all this and making others’ lives a lot easier.


Class on journal article writing

Last spring, I taught a graduate class on how to submit an article to a journal. Our department, like many, has a qualifying paper requirement, where students write two “publishable” or “near-publishable” research papers as a stepping stone to the dissertation. Faculty have always had the expectation that students would submit these papers to a journal, but my impression (as Director of Graduate Studies) was that this wasn’t happening as quickly or as frequently as it should. Hence this class.

Students were third- and fourth-year graduate students. They had all already passed our qualifying paper requirement, and had at least one manuscript to work with. We met once a week for an hour as a group, and the students met with a partner outside of class for at least an hour too. During our group meeting the students reported briefly on what they’d done with their writing buddies. I also did all the activities.

This is a writing-intensive class for graduate students in linguistics who are interested in gaining more experience with writing and publication. Students may enroll with the permission of the instructor and need to have a QP or other piece of writing that would be suitable for submission to a journal by the end of the semester.

The class counts towards the departmental seminar requirement for graduate students in their third and fourth years.

In order to pass the class, students will need to do the following:

1. Submit at least one paper to a journal.
2. Submit an abstract to at least one conference.
3. Provide a referee report for at least one paper for a colleague.
4. Have a ‘writing buddy’ within the class, to whom you provide regular feedback.
5. Provide weekly feedback to the group regarding progress.

We will meet weekly as a group for an hour, and you will also meet your writing buddy for an hour.
Assessment: this was a pass/fail class.

Here was the weekly schedule. I did not make detailed handouts for class, since this was an additional class for me. We did not use a textbook. If doing this again, I could see some advantage to using something like “writing a journal article in 12 weeks” but I don’t think it’s crucial.

Week 1: General writing and research skills. Backing up, some techniques for writing consistently, and the like. Expectations of working with a writing buddy (regular time to meet with them). The students made a research project list for homework and posted it for everyone (I showed them mine, which led into a discussion of how many projects someone should be working on at any one time). We also talked about how to identify self-sabotaging tendencies in academic work.

Week 2: Identify the manuscript to submit and what needs to be done to it in order to make it publishable/submittable (e.g. are the data sufficient, writing clarity, organization, length, engagement with the literature). We talked about word limits, general properties of journal articles, minimal publishable units, and the like.

Week 3: How to pick a journal. We talked about main journals in the field, how to figure out what’s an appropriate place to send a manuscript (what goes to Language, for example). Homework was to figure out what journal (+ backup journal) they wanted to target. We brainstormed journals and the decision process for where to send a paper.

Week 4: How to submit an article to a journal. We walked through the Diachronica online submission process, registering for the site, creating a submission, explaining all the steps, and talking about how platforms differ from one another. We also talked about how to interact with journal editors, what a presubmission inquiry looks like, and when it’s ok to ask for an update. Homework for this (and previous weeks) was to continue working on what needed to be done to the paper to submit it.

Week 5: Check-in. We went through what each person was doing on their paper, where they were at, what still needed to be done.

Week 6: What a referee report looks like. How long they take to do and receive, what sort of things get commented on, tone, etc. We wrote a report on a published paper (anonymized) and I shared reports I had received on a couple of papers.

Week 7: Revising and resubmitting. How to respond to referee reports. What to expect from an editor’s decision, whether you need to respond to everything, how to deal with conflicting recommendations, what to submit in a revision. Desk rejections and what they mean. I shared copies of an original submission, referee reports, resubmission, and subsequent acceptance of a paper.

Weeks 8-11: Refereeing our papers. We did three rounds of refereeing. Each week, everyone brought two copies of their paper to class, and we spent half an hour commenting on two papers. Homework was to revise the paper in accordance with the suggestions from the class “referees”. We also talked about the comments they were giving.

Week 12: Turning a journal article into a conference paper abstract. Differences between articles and conference talks.

Week 13: Dealing with proofs. Proof marks, what sorts of things can be corrected at proof stage, etc.

I also had a paper I wanted to submit that spring, and since there were 5 students in the class, I teamed up with one of them as a writing buddy too.

The deadline for submission of papers was May 10, and most of the papers were submitted fairly close to that date. Of the 5 students (+ me), the results so far are: 1 accept with minor revision (a few days ago), 1 revise and resubmit (last week), 1 reject with helpful reviews for revision and submission elsewhere (in June), 1 technical rejection (+ submission elsewhere; about a week after submission), and 2 still under review.

I think it worked pretty well, and I will probably offer it again in a year or two (not this coming year).

Filed under: Other, teaching

Polygons and centroids now on Zenodo

I’ve updated the polygon and centroid files for Australian language locations, and placed them on Zenodo. This means there’s something stable for you to reference if you want to use them and refer to them. As always, comments and corrections very welcome. And as always, please consider using the Zenodo community for Australian languages to upload your own materials.

Filed under: Chirila

Videos for Zenodo uploads

I made some videos about how to upload files to the Zenodo repository for Australian languages:

The first shows how to sign up for a Zenodo account.

The second shows how to upload files to the Australian Languages Zenodo community. It should be a help for anyone who would like to upload files but isn’t sure how.

Filed under: Chirila, Media, Technology and Software

Color in Pama-Nyungan: update

Last November, Hannah Haynie and I published a paper in the Proceedings of the National Academy of Sciences on color term systems in Pama-Nyungan. In it, we used phylogenetic methods to show that color term systems can both gain and lose terms, and that while they do so mostly in accordance with prior work on color term systems (Berlin and Kay, Kay and Maffi, and colleagues), we also found evidence for ‘exceptional’ systems that appeared not to conform to the B&K system. We used data from the Chirila database and fairly standard phylogenetic methods of ancestral state reconstruction.

For an analysis of this type to be correct, several assumptions must be satisfied:

  • sample data need to be representative of the languages as a whole;
  • sample data need to be correct;
  • the analytical tools need to be applicable to what’s being studied;
  • the analyses need to be interpreted correctly.

Over the last six months, Hannah and I have been in correspondence with David Nash about many of these points, particularly those involving sampling, the correctness of the underlying data, and judgments about what is a color term. In particular, in the original version of Table S1, a data conversion error resulted in words from several languages being associated with the wrong row in the table (particularly Wargamay and Warlmanpa). This did not affect the analyses reported in the paper, as the error was introduced when spreadsheets were converted to Microsoft Word documents for uploading to the journal’s online submission site. [The corrected table is available here.]

The discussions with Nash revolved around several issues already identified both in our paper and the supplementary materials:

  • the difficulty of determining whether a color term is genuinely absent from the language, or simply not recorded;
  • the difficulty of establishing the ranges of color terms glossed in English by non-native speakers of the language;
  • the issue of polysemy, for example, whether a term glossed as “unripe, green” is truly a color term, or whether “green” here is meant solely in the sense of “unripe, not ready for eating” (and therefore not glossing a true color term).

Coding decisions of this type are based on a careful philological analysis of each individual source, and while phylogenetic analyses are usually robust to individual errors, systematic errors may bias the results. In general, where Hannah and I were unsure, we tended to include rather than exclude; this applies especially to terms for ‘green’ and terms for ‘red’ based on words meaning ‘blood’ (which could be interpreted as the descriptive adjective ‘bloody’ rather than a true color term). For ‘green’ terms, many languages have a word that is glossed as ‘green’ or ‘unripe’; while some of these terms do appear to be real color terms (in that they can refer to items that aren’t unripe, like shirts), others aren’t — they refer to the ripeness of fruit, not directly to its color. (We had a similar problem with ‘grey’, which was often ambiguously glossed as a color term or a word referring only to grey hair.)

Another issue is the extent to which we make use of data from closely related languages in determining the color inventory of a particular language variety. For example, if a particular variety appears to lack a term for ‘blue’, but a term is present in other languages in the subgroup, are we justified in treating the lack of a term as a true absence? In our analyses, we treated such cases as absent rather than indeterminate, because we did not want to omit true variation in the color inventories of languages. But one could also argue that color inventories are unlikely to vary so much between dialects of the same language (or closely related languages in a subgroup), so unrecorded colors are probably omissions from data collection rather than genuine absences from the language.
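To make the difference between the two coding policies concrete, here is a minimal Python sketch. It is not the code used for the paper: the function name `code_variety`, the concept list, and the toy inventories are all made up for illustration. Under the "absent" policy an unrecorded term is coded 0; under the "indeterminate" policy it is coded as missing (`None`) whenever a closely related variety has the term.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical list of color concepts; not the coding scheme from the paper.
COLOR_CONCEPTS = ["black", "white", "red", "yellow", "green", "blue"]


def code_variety(
    recorded: Set[str],
    relatives: Iterable[Set[str]],
    policy: str,
) -> Dict[str, Optional[int]]:
    """Code each concept as 1 (recorded), 0 (absent), or None (indeterminate).

    policy="absent": every unrecorded term counts as a genuine absence.
    policy="indeterminate": an unrecorded term is treated as missing data
    if at least one closely related variety has a term for that concept.
    """
    if policy not in ("absent", "indeterminate"):
        raise ValueError(f"unknown policy: {policy}")

    # Pool every concept attested anywhere among the related varieties.
    attested_in_relatives: Set[str] = set().union(*relatives)

    coding: Dict[str, Optional[int]] = {}
    for concept in COLOR_CONCEPTS:
        if concept in recorded:
            coding[concept] = 1
        elif policy == "indeterminate" and concept in attested_in_relatives:
            coding[concept] = None  # possibly a gap in the documentation
        else:
            coding[concept] = 0  # treated as a true absence
    return coding


# Toy data, invented for this example only.
variety = {"black", "white", "red"}
sisters = [{"black", "white", "red", "blue"}, {"black", "white", "red", "yellow"}]

as_absent = code_variety(variety, sisters, "absent")
as_indeterminate = code_variety(variety, sisters, "indeterminate")
```

With this toy data, ‘blue’ and ‘yellow’ are coded 0 under the first policy and `None` under the second, while ‘green’ is 0 under both because no related variety records it. The first policy keeps possible real variation in the data; the second avoids treating documentation gaps as evidence of loss.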

We suspect that some terms were not recorded because of the linguists’ expectations about what items are present (or not) in a language. For example, Australian languages are stereotypically claimed to lack color terms beyond black, white, red, and yellow; this can lead researchers not to ask for terms like blue or purple.

Finally, data for this paper came from the Chirila database (Bowern 2016), which, while extensive (800,000+ items), is by no means exhaustive. Nash brought to our attention several cases where color terms had been recorded in sources which are not in Chirila. These are also noted in the revised supplementary table and reflected in the newly uploaded analysis files.

In order to assess the impact of our coding decisions, as well as the impact of terms which were missing from Chirila and hence recorded as absent from the languages, we re-ran all analyses. We ran two sets of updated analyses. One simply corrected errors resulting from data missing from Chirila. The other also used Nash’s alternative judgments about presence/absence of color terms like ‘green’. In neither case were our main conclusions affected. That is, we still find support for both color gain and color loss. While, as is expected, the numerical values of individual results changed somewhat, our inferences and conclusions stand. Color loss is possible (under this model), though it’s substantially less common than color gain.

I am currently working on a new update to Chirila and many of these revised sources will be available there.

Filed under: Chirila, Historical, Pama-Nyungan

New preprint (etc) archive for Australian languages!

I worry about data. It’s my job. I worry about how to analyze it, how to collect it, how to present it, and what happens to it. A particular worry for me at the moment is the very large amount of ‘grey’ publications for Australian languages: that is, the language materials that are published locally, for example, by language centres or smaller publishers. There are also gems in working papers collections, some of which only exist in photocopies of photocopies at this stage. Some important work has come out of Hono(u)rs Theses, but that work isn’t often widely available, and unlike PhD theses, it tends not to make its way to university repositories. I have a large collection of such materials, both in print and in scanned format, and I presume that others do too, particularly the “older generation” of Australianists who did most of their work before putting stuff on the web was what one did as a matter of course.

Another accessibility issue in work on Australian languages is paywalled (subscription-only) publications. There is increasing attention to Open Access, but for various reasons, much work is either print-only or available electronically but behind a paywall. In many cases, though, authors are able to upload freely available preprints.

It’s important to make our work available to the many groups who are interested in language: to our linguist colleagues, to the wider scientific community, to the general public, and in particular to members of the Aboriginal community.

So, I’ve started a ‘community’ on Zenodo for Australian languages. Zenodo is an archive platform for sharing research. In a nutshell, you upload your paper, handout, or other item, give the site some information about the work’s metadata, and publish it. You choose a license to share your work under (it can be closed (archived), for example), upload the file(s), and presto!

Zenodo is somewhat similar to Academia.edu and ResearchGate, in that it takes work and makes it available. However, there are also a couple of big differences. Both Academia.edu and ResearchGate are for profit, while Zenodo is not for profit and is funded by CERN and programs in the EU’s Open Science Initiative. Zenodo uploads are publications, while the others’ aren’t. Zenodo assigns DOIs, allowing for referencing versions of publications (which makes it great for databases or dictionaries or other work that might have multiple editions or versions). It also lets you upload collections of files as a single item (which the others don’t), and it works well with code repositories like GitHub, so you can publish the paper, supporting documentation, and code at the same time. If you have sound files, you can include them with the paper under the same DOI (which you can’t do on Academia.edu, for example).

Another issue is findability: in theory, everything on the web is ‘findable’ if you know what to search for. Search engines, however, optimize results, weighting results from different places differentially. I know from the experience of finding papers for ozpapers that it can be hard to find work on Australian languages, even when I have regular alerts set up. For example, not all university thesis repositories show up in Google Alerts (you have to know what to look for).

To contribute to Zenodo, go to

You’ll see a list of current contributions and a button to upload.

If you have old handouts or other useful information about Australian languages that you would like to contribute, but do not have the time or inclination to upload them yourself, get me the scans (or even paper copies) and my students and I will upload them for you.


Filed under: Chirila, fieldwork, language documentation

New bootcamp under way!

The 2017 grammar boot camp starts tomorrow. Three students (with bios below) will be working with me on materials for Noongar. We’re very lucky to be working with Denise Smith-Ali, Noongar linguist, and Sue Hanson from the Goldfields Language Centre. Our main focus for the month is to put together a phonological description of Noongar, with sound files to illustrate what we are describing. In some ways, this is pretty straightforward (in that it’s the sort of thing linguists do, the scope is known, etc) but in other ways, it’ll be a challenge! For example, we want to make something easy to access, and easy to edit and update. We’ll be posting more about this as we make decisions.

Akshay Aitha: Akshay is a rising senior at UC Berkeley working on a double major in Linguistics and Applied Mathematics (with a concentration in Logic). My main research interest at the moment is the functional structure of nominals, especially in my heritage language, Telugu. I also have a strong enthusiasm for linguistic fieldwork. Outside of my coursework, I’ve been involved as a research assistant on various phonetics and fieldwork projects under graduate students in the Berkeley Linguistics department, and I’m also involved in my department as an officer of our club for undergraduates, SLUgS.

Lydia Ding: Lydia is a recent graduate of Carleton College, where she majored in Linguistics and completed a senior thesis for distinction on wh-questions in Nukuoro [nkr] (Polynesian). Her primary interests lie in language documentation, syntax, morphology, and computational linguistics.

Sarah Mihuc: Sarah is a recent graduate of McGill University with a BA Honours in Linguistics & Computer Science. She works on anti-agreement and on word order in Kabyle Berber. She also has experience in experimental and computational linguistics, and fieldwork on two Mayan languages.

Filed under: Chirila, Dialects, fieldwork, language documentation, Media, Pama-Nyungan

Teaching statement

I’ve finally figured out what I want to put in a teaching statement:

I am a linguist and I teach about linguistics, particularly language change and language documentation. My teaching is research centered in that I want my classes, from freshman classes to graduate seminars, to be places where my students learn how to ‘figure stuff out’ – how to step outside their starting assumptions to figure out what language tells us about how our world works, how to find out what they don’t know, even when they think they know it, and how to be constructive critics of their own and others’ work. I want them to be excited about learning and not to see the syllabus as simply a set of hoops to go through to earn a grade. In short, I teach students how to think, not what to think.

If language were spoken in a vacuum, my teaching statement could probably end there, vague though it is. But language is spoken by humans and researched by humans, and humans are complex. Views about language, from the appropriateness of teaching spelling, to when to introduce a second language, to who should be bilingual, to who speaks better than others, pervade our lives. They affect the type of data that linguists can use, and more concretely, they directly affect the lived experience of a large fraction of the population, for better or for worse.

Linguists can, and should, have a lot to say about this. Our commitment to the ‘scientific’ study of language has implications, both for how to study social dynamics, and the ways in which language is used to reinforce or deny power. Our work as academics gives us tools to critically examine social constructs, to separate the content of claims about the world from the language used to deliver those claims, and to see the implications of such arguments.

My practical focus in this lab is on a combination of educational outreach and training, and the commitments that this entails. Quite simply, students need to be able to do the best work they can in my classes and research group, and if they can’t because they are systematically disadvantaged, that’s not just their problem, it’s my problem too.

How does this translate into concrete activities? For me, this means a twin focus on the broader impacts of training current and future researchers, and of making our methods, results, and approaches more available to others.

Within the lab and classroom, it means fostering an atmosphere of excellence and respect, where everyone’s contributions are acknowledged and valued. It means acknowledging the realities of implicit bias and how it can affect both our work and our perceptions of excellence. It means acknowledging and leaving time to explore history in the classroom.

For training, it means working from a broad definition of ‘excellence’ that factors in opportunity and potential as well as results achieved to date. It means recognizing that ‘pipeline’ questions won’t solve themselves without effort.

For activities, it means a genuine commitment to outreach. This includes making sure language materials are accessible to the people who need them, that we preferentially publish in open access journals, that we provide plain English summaries of our work, that the results of our work are integrated into general outlets such as Wikipedia, and that we help people who want to learn about linguistics and don’t have the resources to do so. It means not just an informational role, but an advocacy role for topics where our research is relevant, such as language endangerment.

Filed under: Other, teaching