Downloading the CHIRILA database files

What can I download?

There are several different types of downloads.
  • Individual language files in Excel and tab-delimited format. If you just want the data from one or two languages, this is the page you want.
  • Swadesh lists (basic vocabulary) for the Phase I release languages. Use these if you want to do comparative research on a small set of Australian vocabulary. If you want to combine them with other resources for your own research, that’s fine, but you can’t then re-release them without my permission and without acknowledging both this database and the original sources.

    Chirila-Swadesh Lists (text format), 5.1 MB

    .txt (tab-delimited) format Swadesh lists for the Chirila Phase I release. ...

    Chirila-Swadesh Lists (Excel), last updated February 12, 2016

    Excel format Swadesh lists for languages in the Phase I release. ...

  • The complete Phase I data, in tab-delimited or xlsx format. Use this file if you want to do comparative research on Australian languages.

    Chirila-FullData-txt, 13.9 MB

    The full Chirila database, in text (tab delimited) format. ...
  • A FileMaker Pro v13 version of the database, with links between the language, source, reconstruction, and data tables.

    Chirila-FullData-Filemaker, 22.5 MB

    This is the full CHIRILA (Phase I) database, in FileMaker Pro v13.0 format. ...
  • There is also a list of languages (with other data, such as location, ISO and Glottolog codes) and a list of sources for the database. The language list covers both the Phase I released data and the full set of names for Australian languages (including languages for which we do not hold data). It consists of 1) a list of standard language names and 2) a list of variety names.

    CHIRILA-Language Lists, 199.00 KB

    Language and Variety (=doculect) lists in the CHIRILA database, both overall and...

    CHIRILA Source List

    This is the list of sources for materials in the Phase I database. ...

What if I find an error?

Please let me know! We’ve done our best, but there are so many places where errors might be introduced that it’s almost impossible to have a totally error-free resource. Feedback is welcome, and suggestions for features are also fine.

I want to create my own relational database from the text tables. How do I do that?

  • Link source_id in the word table with source_id in the sources table.
  • Link variety_id in the word table with variety_id in the language table.
  • Link gloss_id in the word table with gloss_id in the gloss table.
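
One way to set up these three links is sketched below, using Python's built-in csv and sqlite3 modules. The table names (word, sources, language, gloss) and the linking fields come from the list above. Everything else is an assumption made for illustration: that each table is a separate tab-delimited file with a header row, that the files are UTF-8 encoded, and that quotation marks in the data should be read literally rather than as field delimiters. The file paths you pass in are whatever names your downloaded files have.

```python
import csv
import sqlite3


def quote_ident(name):
    """Quote a table or column name for use in SQLite statements."""
    return '"' + name.replace('"', '""') + '"'


def load_table(conn, table, path, encoding="utf-8-sig"):
    """Load one tab-delimited file (header row first) into a new table.

    Every column is stored as TEXT. The UTF-8 encoding is an assumption.
    """
    with open(path, newline="", encoding=encoding) as handle:
        # QUOTE_NONE keeps quotation marks in the data as literal characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        width = len(header)
        columns = ", ".join(quote_ident(name) + " TEXT" for name in header)
        conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
        placeholders = ", ".join("?" for _ in header)
        # Skip blank lines; pad or trim ragged rows to the header width.
        rows = ((row + [""] * width)[:width] for row in reader if row)
        conn.executemany(
            f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
        )


def build_database(paths, db_path=":memory:"):
    """Build a SQLite database from a {table name: file path} mapping.

    Expects the keys "word", "sources", "language" and "gloss".
    """
    conn = sqlite3.connect(db_path)
    for table, path in paths.items():
        load_table(conn, table, path)
    # Index the lookup side of each link so the joins stay fast.
    conn.execute("CREATE INDEX idx_sources_source_id ON sources (source_id)")
    conn.execute("CREATE INDEX idx_language_variety_id ON language (variety_id)")
    conn.execute("CREATE INDEX idx_gloss_gloss_id ON gloss (gloss_id)")
    conn.commit()
    return conn


# The three links from the list above, written as joins on the word table.
# LEFT JOIN keeps word rows even when a linked record is missing.
JOIN_SQL = """
SELECT word.*, sources.*, language.*, gloss.*
FROM word
LEFT JOIN sources  ON word.source_id  = sources.source_id
LEFT JOIN language ON word.variety_id = language.variety_id
LEFT JOIN gloss    ON word.gloss_id   = gloss.gloss_id
"""
```

To use it, call build_database with a dictionary mapping the four table names to the paths of your downloaded files, then run conn.execute(JOIN_SQL) to get each word row together with its source, variety, and gloss records. Passing a file name as db_path instead of the default ":memory:" saves the database to disk.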
As mentioned above, you can download a FileMaker database with the links already set up. This FileMaker version differs a little from the tab-delimited downloads in that it has standard languages and varieties in separate tables. Otherwise, the material is the same.
Here are some caveats to keep in mind:
  • Standardized glosses are incomplete. Completing them is a priority, and work on how to clean them up and add more glosses is in progress.
  • Reconstructions are not included in the current release. They will be included in the next one.
  • Part-of-speech marking is as per the original sources. It has not yet been standardized, but that will be done for the next round of data release (in a new field).
  • About 7% of varieties are not associated with a standard language. This research is ongoing.
  • Other general data cleanup is an ongoing task.
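
If you want to see how much these gaps affect your own work, a quick check on a downloaded file can help. The sketch below counts the share of rows in a tab-delimited file where a given field is empty. It assumes a header row, UTF-8 encoding, and that a missing value shows up as an empty field; none of that is confirmed here, so check it against the files you actually have. The function name share_missing is made up for this example.

```python
import csv


def share_missing(path, field, encoding="utf-8-sig"):
    """Return the fraction of data rows whose `field` is empty or blank.

    Assumes a tab-delimited file with a header row; treating an empty
    field as "missing" is an assumption about how gaps are recorded.
    """
    total = 0
    missing = 0
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if reader.fieldnames is None or field not in reader.fieldnames:
            raise KeyError(f"no column named {field!r} in {path}")
        for row in reader:
            total += 1
            value = row[field]
            # Short rows leave trailing fields as None.
            if value is None or not value.strip():
                missing += 1
    return missing / total if total else 0.0
```

For instance, calling share_missing on your copy of the word table with the field gloss_id gives the proportion of entries that have no standardized gloss yet.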