What can I download?
There are several different types of downloads.
- Individual language files in Excel and tab-delimited format. If you just want the data from one or two languages, this is the page you want.
- Swadesh lists (basic vocabulary) for the Phase I release languages. Use this if you want to do comparative research on a small set of Australian vocabulary. If you want to combine this with other resources for your own research, that’s fine, but you can’t then re-release it without my permission or without acknowledging both this database and the original sources.
- The complete Phase I data, in tab-delimited or xlsx format. Use this file if you want to do comparative research on Australian languages.
- A FileMaker Pro v13 version of the database with links between the language, source, reconstruction, and data tables.
- There is also a list of languages (with other data, such as location and ISO and Glottolog codes) and a list of the sources for the database. These cover both the Phase I released data and the full list of names for Australian languages (including languages for which we do not hold data). The language list consists of 1) a list of standard language names and 2) a list of variety names.
What if I find an error?
Please let me know! We’ve done our best, but there are so many different places where errors might be introduced that it’s almost impossible to have a totally error-free resource. You can give feedback through pamanyungan.net/errata-and-feature-suggestions. Suggestions for features are also welcome.
I want to create my own relational database from the text tables. How do I do that?
- Link source_id in the word table with source_id in the sources table.
- Link variety_id in the word table with variety_id in the language table.
- Link gloss_id in the word table with gloss_id in the gloss table.
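As a minimal sketch of the three links above, here is one way to do it in Python with the built-in sqlite3 module. Only the table names and the source_id, variety_id and gloss_id fields come from the description here; the other column names (word_id, form, variety_name, gloss, citation) and all of the sample rows are placeholders, so swap in the real headers from your downloaded files.

```python
import csv
import io
import sqlite3

# Placeholder stand-ins for the tab-delimited downloads. Only the *_id link
# fields are taken from the FAQ; every other column and value is made up.
WORD_TSV = "\n".join([
    "word_id\tform\tvariety_id\tgloss_id\tsource_id",
    "1\tformA\t10\t100\t1000",
    "2\tformB\t11\t\t1001",  # empty gloss_id: no standardized gloss yet
]) + "\n"
LANGUAGE_TSV = "variety_id\tvariety_name\n10\tVarietyA\n11\tVarietyB\n"
GLOSS_TSV = "gloss_id\tgloss\n100\twater\n"
SOURCES_TSV = "source_id\tcitation\n1000\tSource A\n1001\tSource B\n"


def load_tsv(conn, table, text):
    """Create `table` from tab-delimited text and insert all of its rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', body)


def build_database(word_tsv, sources_tsv, language_tsv, gloss_tsv):
    """Load the four tables into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load_tsv(conn, "word", word_tsv)
    load_tsv(conn, "sources", sources_tsv)
    load_tsv(conn, "language", language_tsv)
    load_tsv(conn, "gloss", gloss_tsv)
    return conn


# One join per link: variety_id, gloss_id, source_id.
JOIN_SQL = """
SELECT w.form, l.variety_name, g.gloss, s.citation
FROM word AS w
LEFT JOIN language AS l ON w.variety_id = l.variety_id
LEFT JOIN gloss    AS g ON w.gloss_id   = g.gloss_id
LEFT JOIN sources  AS s ON w.source_id  = s.source_id
ORDER BY w.word_id
"""

if __name__ == "__main__":
    conn = build_database(WORD_TSV, SOURCES_TSV, LANGUAGE_TSV, GLOSS_TSV)
    for row in conn.execute(JOIN_SQL):
        print(row)
```

The joins are LEFT JOINs so that a word with no matching gloss, variety or source is kept (with an empty value) rather than silently dropped. To work from the real files, read each download with open(path, newline="", encoding="utf-8") and pass its contents to load_tsv; the encoding is an assumption, so check it against your files.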
As mentioned above, you can download a FileMaker database with the links set up already. This FileMaker version is a little different from the tab-delimited downloads, in that it has standard languages and varieties in separate tables. Otherwise, the material is the same.
Here are some caveats to keep in mind:
- Standardized glosses are incomplete. This is a priority for the future, and we are currently working out how to clean up the existing glosses and add more.
- Reconstructions are not included in the current release. They will be included in the next one.
- Part-of-speech marking follows the original sources. It has not yet been standardized, but that will be done for the next round of data release (in a new field).
- About 7% of varieties are not associated with a standard language. This research is ongoing.
- Other general data cleanup is an ongoing task.