This masterclass will show how computational methods can be used to amplify the efforts of multivariate linguistic typology to answer core CoEDL questions of how and why languages differ.
The course will take us through the typological data cycle and show how we can use computational methods to collect, manage, analyse and share typological data. We will see how current methods from data science allow us to increase the quantity and quality of our typological data and to make it available to other researchers in online repositories. We will see how to build typologies using modern typological frameworks and how to develop these into explicit linguistic models in order to classify our data with precision, speed and detail.
The course will focus on the typology of morphological exponence but will be generalisable to any linguistic domain.
It assumes no prior knowledge of programming.
Originally presented as part of the Australian Linguistic Society CoEDL Masterclasses 2021.
Here’s Matt’s overview of the course:
- Multivariate/Canonical typology
- Motivations for computational methods
- I: The data science workflow, work environments, identifying domains of study
- II: Data acquisition
- III: Datasets and next-generation linguistic typology
- IV: Sharing your data
- Formalising assumptions
- Automating our ‘typologisation’
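
To give a flavour of what formalising assumptions and automating classification can look like, here is a minimal sketch in Python. It is not taken from the course notebooks: the `Exponent` record, the `classify` rule and the two-item dataset are hypothetical, and the rule rests on one simplifying assumption (a form that expresses more than one feature is labelled cumulative, and a form that expresses exactly one is labelled separative).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exponent:
    """One morphological exponent (hypothetical record layout, for illustration)."""
    language: str
    form: str
    features: frozenset  # morphosyntactic features the form expresses


def classify(exponent: Exponent) -> str:
    """Apply an explicit, deliberately simple rule.

    Assumption: the label depends only on how many features the form expresses.
    """
    count = len(exponent.features)
    if count == 0:
        return "empty"
    if count == 1:
        return "separative"
    return "cumulative"


# Two textbook-style examples, hand-coded here rather than drawn from the course data.
DATA = [
    # Latin -ō (as in amō 'I love') bundles person, number and tense.
    Exponent("Latin", "-ō", frozenset({"person=1", "number=sg", "tense=prs"})),
    # Turkish -ler (as in evler 'houses') expresses plural number only.
    Exponent("Turkish", "-ler", frozenset({"number=pl"})),
]


def typologise(records):
    """Classify every record and key the result by (language, form)."""
    return {(r.language, r.form): classify(r) for r in records}


if __name__ == "__main__":
    for key, label in typologise(DATA).items():
        print(key, label)
```

Because the rule is written down as code, the assumption behind each label is open to inspection, and the same classification can be re-run unchanged whenever the dataset grows.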
The code and data:
For a quick look at what the course covers, see the markdown versions of the Python notebooks (to follow along properly, run the notebooks themselves):