Of interest, via the SIGMORPHON list and GitHub:
SIGMORPHON 2023 Shared Task on Interlinear Glossing
We are organizing a shared task on automated interlinear glossing at the 2023 workshop of the ACL Special Interest Group on Computational Morphology and Phonology (SIGMORPHON).
Interlinear glossed text (IGT) is a major annotated datatype produced in the course of linguistic fieldwork. For many low-resource languages, this is the only form of annotated data that is available for NLP work. Creating glossed text is, however, laborious, and this shared task investigates methods to (fully or partially) automate the process.
In this task, participants build systems which generate morpheme-level grammatical descriptions of input sentences following the Leipzig glossing conventions. The input to the glossing system consists of (1) a sentence in the target language and (2) a translation of the target sentence into a language of wider communication, often English. More details are available on our task repo on GitHub: sigmorphon/2023glossingST.
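To make the input and output concrete, here is a minimal Python sketch of one training example. The class name, field names, and the French sentence are illustrative assumptions, not the task's actual file format or data; see the task repo for the real thing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GlossingExample:
    """Hypothetical container for one glossed sentence (not the official format)."""
    sentence: str        # sentence in the target language
    translation: str     # translation into a language of wider communication
    glosses: List[str]   # one Leipzig-style gloss per token; morphemes joined by "-"

    def aligned(self) -> bool:
        # A gloss line should have exactly one gloss per whitespace-separated token.
        return len(self.sentence.split()) == len(self.glosses)


# Illustrative example only; the glosses are assumed, not taken from the task data.
example = GlossingExample(
    sentence="les chiens dorment",
    translation="the dogs sleep",
    glosses=["DEF.PL", "dog-PL", "sleep-3PL"],
)
```

A system would take `sentence` and `translation` as input and be expected to produce `glosses`.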
In the task repo, we provide the code, results, and downloadable trained models for our baseline system. We also provide the evaluation script and other helpful scripts for loading and processing the task data, to facilitate easy building of novel systems.
The task has two tracks. In the closed track, systems are trained solely on input sentences and glosses. In the open track, systems may additionally use morphological segmentations during training, along with any other data and resources (including dictionaries and pretrained language models). The only exception is additional interlinear glossed data, which is not allowed. For the open track, we also provide extra information, such as POS tags, for a subset of the languages.
The main evaluation metric for the competition is token accuracy. Systems are evaluated on how well they generate fully glossed tokens (e.g. chiens → dog-PL). We will also separately evaluate glossing accuracy on bound morphemes, like PL, and on free morphemes (i.e. stems), like dog.
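As a rough illustration of these metrics, here is a minimal Python sketch. It is not the official evaluation script from the task repo; the function names are hypothetical, and it assumes that grammatical (bound) glosses are written in upper case, that stems are written in lower case, and that morphemes within a token are separated by "-".

```python
from typing import Dict, List


def token_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Fraction of gold tokens whose full gloss is predicted exactly."""
    correct = total = 0
    for pred_sent, gold_sent in zip(predicted, gold):
        for i, gold_tok in enumerate(gold_sent):
            total += 1
            if i < len(pred_sent) and pred_sent[i] == gold_tok:
                correct += 1
    return correct / total if total else 0.0


def morpheme_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> Dict[str, float]:
    """Position-wise accuracy, reported separately for bound and free morphemes."""
    counts = {"bound": [0, 0], "free": [0, 0]}  # [correct, total]
    for pred_sent, gold_sent in zip(predicted, gold):
        for i, gold_tok in enumerate(gold_sent):
            pred_morphs = pred_sent[i].split("-") if i < len(pred_sent) else []
            for j, gold_morph in enumerate(gold_tok.split("-")):
                # Assumption: upper-case labels are bound morphemes, others are stems.
                kind = "bound" if gold_morph.isupper() else "free"
                counts[kind][1] += 1
                if j < len(pred_morphs) and pred_morphs[j] == gold_morph:
                    counts[kind][0] += 1
    return {k: (c / t if t else 0.0) for k, (c, t) in counts.items()}


# Made-up gold and predicted glosses for one three-token sentence.
gold = [["the", "dog-PL", "sleep-3PL"]]
pred = [["the", "dog-PL", "sleep-3SG"]]
```

With this toy data, one of the three tokens is wrong, so `token_accuracy(pred, gold)` is 2/3, while `morpheme_accuracy(pred, gold)` gives 0.5 for bound morphemes and 1.0 for stems.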
Important Dates and Deadlines
- April 1: Release of surprise language training and development data
- April 24: Release of test data for all languages
- April 27: Test predictions should be submitted to organizers
- May 15: System description paper submission deadline
- May 25: Notification of paper acceptance
- May 30: Camera-ready deadline for system description papers
Organizers
- Michael Ginn (University of Colorado)
- Mans Hulden (University of Colorado)
- Sarah Moeller (University of Florida)
- Garrett Nicolai (University of British Columbia)
- Alexis Palmer (University of Colorado)
- Miikka Silfverberg (University of British Columbia)
- Anna Stacey (University of British Columbia)
By the way, @SarahRMoeller and @alexispalmer are local heroes and I suspect might be willing to field questions.