CoCoON for “Digital Oral Corpus COllections” is a technical platform that supports oral resource producers in creating, structuring and archiving their corpuses. A corpus can be composed of recordings (generally audio), possibly accompanied by annotations of these recordings.
The resources are first catalogued and stored, and then archived in the TGIR Huma-Num archive. The author and his institution remain responsible for the deposited materials and can benefit from restricted and secure access to their data, for a defined period of time, if the content of the information is considered sensitive.
That post links to this text resource on Nivaclé, a Matacoan language spoken in Paraguay and Argentina:
It looks quite nice, and it seems quite straightforward to get to annotations of texts from the overview page. It works fine in Chrome/Mac, but Firefox didn’t seem to want to play the .mp4 and downloaded it instead. I was also rather surprised to realize that there are only annotations of the Spanish translation, and not of the Nivaclé itself; there doesn’t seem to be any metadata in the overview page that indicates that (of course, even text of the translation is better than no text at all).
There is an interactive playback view (click on “Show”) that displays the available annotations; it worked in Chrome:
Hey, I poked around some more and found a text that has more annotations:
This is a text in “Ta’izzi-Adeni Arabic”, and as you can see there are multiple tiers of annotation.
The resources download nice and tidy, like this:
And here’s what the LACITO XML format looks like for a single sentence:
<S id="ACQ_MCSS_NARR_04_chat-borgne_01">
  <FORM>ginni hāda</FORM>
  <AUDIO start="1.87" end="2.84"/>
  <W>
    <FORM>ginni</FORM>
    <M class="">
      <TRANSL xml:lanf="fr">Djinn</TRANSL>
      <FORM>ginni</FORM>
    </M>
  </W>
  <W>
    <FORM>hāda</FORM>
    <M class="">
      <TRANSL xml:lanf="fr">DEM.M.SG</TRANSL>
      <FORM>hāza</FORM>
    </M>
  </W>
  <TRANSL xml:lanf="fr">S’agissant de ces ginns.</TRANSL>
</S>
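To give a sense of how little work it takes to read this, here is a minimal Python sketch that pulls the sentence above apart using only the standard library. The function names (`parse_sentence`, `interlinear`) are my own, and the sketch assumes that every sentence follows the layout of this one example (an `<S>` with a `<FORM>`, an `<AUDIO>`, some `<W>` words each holding `<M>` morphemes, and a sentence-level `<TRANSL>`). That is an assumption based on a single sentence, not something checked against the full format.

```python
import xml.etree.ElementTree as ET

# The sentence shown above, copied as-is (including its attribute names).
SAMPLE = """<S id="ACQ_MCSS_NARR_04_chat-borgne_01">
  <FORM>ginni hāda</FORM>
  <AUDIO start="1.87" end="2.84"/>
  <W>
    <FORM>ginni</FORM>
    <M class=""><TRANSL xml:lanf="fr">Djinn</TRANSL><FORM>ginni</FORM></M>
  </W>
  <W>
    <FORM>hāda</FORM>
    <M class=""><TRANSL xml:lanf="fr">DEM.M.SG</TRANSL><FORM>hāza</FORM></M>
  </W>
  <TRANSL xml:lanf="fr">S’agissant de ces ginns.</TRANSL>
</S>"""


def parse_sentence(xml_text):
    """Turn one <S> element into a plain dict (layout assumed from the sample)."""
    s = ET.fromstring(xml_text)
    audio = s.find("AUDIO")
    words = []
    for w in s.findall("W"):
        # Each morpheme is kept as a (form, gloss) pair.
        morphemes = [(m.findtext("FORM"), m.findtext("TRANSL"))
                     for m in w.findall("M")]
        words.append({"form": w.findtext("FORM"), "morphemes": morphemes})
    return {
        "id": s.get("id"),
        "form": s.findtext("FORM"),          # direct child only: the sentence form
        "start": float(audio.get("start")),  # seconds into the recording
        "end": float(audio.get("end")),
        "words": words,
        "translation": s.findtext("TRANSL"),  # direct child only: free translation
    }


def interlinear(sentence):
    """Return (forms line, gloss line) with words padded into columns."""
    forms = [w["form"] for w in sentence["words"]]
    glosses = ["-".join(g for _, g in w["morphemes"]) for w in sentence["words"]]
    widths = [max(len(f), len(g)) for f, g in zip(forms, glosses)]
    top = "  ".join(f.ljust(n) for f, n in zip(forms, widths)).rstrip()
    bottom = "  ".join(g.ljust(n) for g, n in zip(glosses, widths)).rstrip()
    return top, bottom


sentence = parse_sentence(SAMPLE)
forms_line, gloss_line = interlinear(sentence)
print(forms_line)
print(gloss_line)
print(sentence["translation"])
```

The `start` and `end` values are what would let you jump to the right spot in the recording, which is presumably what the interactive playback view does.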
All very straightforward. Hooray!