It occurred to me that it would be interesting to see updates from language documentation archives (AILLA, ELAR, PARADISEC, etc.) when a new deposit is made available, for several reasons:
We should celebrate our colleagues' accomplishments!
We should help bring attention to newly documented languages
We should try to learn from how recent archival repositories are put together
I know announcements of this kind go up on Twitter and blogs and mailing lists and stuff, but I figured it might be fun to try to work together to build a little list ourselves.
I'll see if I can find a few to add below. Feel free to add to this list.
So I spent some time on this today and honestly, I didn't find a whole lot. The only announcements I have seen are via Twitter. Some archives have blogs, but often those are used more for general announcements (conferences, calls for papers, grant news, etc.) than for changes to the archives themselves.
When you step back and think about it, it's kind of weird. Wouldn't one expect archives to be highlighting deposits? Or am I missing stuff?
Had to think about this for a minute, but it seems like OLAC should have this information, and if not a feed, there should be a way to get recent updates via search, as in:
http://dla.library.upenn.edu/dla/olac/search.html?sort=last_update_sort%20desc&showall=sort&fq=dcmi_type_facet%3A"Collection"
@joeylovestrand's solution should work. But you can also "cut out the middleman", i.e. query the same data that OLAC queries: an archive's OAI-PMH data provider. OAI-PMH allows specifying a from parameter for the ListRecords verb, so all PARADISEC records from 2022 are https://catalog.paradisec.org.au/oai/item?verb=ListRecords&from=2022-01-01&metadataPrefix=olac
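To make the request pattern concrete, here is a minimal sketch of building such a ListRecords URL. The function and interface names (listRecordsUrl, ListRecordsOptions) are made up for illustration. The verb, from, until, set and metadataPrefix arguments are standard OAI-PMH request arguments, and the base URL is the PARADISEC one quoted above.

```typescript
// Hypothetical helper: build an OAI-PMH ListRecords request URL.
interface ListRecordsOptions {
  metadataPrefix: string; // e.g. "olac" or "oai_dc"
  from?: string;          // lower bound on the record datestamp, e.g. "2022-01-01"
  until?: string;         // upper bound on the record datestamp
  set?: string;           // optional set name, if the archive defines sets
}

function listRecordsUrl(baseUrl: string, options: ListRecordsOptions): string {
  const url = new URL(baseUrl);
  url.searchParams.set("verb", "ListRecords");
  if (options.from) url.searchParams.set("from", options.from);
  if (options.until) url.searchParams.set("until", options.until);
  if (options.set) url.searchParams.set("set", options.set);
  url.searchParams.set("metadataPrefix", options.metadataPrefix);
  return url.toString();
}

// Usage: everything PARADISEC has added or changed since the start of 2022.
const paradisecSince2022 = listRecordsUrl(
  "https://catalog.paradisec.org.au/oai/item",
  { metadataPrefix: "olac", from: "2022-01-01" },
);
```

Swapping in another archive should only require its "Base URL", assuming that archive supports the same metadata prefix.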
Oh, and an archive's OAI-PMH "end point" is listed as "Base URL" on OLAC's archive details page, e.g. OLAC - Archive details
I'd still say that an OAI-PMH data provider isn't exactly "highlighting deposits"
Huh, interesting, thanks @joeylovestrand!
FWIW I did find a feed link in there (RSS):
http://dla.library.upenn.edu/dla/olac/feeds/search.rss?sort=last_update_sort%20desc&showall=sort&fq=dcmi_type_facet%3A"Collection"&
Nice. That makes it pretty easy to generate an HTML page like the one on OLAC dynamically. I wrote a crude little Deno script to do that:
Script to convert OLAC feed into HTML
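// Fetch the OLAC RSS feed of recently updated collections and turn it into a simple HTML page.
import { DOMParser } from "https://deno.land/x/deno_dom/deno-dom-wasm.ts";

const url = `http://dla.library.upenn.edu/dla/olac/feeds/search.rss?sort=last_update_sort%20desc&showall=sort&fq=dcmi_type_facet%3A%22Collection%22&`

const response = await fetch(url)
const xml = await response.text()

// The RSS is parsed as if it were HTML, which is good enough for pulling out a few fields.
const dom = new DOMParser().parseFromString(xml, 'text/html')

const links = Array.from(dom.querySelectorAll('item'))
  .map(item => {
    const linkElement = item.querySelector('link')
    // An HTML parser treats <link> as an empty element, so the URL can end up in the text node that follows it.
    const link = (linkElement?.textContent || linkElement?.nextSibling?.textContent || "").trim()
    const title = item.querySelector('title')?.textContent ?? ""
    const description = item.querySelector('description')?.textContent ?? ""
    return {link, title, description}
  })
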
let page = `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Recent Language Archive Deposits</title>
</head>
<body>
<h1>Recent Language Archive Deposits</h1>
<ul>
${links.map(link => `<li><a href="${link.link}">${link.title}</a> ${link.description}</li>`)
.join('\n')}
</ul>
</body>
</html>`
Deno.writeTextFileSync('archive-feed.html', page)
Crude, but it does what it says on the tin:
http://docling.net/archive-feed.html
Obviously this is sort of pointless given that the page is already online with that exact information, but XML is much easier to parse than HTML. Maybe, for instance, we could figure out a way to publish this feed to this forum automatically.
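As a rough sketch of what the "publish automatically" step might need, the snippet below picks out feed items that have not been announced yet and formats them as a plain-text post. Everything here is an assumption for illustration: the FeedItem shape mirrors the {link, title, description} objects from the script above, selectNewItems and formatPost are made-up names, and how the text would actually be posted to the forum is left open.

```typescript
// Hypothetical shape of one parsed feed entry (same fields as in the script above).
interface FeedItem {
  link: string;
  title: string;
  description: string;
}

// Keep only the items whose link has not been announced before.
function selectNewItems(items: FeedItem[], seenLinks: string[]): FeedItem[] {
  return items.filter((item) => seenLinks.indexOf(item.link) === -1);
}

// Turn the new items into a simple one-line-per-deposit post body.
function formatPost(items: FeedItem[]): string {
  return items.map((item) => `- ${item.title}: ${item.link}`).join("\n");
}
```

The list of already-announced links would have to be stored somewhere between runs (a small JSON file would do), since the feed itself only says what was updated most recently.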
Man, it's exciting to have so much expertise in the room.
I confess I have never dug into the OLAC docs, and I should have. The URL you link provides more granular data, which could be useful. Considering just the first record:
<record xmlns="http://www.openarchives.org/OAI/2.0/">
<header>
<identifier>oai:paradisec.org.au:AC1-220</identifier>
<datestamp>2022-02-09T22:26:10Z</datestamp>
</header>
<metadata>
<olac:olac xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:olac="http://www.language-archives.org/OLAC/1.1/" xsi:schemaLocation="
 http://www.openarchives.org/OAI/2.0/oai_dc/
 http://www.openarchives.org/OAI/2.0/oai_dc.xsd
 http://purl.org/dc/elements/1.1/
 http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd
 http://purl.org/dc/terms/
 http://www.language-archives.org/OLAC/1.1/dcterms.xsd
 http://www.language-archives.org/OLAC/1.1/
 http://www.language-archives.org/OLAC/1.1/olac.xsd
 ">
<dc:title>Revepe (Holvanua), Maewo 'Prodigal Son'; Baiap (Ambrym) Word List.</dc:title>
<dc:identifier>AC1-220</dc:identifier>
<dc:identifier xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220</dc:identifier>
<dc:subject xsi:type="olac:linguistic-field" olac:code="language_documentation"/>
<dcterms:created xsi:type="dcterms:W3CDTF">1970-01-01</dcterms:created>
<dc:date xsi:type="dcterms:W3CDTF">1970-01-01</dc:date>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_01.tif</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_01.jpg</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_03.tif</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_03.jpg</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_05.tif</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_05.jpg</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_04.tif</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_04.jpg</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_02.tif</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-IMG_02.jpg</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-A.wav</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-A.mp3</dcterms:tableOfContents>
<dcterms:tableOfContents xsi:type="dcterms:URI">http://catalog.paradisec.org.au/repository/AC1/220/AC1-220-A.eaf</dcterms:tableOfContents>
<dc:contributor xsi:type="olac:role" olac:code="compiler">Arthur Capell</dc:contributor>
<dc:contributor xsi:type="olac:role" olac:code="recorder">Arthur Capell</dc:contributor>
<dc:subject xsi:type="olac:language" olac:code="bpa"/>
<dc:subject xsi:type="olac:language" olac:code="mwo"/>
<dc:subject xsi:type="olac:language" olac:code="pgk"/>
<dc:language xsi:type="olac:language" olac:code="bpa"/>
<dc:language xsi:type="olac:language" olac:code="mwo"/>
<dc:language xsi:type="olac:language" olac:code="pgk"/>
<dc:format>Digitised: yes
Media: LR Audio-tape Type 961. Plastic spool. No tape lead-in. Good condition.
Audio Notes: Operator: Nicholas Fowler-Gilmore
Tape Machine: StuderA810
Soundcard: RME HDSPe AIO
A/D Converter: DAD2402
File: 24bit96kHz, Stereo
Speed: 3.75ips
Listening Quality: Good. </dc:format>
<dc:coverage xsi:type="dcterms:ISO3166">VU</dc:coverage>
<dc:coverage xsi:type="dcterms:Box">northlimit=-15.026; southlimit=-16.312; westlimit=167.614; eastlimit=168.165</dc:coverage>
<dc:type xsi:type="olac:linguistic-type" olac:code="primary_text"/>
<dc:subject xsi:type="olac:linguistic-field" olac:code="text_and_corpus_linguistics"/>
<dc:type xsi:type="dcterms:DCMIType">Sound</dc:type>
<dcterms:accessRights>Open (subject to agreeing to PDSC access conditions)</dcterms:accessRights>
<dc:rights>Open (subject to agreeing to PDSC access conditions)</dc:rights>
<dcterms:bibliographicCitation>Arthur Capell (collector), Arthur Capell (recorder), 1970. Revepe (Holvanua), Maewo 'Prodigal Son'; Baiap (Ambrym) Word List.. TIFF/JPEG/X-WAV/MPEG/XML. AC1-220 at catalog.paradisec.org.au. https://dx.doi.org/10.4225/72/56E97D93249EF</dcterms:bibliographicCitation>
<dc:description>Audit of file (20220210) suggests only two languages on this recording, perhaps Rerep (Malekula) and Baiap (at 26:34) . Marked Side 1/2 on box, but on tape, side 1. is identified as side 2. -- Side 1: Revepe (Holvanua), Maewo 'Prodigal Son' - The first is Retep or Pangkumu, an Austronesian dialect of East Malekula, Vanuatu; Maewo is an island much further north. -- Side 2: Baiap (Ambrym) Word List - Dialect of the Ambryn Island Austronesian language Dakaka, Central Vanuatu.
(no side b). Language as given: Revepe (Holvanua), Maewo, Baiap (Ambrym)</dc:description>
</olac:olac>
</metadata>
</record>
So from there we can get to this bit:
Revepe (Holvanua), Maewo 'Prodigal Son'; Baiap (Ambrym) Word List.
Which is informative but unfortunately not really structured: it's not clear to me what this means. Presumably Revepe is a speaker, and Holvanua a… place? Or is Maewo Revepe a person's name, maybe? Etc.
Still, it would be useful to someone who is a specialist in this area to be informed of this data.
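The title may be free text, but other parts of the record above are structured, for instance the datestamp and the olac:language codes. Here is a minimal sketch of pulling those out so a specialist could filter by language. It uses regular expressions rather than a real XML parser and assumes the attribute order seen in the PARADISEC record (xsi:type before olac:code). The names RecordSummary and summarizeRecord are made up for illustration.

```typescript
// Hypothetical summary of one OAI-PMH record in OLAC format.
interface RecordSummary {
  identifier: string | null;
  datestamp: string | null;
  title: string | null;
  languageCodes: string[];
}

// Return the first capture group of `pattern` in `xml`, or null if there is no match.
function firstMatch(xml: string, pattern: RegExp): string | null {
  const match = pattern.exec(xml);
  return match ? match[1].trim() : null;
}

function summarizeRecord(recordXml: string): RecordSummary {
  // Assumption: attributes appear as xsi:type="olac:language" olac:code="...".
  const languagePattern =
    /<dc:(?:subject|language)\s+xsi:type="olac:language"\s+olac:code="([^"]+)"/g;
  const languageCodes: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = languagePattern.exec(recordXml)) !== null) {
    // The same code is listed under dc:subject and dc:language, so skip duplicates.
    if (languageCodes.indexOf(match[1]) === -1) languageCodes.push(match[1]);
  }
  return {
    identifier: firstMatch(recordXml, /<identifier>([^<]*)<\/identifier>/),
    datestamp: firstMatch(recordXml, /<datestamp>([^<]*)<\/datestamp>/),
    title: firstMatch(recordXml, /<dc:title>([^<]*)<\/dc:title>/),
    languageCodes,
  };
}
```

For the record above this would give the language codes bpa, mwo and pgk alongside the title, which is enough to route a new deposit to people who follow those languages.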
As in many other cases with linguistic data, there seems to be a lack of transparent re-use cases. OLAC doesn't seem to be used very systematically by many, and the OAI-PMH data from archives is probably only used by OLAC, so there's not much feedback on its usability either.
But as I said elsewhere, more people in linguistics in both roles - data creators and data users (also data of others) - could be the way out of this dilemma.
Neat. The <select> on that page turns up something that's interesting in its own right, a listing of language archives. Putting it here for the heck of it…
- Aboriginal Studies Electronic Data Archive (ASEDA)
- Academia Sinica Collections
- AfBo: A world-wide survey of affix borrowing
- African Language Materials Archive
- Alaska Native Language Archive
- APiCS Online
- Archive of the Indigenous Languages of Latin America (AILLA)
- BAS Repository
- C’ek’aedi Hwnax Ahtna Regional Linguistic and Ethnographic Archive
- California Language Archive
- Central Institute of Indian Languages: Publications
- CHILDES Data repository
- COllections de COrpus Oraux Numeriques (CoCoON ex-CRDO)
- Comparative Corpus of Spoken Portuguese
- The Crúbadán Project
- Dictionaria
- A Digital Archive of Research Papers in Computational Linguistics
- ELRA Catalogue of Language Resources
- Endangered Languages Archive
- Ethnologue: Languages of the World
- Eurac Research CLARIN Centre
- Glottolog 4.5
- Graduate Institute of Applied Linguistics Library
- ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", National Research Council, in Pisa
- IULA UPF OAI Archive
- Kaipuleohone
- The Language Archive
- Language Commons Language Corpora
- Language Documentation and Conservation
- Language resources at the Text Laboratory
- LAPSyD
- The LDC Corpus Catalog
- LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University
- The LINGUIST List Language Resources
- Living Archive of Aboriginal Languages
- Lund University Humanities Lab corpusserver
- Magoria Books' Carib and Romani Archive
- Multimodal Learning and teaching Corpora Exchange
- The Natural Language Software Registry
- ODIN - The Online Database of Interlinear Text
- Oxford Text Archive
- Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)
- Pacific Collection at the University of Hawai‘i at Mānoa Hamilton Library
- PHOIBLE 2.0
- POLLEX-Online
- The Rosetta Project: A Long Now Foundation Library of Human Language
- SAILS Online
- SIL Language and Culture Archives
- Slovenian language resource repository CLARIN.SI
- The Sociolinguistic Archive and Analysis Project (SLAAP)
- Speech and Language Data Repository (SLDR/ORTOLANG)
- Surrey Morphology Group Databases
- TALKBANK Data repository
- Tibetan and Himalayan Digital Library
- transnewguinea.org
- TST-Centrale
- The Typological Database Project
- U Bielefeld Language Archive
- WALS Online
- WALS Online RefDB
- Webonary Sites
- WOLD
Before you get too excited, also check the last column in the table here Open Language Archives Community and the "Current as of" date on details pages like OLAC - Archive details
Oh I'm never excited, I'm very blasé.
You mean the fact that so many archives are inactive?
Wait, I do get excited. 
Random note:
Ian Maddieson & company's LAPSyD phonological typology database has a "recent updates" sidebar (though it's not a feed).
But that does turn up recent work. Dahalo was updated a few days ago:
I don't know if the archives are inactive. Often, I'd guess, the OAI-PMH interface may just be neglected, possibly because OLAC is considered irrelevant? Unfortunately, OAI-PMH is a somewhat cumbersome protocol, but there's a cheap way to support it called a static repository gateway, which I'd guess a couple of the archives are using. You basically just put a file somewhere on a server. That's cheap, and … easy to forget about.
I know that I'm in charge of roughly 25% of the listed archives that could be crawled successfully. And with the exception of Glottolog, data in the others rarely changes, and if you wanted to know, you'd rather check CLDF Datasets · GitHub for activity…
Oh, speaking of CLDF Datasets · GitHub: released versions of these datasets are archived with Zenodo and appear in its cldf-datasets community, which has an OAI-PMH feed: https://zenodo.org/oai2d?verb=ListRecords&set=user-cldf-datasets&metadataPrefix=oai_dc
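One practical detail when reading feeds like this: an OAI-PMH server may split a long ListRecords response into pages and hand back a resumptionToken for the next one. Follow-up requests then carry only the verb and that token. Below is a minimal sketch of following those pages. The function names nextPageUrl and harvestAll are made up, the token is extracted with a regular expression instead of an XML parser, and XML entities inside the token are not decoded.

```typescript
// Given one ListRecords response, return the URL of the next page, or null if this was the last page.
function nextPageUrl(baseUrl: string, responseXml: string): string | null {
  const match = /<resumptionToken[^>]*>([^<]+)<\/resumptionToken>/.exec(responseXml);
  if (!match) return null; // no token element, or an empty one: nothing more to fetch
  const token = match[1].trim();
  if (token === "") return null;
  const url = new URL(baseUrl);
  url.searchParams.set("verb", "ListRecords");
  url.searchParams.set("resumptionToken", token);
  return url.toString();
}

// Fetch every page of a ListRecords request and return the raw XML of each page.
async function harvestAll(baseUrl: string, firstUrl: string): Promise<string[]> {
  const pages: string[] = [];
  let next: string | null = firstUrl;
  while (next !== null) {
    const response = await fetch(next);
    const xml = await response.text();
    pages.push(xml);
    next = nextPageUrl(baseUrl, xml);
  }
  return pages;
}
```

For the Zenodo feed above, baseUrl would be https://zenodo.org/oai2d and firstUrl the full link quoted in the post.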