Is there a list or some other way to find reusable documentation?

Good evening from Western Mass, local heroes.

I have a question for you:

Do you know of any way to find documentation projects that are under an open license?

Ideally, I’m looking for documentation that:

  • Includes time-aligned transcription and accompanying audio or video
  • Is under a license (maybe Creative Commons?) or other terms that allow re-use for educational, non-commercial purposes

I realize, of course, that most projects contain some content that is under such a license and some that is not. I also realize that some archives provide such metadata (although many don’t), so it is probably possible to search through archives to find content that meets these criteria, and that’s what I plan to do. But I’m interested in any pointers that this community might have about your own work, work you know about, or third-party papers, indexes, whatevs.

The point of all this is that I want to use real fieldwork for some educational materials I’m working on.

I bow gratefully in advance, just like this here emoji:


Oh, and I should add that I will certainly make a simple database of whatever comes up available, assuming I’m not missing a large existing index.
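For what it’s worth, here is a rough sketch of how such a simple database might be filtered against the two criteria above. Everything in it is an assumption for illustration: the `Bundle` fields, the license identifiers in `REUSABLE_LICENSES`, and the idea that an archive reports a license at all are hypothetical and not taken from any particular archive’s metadata.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical set of license identifiers treated as allowing educational,
# non-commercial re-use. Adjust to whatever the archives actually report.
REUSABLE_LICENSES = {
    "CC0-1.0",
    "CC-BY-4.0",
    "CC-BY-SA-4.0",
    "CC-BY-NC-4.0",
    "CC-BY-NC-SA-4.0",
}


@dataclass
class Bundle:
    """One archived item; all field names are made up for this sketch."""
    title: str
    archive: str
    license: Optional[str]  # None when the archive gives no license metadata
    time_aligned: bool      # has a time-aligned transcription
    has_media: bool         # has accompanying audio or video


def is_reusable(bundle: Bundle) -> bool:
    """True only if the bundle meets both criteria: open license and aligned media."""
    return (
        bundle.license in REUSABLE_LICENSES
        and bundle.time_aligned
        and bundle.has_media
    )


def reusable_bundles(bundles: List[Bundle]) -> List[Bundle]:
    """Keep only the bundles that pass is_reusable, preserving their order."""
    return [b for b in bundles if is_reusable(b)]
```

A bundle with no license metadata is treated as not reusable here, which is the cautious default; whether that is the right call for a given archive is a separate question.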

(Pinging @laureng, @msatokotsubi, @Andrew_Harvey, @xrotwng, @joeylovestrand as some local heroes who might know of such things…)


Hi Pat – virtually all my stuff has Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) status in all but name.

Here are the links to the collections:
(and with @rgriscom) Hadza:

In order to access recordings with time-aligned transcription, each of these deposits has a keyword (consult the “Refine Your Selection” bar to the left of the screen) that looks something like “Deposit# Workflow status: standaradised X transcription, English translation” or “Workflow status […]”. If you click these filters, this should give you all the stuff in those collections that you’re looking for.

If this doesn’t give you the results you’re looking for, just get in touch with me and I can help. I want to have this stuff used as widely as possible, so this kind of request is super welcome :smiley:

Looking around Zenodo, I came across files uploaded by Christian Döhler which appear to be open access, and downloadable without even having to register.


This is a bit off the main topic of the thread, but I think this is brilliant – do we know of anyone explicitly teaching (or encouraging) people to archive language materials via Zenodo or OSF?
As far as I can tell, this would be a viable DIY alternative if people can’t get stuff into the larger archives.
Can we think of any drawbacks in doing it this way?

:thinking: There may be some concern about making things too accessible, or about making it impossible to track who accessed the data (hence the usual registration protocol)? But it looks like Zenodo allows you to archive and require users to ask permission to access.

Discoverability is possibly an issue? Christian found a nice way to use the communities feature to organize his corpus, so that helps. I don’t think any Zenodo stuff gets fed to OLAC or similar, so it’s a bit hard to search in the usual places.

What about sustainability? Do we know who funds Zenodo and how many decades they’ll be around for?

Yeah, this kind of thing is what I was thinking about with regard to “split” archives where only a few (maybe even one or two!) bundles are tagged as truly accessible. A single sizeable time-aligned text, after all, is a significant resource for lots of different kinds of research and learning.

I find Creative Commons useful because it is an explicit positive statement: “Yes, you can re-use this content with the following restrictions…”.

I have in mind content where this isn’t an expectation (by which I don’t mean to imply, of course, that such an access regime is not sometimes wholly appropriate).

Huh, interesting. I feel like this forum could really use a Zenodo 101 post. I myself have never gotten around to really taking the time to try to understand it but I know I should!

Yeah, I was looking at OLAC but it doesn’t seem to be possible to filter by reuse? For instance, I don’t see anything in the Gorwaa listings in OLAC for Andrew’s work indicating that it’s under an open license.

Indeed, another reason for a Zenodo 101 topic…

Okay, such a topic needs to happen. :rofl:

(Incidentally, I believe @cbowern is a Zenodo person as well, though I don’t know if that’s in a “documentation proper” capacity or for other content.)

In any case, this discussion is reminding me of another one we had here quite some time ago:

Another kind of endangered documentation

@Andrew_Harvey my friend, this is the thing. I’ll be in touch!


If anyone(s) out there wants to put together a “Zenodo for Language Documentation” presentation, I’d be more than happy to host that talk as a SOAS Webinar!


I think this was the dream of OLAC (the Open Language Archives Community) but it stopped being funded/developed a long time ago now:


Yeah. OLAC is still useful of course. But I couldn’t find a way in there to filter by reuse. The other problem I have with OLAC is that it’s hard to tell what’s pointing at an online resource and what’s more traditional bibliographic info.

(I often wonder about the back stories behind big efforts like this. The whole GOLD Ontology thing is another one I wonder about. How did they start? Why did they end?)


I can tell you a bit about the GOLD ontology background sometime. I was part of some of the original discussions (though not centrally involved), and my impression is that it was largely driven by the Aristars.

If I remember right, Zenodo doesn’t guarantee preservation in perpetuity, but for 20 years (starting from when, I can’t remember). It’s also extremely difficult to browse and search there, unless you know the direct link to the collection.


@pathall concerning the license issue I have some comments:

  1. What is an Open License to you… what qualifies and what doesn’t?
  2. You can use OLAC metadata and look for resources by license if you use the faceted search interface at the OLAC Language Resource Catalog.

@laureng I’m not sure this is an accurate assessment of OLAC. Penn Libraries still sponsors the web hosting, so “funded” it still is. However, the original community of linguists who got together to draft recommendations is no longer actively drafting them. This is not because they can’t, but because no one has called a meeting to do it. OLAC’s biggest current supporters can often be found in the DELAMAN network. My understanding from discussions with Dr. Bird and Dr. Thieberger is that there is some concerted effort planned for this summer, starting in July 2022. OLAC is also a topic in the call for papers for the joint ELAR/PARADISEC conference later this year. So, while the OLAC community is far from DEAD, how well it is thriving is a matter of perspective.

@cbowern FYI DoBeS guarantees to hold the data for a mere 50 years.

A post was split to a new topic: Of GOLD and Ontologies…