Data Sharing History in Linguistics

In their article: Burke, Mary, Hannah Tarver, Mark Edward Phillips, and Oksana Zavalina. 2022. “Using Existing Metadata Standards and Tools for a Digital Language Archive: A Balancing Act.” The Electronic Library ahead-of-print (ahead-of-print). doi:10.1108/EL-02-2022-0028.

Burke et al. state the following:

Most data collected by linguists was not traditionally shared, other than through secondary resources (e.g. journal articles, conference presentations, etc.). Source data was collected by researchers and shared within a team of researchers or with individual linguists upon request.

This framing of the narrative suggests that linguists have always been hoarders. I wonder if this is really true, because another narrative is possible: that somewhere around the 1970s or 1980s, when the “PhD explosion” (a radical increase in awarded PhDs) and the “Publish or Perish” phenomenon started to interact, we begin to see larger projects in linguistics and also the hoarding of “data” as a means of creating publications. I know that when I went to the LSA sociolinguistic corpus workshop around 2011, some scholars there were unwilling to share their data for fear of being scooped. They were a generation older than me, which fits the baby-boomer hypothesis. Prior to the 1950s we see lots of collections of anthropological materials (including audio) deposited in archives, and these are generally from a different generation of scholars. So, I wonder, is the lack of sharing resources and the lack of archiving resources actually a peculiarity of the baby-boomer generation?

This made me want to know when the publish-or-perish and metric-based tenure process started to squeeze candidates. I’m interested in impressions, thoughts, insights, or references here or on my blog: Data Sharing History in Linguistics | The Journeyler