Large text file versioning / Chirila update

I started unentwining Chirila from the linked FileMaker databases today. Some things turned out to be easier than expected. The vast majority of links between databases, for example, are only there 1) for specific (completed) research projects, 2) to count things, or 3) because I set up the databases as linked but distinct database files rather than as tables within a single file. It turns out there are only 5 distinct tables that really need to be there, and no more than 2 chains of tables from the main lexical information.

I was hoping to use GitHub for versioning the text files, but I’m not sure this will be possible. We will likely run into the 100 MB limit on file sizes unless the files are broken up somewhat or compressed. Given that things will need to be stitched together in any case, there is some argument for breaking them up (e.g. storing them with fewer columns than the full array).
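For what it’s worth, here is a minimal sketch of what “fewer columns than the full array” could look like in practice. It is in Python rather than R, and everything in it is an assumption for illustration: the function names, the idea of a shared key column, and the tab-delimited UTF-8 layout are not taken from the actual Chirila files. Each output file carries the key column so the pieces can be stitched back together later, and a second helper flags any file still over a size limit.

```python
import csv
from pathlib import Path


def split_columns(src, out_dir, key, groups):
    """Write one tab-delimited file per column group, each carrying the key column.

    `groups` maps an output file stem to the list of columns it should hold.
    All names here are hypothetical; adapt them to the real table layout.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, columns in groups.items():
        target = out_dir / f"{name}.tsv"
        with open(src, newline="", encoding="utf-8") as fin, \
                open(target, "w", newline="", encoding="utf-8") as fout:
            reader = csv.DictReader(fin, delimiter="\t")
            writer = csv.DictWriter(
                fout,
                fieldnames=[key, *columns],
                delimiter="\t",
                lineterminator="\n",
                extrasaction="ignore",  # drop columns not in this group
            )
            writer.writeheader()
            for row in reader:
                writer.writerow(row)
        written.append(target)
    return written


def oversized(paths, limit=100 * 1024 * 1024):
    """Return the files whose size on disk exceeds `limit` bytes."""
    return [p for p in paths if Path(p).stat().st_size > limit]
```

Something like `split_columns("lexicon.tsv", "split", "id", {"forms": ["form", "gloss"], "sources": ["source"]})` would then give two smaller files that both keep `id`. The source file is re-read once per group, which is slow but keeps memory use flat for very large files.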

I also spent some quality time with FileMaker exports. Text export does not export the field names (which I knew already). FileMaker’s “merge” type does export the field names (which I didn’t know). However, “merge” cannot export Unicode (wtaf??), so we are back to either:

1. exporting in text format and stitching the headers in, or
2. exporting in Excel format and then saving as a text file outside of FileMaker (e.g. as part of the R scripts that will be needed anyway).

Can’t say I like either of these alternatives very much, but this part of the export will not be needed often (since the text files will be the reference/original format).
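The first option (text export plus stitched-in headers) could be sketched roughly as follows. This is Python rather than R, and it rests on assumptions I have not checked against a real export: that the file is tab-delimited, that its encoding is known (UTF-8 by default here), and that the field names are supplied by hand in export order. The function name `stitch_header` is made up for the example.

```python
def stitch_header(export_path, out_path, field_names, encoding="utf-8"):
    """Prepend a header row to a headerless tab-delimited export.

    Returns the number of records copied. Raises ValueError if a record
    does not have exactly one value per field name, since that usually
    means the field list and the export order have drifted apart.
    """
    copied = 0
    # Default text-mode reading translates \r and \r\n line endings to \n.
    with open(export_path, encoding=encoding) as fin, \
            open(out_path, "w", encoding="utf-8", newline="\n") as fout:
        fout.write("\t".join(field_names) + "\n")
        for line_no, line in enumerate(fin, start=1):
            record = line.rstrip("\n")
            if not record:
                continue  # skip blank lines (e.g. a trailing separator)
            n_values = len(record.split("\t"))
            if n_values != len(field_names):
                raise ValueError(
                    f"line {line_no}: expected {len(field_names)} values, got {n_values}"
                )
            fout.write(record + "\n")
            copied += 1
    return copied
```

A call such as `stitch_header("export.tab", "lexicon.tsv", ["id", "form", "gloss"])` would write the header followed by the records, with the column-count check acting as a cheap guard against a stale field list.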


I had some success scraping data from FileMaker’s XML export, including parsing the FileMaker rich-text markup. But that was a couple of years ago, so I don’t know how much of an option that still is today.
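In case it helps, a rough sketch of the general idea in Python. It is not the script I used back then. It assumes the export uses the FMPXMLRESULT grammar as I understand it (field names under `METADATA/FIELD`, values under `RESULTSET/ROW/COL/DATA`, all in the `http://www.filemaker.com/fmpxmlresult` namespace), and it does not attempt the rich-text markup at all. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the FMPXMLRESULT grammar.
NS = {"fmp": "http://www.filemaker.com/fmpxmlresult"}


def rows_from_fmpxml(xml_text):
    """Return a list of {field name: value} dicts from an FMPXMLRESULT document."""
    root = ET.fromstring(xml_text)
    names = [f.get("NAME") for f in root.findall("fmp:METADATA/fmp:FIELD", NS)]
    records = []
    for row in root.findall("fmp:RESULTSET/fmp:ROW", NS):
        values = []
        for col in row.findall("fmp:COL", NS):
            # A COL may hold several DATA elements (repeating fields);
            # joining them with newlines is just one possible choice.
            data = [d.text or "" for d in col.findall("fmp:DATA", NS)]
            values.append("\n".join(data))
        records.append(dict(zip(names, values)))
    return records
```

Since the field names travel with the data in this format, it would sidestep the header-stitching problem, provided the XML export handles Unicode correctly, which I have not verified on current versions.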


To be clear, you could use git without GitHub. You will not run into the GitHub-specific file limits that way.

After all the articles in the early 2000s talking about how to set up databases and use FileMaker Pro for language research, it would be great to see one talking about the journey out of FileMaker and into…


Good point. Jira/Bitbucket has much larger file limits. (I do need to include file sharing, since the whole point of this conversion is so that others apart from me can work with the files.)

Something else to watch out for is that Bitbucket can operate with one of two different version control systems. The first is Mercurial (https://www.mercurial-scm.org), which is what they started with. I recall that more recently they added git as an option. I once watched a presentation by Linus Torvalds on YouTube arguing for the superiority of git over other version control systems, but I have since forgotten the details.

Also, for citation and referencing purposes, having a DOI is useful. Zenodo has an API link to GitHub that can pull a product in at a given tagged release and give it a DOI. Another product I’ve used is gitlab.com, but I don’t think Zenodo has automatic integration with GitLab or Bitbucket. However, OSF.io does have integration with GitHub, Bitbucket, and GitLab (see “Connect GitHub to a Project” in OSF Support).

I have found the following two blog posts thought-provoking in my use of online version control systems: