Thanks, Marti, for this new coffee hour topic! I just realized that it overlaps with today’s writing group. I think I need to stick with the writing group because I’ve got a couple of deadlines this week, but I’ve also got lots to say about metadata. Here are some quick comments:
Here is the latest metadata editing tool being promoted by ELAR: Lameta
IMO Lameta is much easier to use than Arbil and has some added features which make it more compelling as a general project management app rather than simply the app you use to prepare your data for archiving. Neither Arbil nor Lameta can easily import metadata from a spreadsheet/CSV format, though, which is why so many linguists waste so many hours re-entering metadata when it comes time to archive. If you use Lameta from the beginning, this is not a problem. So if you don’t want to learn how to program, I would suggest switching to Lameta for metadata creation as soon as possible, to minimize the amount of metadata you have to re-create manually.
If you keep your metadata in spreadsheets, as many linguists do, it is a good idea to follow the principles of tidy data: each variable gets its own column, each observation its own row, and each cell holds a single value. This makes it possible to easily manipulate the data later and automatically convert it to a format used by other software such as Lameta.
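To make the tidy data idea a bit more concrete, here is a minimal Python sketch. It assumes a made-up spreadsheet export in which several participants are packed into one cell, and reshapes it so that each row holds exactly one session/participant pair. The column names (`session_id`, `date`, `participants`), the `;` separator and the sample rows are all invented for illustration, not taken from any real project or tool.

```python
import csv
import io

# Hypothetical untidy export: several participants share one cell.
UNTIDY_CSV = """session_id,date,participants
S001,2021-03-01,Ana; Ben
S002,2021-03-02,Ana
"""


def tidy_participants(csv_text, sep=";"):
    """Return one dict per (session, participant) pair."""
    tidy_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split the multi-value cell so that each value gets its own row.
        for name in row["participants"].split(sep):
            name = name.strip()
            if name:
                tidy_rows.append(
                    {
                        "session_id": row["session_id"],
                        "date": row["date"],
                        "participant": name,
                    }
                )
    return tidy_rows


rows = tidy_participants(UNTIDY_CSV)
for r in rows:
    print(r)
```

Once the data is in this shape, filtering, sorting and converting it with a script becomes much simpler than when values have to be picked apart cell by cell.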
Mobile metadata entry systems such as ODK or KoBoToolbox allow you to easily work with teams of data collectors in areas without an internet connection and track data collection progress remotely. I’m hoping to host a workshop sometime this year on how to set up a KoBoToolbox metadata system, so if anyone is interested please let me know. These platforms can export metadata in CSV/XML. Here is a GitHub repo I made with some form templates you can use with KoBoToolbox/ODK, and here is a script I made to convert the CSV output of KoBoToolbox to the XML format used by Lameta.
Also, some advice after having gone through multiple projects’ worth of metadata this past week: the two primary categories of metadata produced by linguists when creating new data in a field-like setting are:
- Session/recording metadata
- Participant metadata
This is reflected e.g. in Lameta’s two tabs “Sessions” and “People”. If you are working with spreadsheets, that means you will often have two different spreadsheets for these two categories of information. This allows you to enter the participant info only once, rather than repeating it every time they participate in the creation of a resource. From a data perspective, this means that we are actually creating something like a very basic relational database, in which the session metadata is linked to the participant metadata by the name of the participant. For this reason, it is very important that the participant names in the session metadata are exactly the same (same spelling, capitalization and spacing) as the participant names used in the participant metadata, especially if you want to automate the conversion from one format to another rather than enter all of the data manually a second time.
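A quick way to catch mismatched names before attempting any conversion is to check the session sheet against the participant sheet with a few lines of Python. The sketch below is only an illustration: the column names (`session_id`, `participant`, `name`, `role`) and the sample rows are invented, and you would read your own CSV exports from disk instead of using inline strings.

```python
import csv
import io

# Hypothetical session sheet: one row per session/participant pair.
SESSIONS_CSV = """session_id,participant
S001,Ana Lopez
S001,Ben Okoro
S002,ana lopez
"""

# Hypothetical participant sheet: one row per person.
PEOPLE_CSV = """name,role
Ana Lopez,speaker
Ben Okoro,speaker
"""


def unmatched_participants(sessions_text, people_text):
    """List (session_id, participant) pairs with no exact match in the people sheet."""
    known_names = {row["name"] for row in csv.DictReader(io.StringIO(people_text))}
    missing = []
    for row in csv.DictReader(io.StringIO(sessions_text)):
        # Exact string comparison: case and spacing differences count as mismatches.
        if row["participant"] not in known_names:
            missing.append((row["session_id"], row["participant"]))
    return missing


problems = unmatched_participants(SESSIONS_CSV, PEOPLE_CSV)
for session_id, name in problems:
    print(f"{session_id}: no participant record for {name!r}")
```

Here the lowercase "ana lopez" in session S002 is flagged because it does not match "Ana Lopez" character for character, which is exactly the kind of small inconsistency that breaks an automated conversion.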