🧠⛈ Let’s brainstorm features for a hypothetical remote fieldwork application user interface

So, our new world.

The whole field is looking at remote fieldwork for the foreseeable future.

We need new tools, because our standard tools are not networked.

That said, a project to design a whole new class of application for language documentation would be a biiig undertaking. Consider all the moving parts:

  • connectivity issues
  • interface translation (“localization” and “internationalization”)
  • cross-platform issues: desktop or mobile? both? one then the other?
  • what is the end goal? what do we want as outputs?
  • how many users at a time?
  • how do we define “collaboration” — “real-time” editing of the same document at the same time, for instance, is a hard problem (but not impossible, see Google docs)
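On that last bullet: one reason concurrent editing is hard is merging conflicting writes when collaborators sync up. State-based CRDTs are one family of techniques that sidestep this by making merges order-independent. Here's a minimal sketch (hypothetical, not from any existing tool) of a last-writer-wins register in TypeScript:

```typescript
// Last-writer-wins (LWW) register: one of the simplest CRDTs.
// Each replica keeps a value plus the timestamp of its last write;
// merging two replicas keeps whichever write is newer, so replicas
// converge regardless of the order in which updates arrive.
interface LWWRegister<T> {
  value: T;
  timestamp: number; // e.g. milliseconds since epoch
}

function write<T>(reg: LWWRegister<T>, value: T, now: number): LWWRegister<T> {
  return now > reg.timestamp ? { value, timestamp: now } : reg;
}

function merge<T>(a: LWWRegister<T>, b: LWWRegister<T>): LWWRegister<T> {
  return a.timestamp >= b.timestamp ? a : b;
}

// Two fieldworkers edit the same transcription cell offline…
const base: LWWRegister<string> = { value: "naa-", timestamp: 0 };
const alice = write(base, "na:-", 100);
const bob = write(base, "naa-t", 200);

// …and both replicas agree once they sync, in either merge order.
console.log(merge(alice, bob).value); // "naa-t"
console.log(merge(bob, alice).value); // "naa-t"
```

Of course LWW silently discards the losing edit, which is exactly why real collaborative text editing (Google Docs-style) needs much more machinery — but it shows the flavor of the problem.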

That’s high-level stuff, what about the linguistics?

We need at least notions of:

  • words
  • morphemes
  • “sentences” (interlinearized “segments”… IGTs… whatever you want to call them!)
  • dictionaries
  • managing recordings of the above
  • orthographies
  • transliteration
  • metadata

If you imagine a pixie-dust scenario where you could have a remote fieldwork application that did whatever you wanted, what would it look like?

Perhaps upload a back-of-a-napkin sketch, or just throw in some more bullet points? Let’s just brainstorm.


Well, while we’re going for a wishlist: data has to include video, audio, images, and text at a minimum.

I’ve been on the edge of a few projects where people have been able to push to a central repository in FLEx, with mixed results, so something that was very transparent about versioning, forks, etc. would be welcome.

Also, something that played nicely with archives (I don’t know how much pixie dust that would take, but while we’re being completely blue-sky on this, I want to put it out there).


Thanks for your thoughts!

:heavy_plus_sign: yep yep yep yep yep

I agree that these are pretty much a sine qua non. Also, we need content in these data types to be handled in a way that isn’t just “plain text”, but can represent structured linguistic units, alignment, etc. Otherwise we may as well just be using existing tools like Zoom, Facebook chat, Dropbox, etc. (Not that those aren’t useful tools in many cases.)

I came up with the list below as a sort of “umbrella list” of … well, “aspects” of what we’d need to think about?

(Silly emoji symbols for fun, alternative suggestions welcome :wink:):

Media types (in increasing order of difficulty)

:page_facing_up: Text
:framed_picture: Images
:microphone: Audio
:tv: Video

Connectivity modes

:desert_island: Offline (actually I think this is worth thinking about, even just as a foil to the fully collaborative, networked, blue-sky scenario)
:satellite: Online
:handshake: Real-time collaboration


:twisted_rightwards_arrows: Versioning
:classical_building: Archiving

Linguistic data types

:card_file_box: Words
:scissors: Morphemes
:notebook: Texts (in the sense of a narrative, a conversation, whatever other genre)
:left_speech_bubble: “Sentences” (interlinearized “segments”… IGTs… whatever you want to call them!)
:open_book: Dictionaries
:symbols: Orthographies
:lips: Phonetic inventories
:file_cabinet: (Language) Metadata
:socks: Paradigms (paradigm… pair o’ socks… HAHA. Ehem.)

The last category, linguistic data types, is pretty much in line with what I’m trying to implement in the software library I’m writing about in my dissertation (it’s called docling.js).
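To make the “linguistic data types” category concrete, here’s one way the nesting could be shaped in TypeScript — this is a hypothetical sketch for discussion, not the actual docling.js API:

```typescript
// Hypothetical shapes for the linguistic data types above — one
// possible way the word/morpheme/sentence/text nesting could look.
interface Morpheme {
  form: string;  // surface form, e.g. "-ler"
  gloss: string; // e.g. "PL"
}

interface Word {
  transcription: string;
  morphemes: Morpheme[];
}

interface Sentence {
  words: Word[];
  translation: string;
  // audio/video alignment, per the "managing recordings" bullet
  media?: { file: string; startMs: number; endMs: number };
}

interface Text {
  title: string;
  genre: string; // narrative, conversation, elicitation, …
  sentences: Sentence[];
}

// A one-word example (Turkish, for illustration):
const sentence: Sentence = {
  words: [
    {
      transcription: "evler",
      morphemes: [
        { form: "ev", gloss: "house" },
        { form: "-ler", gloss: "PL" },
      ],
    },
  ],
  translation: "houses",
};

// Deriving a gloss line from the structure:
console.log(sentence.words[0].morphemes.map(m => m.gloss).join("-")); // "house-PL"
```

The point of a structure like this (versus plain text) is exactly the earlier remark: gloss lines, concordances, paradigm tables, and media alignment can all be *derived* from one representation rather than maintained by hand.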

Let’s think interface

These are all worthy areas to keep in mind. Each has its own technical expertise requirements: actually implementing these (connectivity, versioning, etc.) in turn requires skill development, testing, and so on.

But let’s just imagine that these things magically :tophat::sparkles: work. What does the actual user interface look like? What are the buttons? What are the boxes? Where do the video player and the record button and so forth live on the screen?

I think this kind of “participatory design” is a great way to start, actually.

It reminds me of an issue that came up in @ldg’s paper on “Developing without developers” (PDF): the exercise there was to reimplement ELAN on the web by cloning the interface quite precisely, and the fidelity is amazing:

Here’s the quote about why cloning ELAN was chosen as the task:

Choosing an existing app obviated the design process, saving time and eliminating a potential confound. ELAN was chosen in particular because of its widespread use in many areas of linguistics, including LD, and because its user interface and underlying data structures are complicated. We reasoned that if our approach were to succeed in this relatively hard case, we could be fairly certain it could succeed in easier ones, as well.

ELAN is certainly a complicated interface, and the web port is an amazing achievement. But ELAN has evolved to do what it does over a very long process of development. The key fact, to my mind, is that that process had a starting point — an initial idea — that only included some of the tasks that documentary linguists & colleagues are concerned with (namely, media/transcription alignment). Other things have been added incrementally, but I think that if ELAN were redesigned from scratch it would have not just different capabilities but a very different interface.

That’s where I’d like to gently nudge this conversation… we know what the ingredients are, but what does the cake look like, even if we don’t have a recipe?

(I just appended user interface to the title of this topic to emphasize this… might be a record for longest title! :rofl: )


Thanks for the kind words! Yeah, I certainly didn’t choose to recreate ELAN since I thought it was the pinnacle of UI design for documentation apps. Like you say, I think participatory design is the best way to approach the design of a greenfield LD app, and I’d love to hear from others about how it’d look–I find myself more often getting lost in the weeds of how to technically handle some of the more difficult challenges you mention, such as support for rich data and offline-first operation.
