Liling Tan recently posted the results of a survey on NLP annotation platforms. NLP annotation is obviously somewhat different from documentary linguistics, but I thought it’d be interesting to share here regardless.

In the context of language documentation, I think it’s worth asking what “annotating” would consist of in the general sense: what would the user experience look like? It’s telling that both Praat and ELAN were included as annotation tools, but FLEx wasn’t. I think that in the NLP world what we documentary linguists call “time-alignment” is thought of as “annotation”, but what we call “glossing” isn’t.

And yet, looking at some of these interfaces, they certainly have something in common with glossing interfaces, and we could probably get some good ideas for interlinear text editing, for instance:

So this one, for instance, has constituency annotation with those arcs, which is pretty cool. The “layers” business is also interesting. I think one thing we get sort of sucked into by default in documentation is the idea that words (or, I guess, more accurately, “word forms”) can only have a single gloss. But the origins of the word “gloss” might be a better way to think about it. We could “gloss” part of speech on one layer and a Leipzig morphemic gloss on another. Thinking about how to gloss multiple words in an interlinear (as this tool does) is trickier, but shouldn’t we be able to “gloss” constituents at any level, in principle?
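To make the “layers” idea a bit more concrete, here’s a rough sketch of how it could be modeled. This isn’t how any of the surveyed tools (or ELAN or FLEx) actually store things; the class names, the layer names (`pos`, `gloss`, `constituent`), and the toy English sentence are all made up for illustration. The only real idea is that a gloss attaches to a span of tokens on a named layer, rather than to a single word form.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One 'gloss' on one layer, covering tokens[start:end] (hypothetical model)."""
    layer: str   # e.g. "pos", "gloss", "constituent" (made-up layer names)
    start: int   # index of the first token covered
    end: int     # index one past the last token covered
    label: str


@dataclass
class AnnotatedText:
    tokens: list[str]
    annotations: list[Annotation] = field(default_factory=list)

    def annotate(self, layer: str, start: int, end: int, label: str) -> None:
        # A span may cover one word form or several, so constituents work too.
        if not 0 <= start < end <= len(self.tokens):
            raise ValueError("span out of range")
        self.annotations.append(Annotation(layer, start, end, label))

    def layer(self, name: str) -> list[Annotation]:
        # Everything on one layer, in text order.
        found = [a for a in self.annotations if a.layer == name]
        return sorted(found, key=lambda a: (a.start, a.end))

    def covering(self, index: int) -> list[Annotation]:
        # Every annotation, on any layer, whose span includes this token.
        return [a for a in self.annotations if a.start <= index < a.end]


# Toy example (assumed data, not from any corpus).
text = AnnotatedText(tokens=["the", "dogs", "bark"])
text.annotate("pos", 0, 1, "DET")
text.annotate("pos", 1, 2, "N")
text.annotate("pos", 2, 3, "V")
text.annotate("gloss", 1, 2, "dog-PL")      # Leipzig-style morphemic gloss
text.annotate("constituent", 0, 2, "NP")    # a "gloss" over two word forms

for a in text.covering(1):
    print(a.layer, a.label, text.tokens[a.start:a.end])
```

In this sketch the word form “dogs” ends up with three glosses at once (part of speech, morphemic gloss, and membership in a multi-word constituent), and nothing in the model privileges the single-word case.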

Gonna take a look at a few more of these interfaces, thanks for sharing @ldg!