Notes from the Post-ComputEL-5 Discussion

Continuing the discussion from Post ComputEL-5 Discussion:

Hi @SarahRMoeller! Sorry we missed you.

Sorry for the confusion; I should really put some work into setting up banners or some such on this site for upcoming stuff. There is an events plugin which is supposed to pick up timestamped events in posts and add them to a calendar buuuut… :beetle: bugs.

So I was trying to take notes in the background in a post as we talked but it was hard to keep it up as we had a pretty wide-ranging talk. Rather than kvetch I’ll just dump it in another topic here as-is, with the understanding that any of @cbowern , @ldg , or @fauxneticien might want to edit it. I can say right now that some of these notes will be hard to interpret, but at least you can get a vague feeling for what we were talking about. I did make this a wiki post so you can edit this text directly if you like.

There did seem to be interest in a recurring meeting of this sort, or perhaps even an LSA panel.

Why are people willing to go through complicated workflows in order to use FLEx? (@cbowern)


  • coding is scary
  • a GUI is more comfortable


  • SIL’s influence on linguistic practices: Toolbox, FLEx, Bible materials. Even much of what is available for NLP and machine learning is built on what comes out of SIL work.
  • Is this a historical development simply because missionaries were interested in linguistics for the purposes of conversion, or were linguists


  • Standardization is useful, and what comes out of FLEx is semi-standard
  • Toolbox was the only tool that was popular for so long
  • The environment


  • If you’re an SIL affiliate you kind of have to use FLEx stuff


  • the right way to handle this is to decouple morphology from the core system
  • not possible to create a default morphological parser


  • the FLEx parser is limited
  • problem areas: vowel harmony, underspecification; doesn’t work well with tone languages (allotony)


  • re decoupling - ELAN, for fieldwork data is great
  • but legacy data is another story: different data types
  • what do we do with all the old stuff?
  • ELAN was nice with Toolbox - you could do transcription in ELAN, and then parse in Toolbox


  • digital humanities tools are great if you want to
  • if you want to incorporate parsers and so forth
  • Ticha - workflows?
  • Bardi - normalization of legacy materials was a big part of the project. Originals were all texts

The One App to Rule them All (@fauxneticien)


  • the One App to Rule them All idea: we should think in terms of dataflows as opposed to app design - input output problems
  • even if different apps have the same “functional” content, if they can’t do I/O, there are roadblocks


a constituency tree: 1 text, 1 token, a span layer, relation layer, interdependencies between layers

use building blocks to build any kind of representation, and also multiple representations (say, constituency and named entities)

  1. text -
  2. tokens (maybe many)
  3. spans
  4. relations - source, target, value

could be common configurations
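To make the four building blocks a bit more concrete, here is a minimal sketch in Python of how text, tokens, spans, and relations might stack up. None of this is from an existing tool: the class names, the `layer` field, the whitespace tokenizer, and the example sentence are all assumptions for illustration, and it only handles a single tokenization even though the notes say there may be many.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    start: int  # character offset into the text (inclusive)
    end: int    # character offset into the text (exclusive)


@dataclass
class Span:
    layer: str  # hypothetical layer name, e.g. "constituency"
    first: int  # index of the first token covered
    last: int   # index of the last token covered (inclusive)
    value: str  # label for the span


@dataclass
class Relation:
    layer: str
    source: int  # index into Document.spans
    target: int  # index into Document.spans
    value: str


@dataclass
class Document:
    text: str
    tokens: List[Token] = field(default_factory=list)
    spans: List[Span] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def span_text(self, span: Span) -> str:
        """Recover the stretch of text a span covers, via its tokens."""
        return self.text[self.tokens[span.first].start:self.tokens[span.last].end]

    def layer(self, name: str) -> List[Span]:
        """All spans belonging to one representation."""
        return [s for s in self.spans if s.layer == name]


def whitespace_tokens(text: str) -> List[Token]:
    """Naive tokenizer (an assumption): one token per run of non-whitespace."""
    return [Token(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


# Two representations (constituency and named entities) over the same tokens.
doc = Document(text="Mary saw the dog")
doc.tokens = whitespace_tokens(doc.text)
doc.spans = [
    Span("constituency", 0, 0, "NP"),      # spans[0]: Mary
    Span("constituency", 2, 3, "NP"),      # spans[1]: the dog
    Span("constituency", 1, 3, "VP"),      # spans[2]: saw the dog
    Span("constituency", 0, 3, "S"),       # spans[3]: whole sentence
    Span("named_entity", 0, 0, "PERSON"),  # spans[4]: Mary
]
doc.relations = [
    Relation("constituency", source=3, target=0, value="child"),  # S -> NP
    Relation("constituency", source=3, target=2, value="child"),  # S -> VP
    Relation("constituency", source=2, target=1, value="child"),  # VP -> NP
]

for s in doc.spans:
    print(s.layer, s.value, repr(doc.span_text(s)))
```

The point of the sketch is only that both representations share the same text and tokens, so each one can be read off or exported without touching the other.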


  • in exporting a structure, you don’t have a meaningful representation