So, I probably should have better answers to this myself, but I don’t:
Is there software besides FLEx that provides a user interface for doing morphological parsing?
I don’t mean something like a script that runs in a terminal; I mean something with a user interface: you type a transcription line, you click things and type things and whatnot, and it tries to update what you input with a morphological analysis.
Obviously FLEx does this, but is it actually the case that the field has one user interface for a task so fundamental?
There’s Toolbox (a program that hasn’t had even a minor update since 2016, or a major one since 2012).
LingSync sort of did this, but it disappeared, I believe, and its successor hasn’t been released.
John Goldsmith had a wordlist parser whose name I can’t remember, but it’s surely no longer working, given that I can’t find it.
ELAN has access to the WebLicht system, which lets you plug in certain parsers, and it has a rudimentary built-in system that’s based on … you guessed it, LIFT format (i.e. from FLEx).
Ah yes, of course, thank you @cbowern. I knew as soon as I posted this that there would be things that escaped my mind!
@pathall Do you mean automated morphological parsing only? Or manual? Or either?
If you mean automated, then FLEx and ELAN are the only ones I can think of. These are both rule-based, i.e. not probabilistic/machine learning. I believe @ldg 's Glam aims to incorporate probabilistic morphological parsing. Use of Toolbox should be deprecated, IMO.
There are of course plenty of machine learning models out there for morphological parsing. And a few one-off interfaces developed for specific projects that may or may not be maintained.
A user interface that incorporates probabilistic automated interlinearization, including morphological parsing, is a gaping hole for language documentation!!
@SarahRMoeller could you link to some of the machine learning models? In particular, I’ve been wondering about ways to use lexical data to create a part-of-speech tagger for lexical data
I generally search aclanthology.org/ to find new models. Sometimes there is code to download, and occasionally I am smart enough to make it work for my goals. There is also Papers with Code. I usually use the fairseq implementation of the Transformer architecture for sequence-to-sequence tasks like POS tagging, with Wu et al.'s optimization parameters for low-resource settings.
I don’t know of any setups for out-of-the-box training of a POS tagger for a new language, even with a simple interface. That doesn’t mean it isn’t out there; I’d like to hear if there is.
Most machine learning models are simple to build and train with labeled/annotated data. The issues are:
- if trained only on typical amounts of post-field IGT data, they may not be accurate enough to be useful
- there is no easy way to get them back into a useful format (e.g. FLEx). A good user interface that allows incorporating your own models into the backend would solve this.
A POS tagger is relatively easy to build and does not require a ton of data. Running text is best because the best models learn from context. Do you have POS-labeled sentences?
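To illustrate the point about context, here is a minimal sketch of a context-aware tagger in Python. The words, tags, and sentences are all invented toy data; a real tagger would use a proper model, but even this tiny backoff scheme shows how labeled sentences let you tag unseen words from the tags around them.

```python
from collections import Counter, defaultdict

# Toy labeled sentences as (word, tag) pairs -- invented for illustration.
train = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("a", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("the", "DET"), ("bird", "N"), ("sings", "V")],
]

word_tags = defaultdict(Counter)   # word -> tag frequencies
trans = defaultdict(Counter)       # previous tag -> next-tag frequencies
for sent in train:
    prev = "<s>"
    for word, tag_ in sent:
        word_tags[word][tag_] += 1
        trans[prev][tag_] += 1
        prev = tag_

def tag(sentence):
    """Tag seen words by their most frequent tag; back off to context
    (the most likely tag to follow the previous tag) for unseen words."""
    out, prev = [], "<s>"
    for word in sentence:
        if word in word_tags:
            t = word_tags[word].most_common(1)[0][0]
        else:
            t = trans[prev].most_common(1)[0][0]
        out.append(t)
        prev = t
    return out

print(tag(["the", "horse", "gallops"]))  # -> ['DET', 'N', 'V']
```

The unseen words "horse" and "gallops" get tagged purely from context: a noun is what usually follows a determiner in the training data, and a verb is what usually follows a noun.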
How about @ldg 's glam app?
That’s right, though for now it’s only in the design, not the implementation, and ideas are cheap, of course. That said, if anyone’s curious about the idea, see page 6 of my paper, the section that begins with “NLP Integration”.
Thanks! Unfortunately I don’t have any labeled running data, but I have a lexicon with POS tags and some fairly clear phonological ~ part-of-speech associations for inflected forms, because of the amount of morphology (anything that ends in -nim is a noun, anything that starts with ing- is a verb, etc.). Word order is very free, so I suspect that context might not be as helpful.
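Associations that clear could already be encoded as a simple rule-based baseline before reaching for machine learning. A minimal sketch, assuming the -nim/ing- patterns described above (the example words are made up, not real data from the language):

```python
import re

# Hypothetical affix rules based on the patterns described above:
# -nim marks nouns, ing- marks verbs.
RULES = [
    (re.compile(r"^ing"), "V"),   # ing- prefix -> verb
    (re.compile(r"nim$"), "N"),   # -nim suffix -> noun
]

def tag_word(word, default="UNK"):
    """Return the tag of the first matching rule, or a default."""
    for pattern, pos in RULES:
        if pattern.search(word):
            return pos
    return default

print(tag_word("hakanim"))  # -> "N"
print(tag_word("ingwale"))  # -> "V"
```

A baseline like this is also useful for measuring whether a trained model actually beats the obvious affix heuristics.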
Hey wow! John Goldsmith’s parser still exists: Linguistica, [here] – Python code, or here (earlier version)
Oh that’s right! Linguistica was updated a few years ago.
As for the POS tagger, a lot of the SIGMORPHON shared tasks (the “re/inflection” ones) essentially include training a POS tagger on a list of random inflected forms, so training a POS tagger on your list of words seems quite feasible.
It might be worth comparing the Transformer against some non-neural models such as an SVM. When data is limited, non-neural models often do better than state-of-the-art neural models. There are a few data augmentation tricks that proved useful in the shared tasks and might be worth trying if initial accuracy is low.
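For the SVM comparison, one common setup is character n-gram features over the word forms, which lets the classifier pick up on affixes like -nim and ing- automatically. A rough sketch with scikit-learn, using an invented toy word list (real experiments would train on the actual lexicon):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy words mimicking the affix patterns described earlier.
words = ["hakanim", "tulonim", "werunim", "posonim",
         "ingwale", "ingkatu", "ingsopa", "ingdara"]
labels = ["N", "N", "N", "N", "V", "V", "V", "V"]

# Character n-grams (1-4) capture prefixes and suffixes; the SVM
# learns which n-grams are associated with which POS.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),
    LinearSVC(),
)
model.fit(words, labels)
print(model.predict(["mokanim", "ingfelu"]))
```

With more realistic data you would hold out a test set and compare accuracy directly against the Transformer and against a plain affix-rule baseline.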
I’m trying to be concise, so let me know if there’s anything I should expand on.
Also, glad to help if I can!