GPT-3 does interlinear glossing and phonetic transcription!

Hi all,

I’ve been playing around with GPT-3, an AI language model that’s been in the news for the past few years. Basically, it’s like the autocomplete on your phone, but trained on such a huge amount of data, with so many layers of abstract representation, that it has an uncanny ability to imitate a wide range of written styles and genres.
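(If the “autocomplete” analogy seems opaque: here’s the idea in miniature, as a bigram model that just predicts the most frequent next word it saw in training. This is my own toy illustration, not anything GPT-3 actually uses — GPT-3 does the same kind of next-token prediction, just with a vastly larger model and corpus.)

```python
from collections import Counter, defaultdict

# A miniature "autocomplete": count which word follows which,
# then predict the most frequent continuation.
def train_bigrams(text):
    words = text.split()
    nexts = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        nexts[a][b] += 1
    return nexts

def predict_next(nexts, word):
    if word not in nexts:
        return None  # never seen this word
    return nexts[word].most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # -> "cat" ("the cat" occurs twice, "the mat" once)
```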

One of those genres, it turns out, is interlinear glossing. Observe:

NB: In the following, boldface indicates the prompt that I provided; the remainder is automagically completed by the AI.

Pretty neat, huh? (In case you’re wondering, the generated gloss and translation are exactly correct.) The best part is, you can try GPT-3 on your own, for free, by signing up here. (There used to be a waiting list, but now everyone gets $18 of credit to be used within the first three months after signing up.) Try other languages, if you like! Also experiment with other orthographies and even writing systems (the AI has full Unicode support).

Not sure how practical this really is, but I thought some of you might get a kick out of it! (And maybe some ideas for automated glossing tools?)


It also does phonetic transcription (of English, at least), though which dialect it’s transcribing is anyone’s guess!


OK, I tried eliciting phonetic transcriptions for the standard lexical sets that are used in English dialectology. Here are the results:

AI-generated phonetic transcriptions of English lexical sets

Most of the pronunciations seem to correspond to British English (RP). However, not all of them do:

  • dress is [ˈdres], not [ˈdrɛs]
  • bath is [ˈbæθ], not [ˈbɑθ] (so maybe it’s Australian?)
  • nurse is [ˈnɜrs] (Irish?!)
  • palm is [ˈpæm] (I have no friggin’ idea where this is from)
  • thought is [ˈθɔrt] (likewise)
  • price is [ˈprɪs] (hello, Middle English!)
  • start is [ˈstɑrt] (Irish again)
  • north is [ˈnɔrt] (some foreign accent, presumably)
  • force is [ˈfɔrs] (Irish)

This post has illustrated a new direction in documentary linguistics: doing fieldwork on large language models! (Kidding, obviously. Though now that I think of it, I wonder how much effort it would take to create an AI that could be used to run a plausible simulation of an elicitation session.)


Thanks for sharing this interesting stuff, @skalyan. I confess that I find things like GPT-3 hard to wrap my head around.

AI of this kind seems like such a totally new category of technology to me that it’s difficult to even begin to imagine how it will fit into how we work in the future. It’s certainly fun to look at the outcomes here — the output probably compares favorably with a lot of undergrad linguistics phonetics homework — but where it goes haywire, it’s just… weird ([ˈpæm], [ˈnɔrt], and so on).

Presumably there will be fewer and fewer such errors as the models grow. But this is the weirdest part to me: is the improvement “science” in any traditional sense? It has a certain lottery feel, I think: “pay here, for what may or may not be a useful result, and we can’t really tell you why it’s answering like it is…”

I find this scenario pretty unsettling. I mean, if we can’t do any kind of analysis on how the conclusions are being made, it may as well be magic. And the model is private property, accessible via paid appointments.

Agreed on all points! I find GPT-3 amusing (which is why I share its outputs with people), but I wouldn’t really trust it for any serious task.

Another issue (which I’ve heard Avery Andrews point out) is that these models require several orders of magnitude more data than the input that a child receives—and still do worse than a child in many respects. So whatever these models are doing must be nothing like language acquisition in humans. This alone is enough to rob these models of most of their theoretical interest for me.


Just for fun, @skalyan, want to try this with a language it probably hasn’t seen any data on?

Here are a few interlinear examples from Barayin [bva], with enough repetition that it should be able to figure out a few words (at least in the second line):

Text: ki gor-e-ti na waajib ki ŋ asib-o-geti asib-o
Gloss: SBJ.2SG.M buy-PRF-OBJ.3SG.F BG truly SBJ.2SG.M PREP shepherd-INF-POSS.3SG.F shepherd-INF
Translation: If you buy one, you really need to tend to them.

Text: mooso ki gor-e-ti na ki ŋ asib-o-geti asib-o
Gloss: cow SBJ.2SG.M buy-PRF-OBJ.3SG.F BG SBJ.2SG.M PREP shepherd-INF-POSS.3SG.F shepherd-INF
Translation: If you buy a cow, you should tend to it.

Text: wala ki jekk-a-ti siidi do
Gloss: without SBJ.2SG.M leave-IPFV-OBJ.3SG.F home NEG
Translation: You should not leave it at home.


Here’s what it came up with. (I reordered the examples so that the second one you provided comes last, as that’s the one you wanted to see if it can replicate.)

Obviously it couldn’t do anything with mooso, since that word didn’t appear anywhere in its input—but it correctly glossed most of the words that are shared between the second and third sentences! The translation, of course, is useless.



The gloss “1PL.POSS.2SG.M” is interesting: it’s nonsense, but it’s notable that the model decided to switch to first person plural for some reason, and managed to get the right abbreviation to match the “we” in the “translation” (but why “POSS”?).

It didn’t seem to figure out the alignment either: after skipping “mooso” completely, it inserted a random gloss “truly” later on. But with more data, it could plausibly become a better version of the FLEx parser.
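(The dictionary-lookup core of such a glosser is simple enough to sketch. The following is my own toy version, nothing to do with FLEx’s actual morphology parser: it learns word-to-gloss pairs from aligned interlinear examples and glosses new text, marking unknown words. Real interlinear parsing also has to segment morphemes and handle ambiguity, which this skips entirely.)

```python
# Toy dictionary-based glosser: learn word -> gloss pairs from
# aligned interlinear examples, then gloss new text by lookup.
def learn_lexicon(examples):
    lexicon = {}
    for text, gloss in examples:
        words, glosses = text.split(), gloss.split()
        if len(words) != len(glosses):
            continue  # skip misaligned example lines
        for w, g in zip(words, glosses):
            lexicon.setdefault(w, g)  # keep the first gloss seen
    return lexicon

def gloss_line(text, lexicon):
    # Unknown words are flagged rather than guessed at.
    return " ".join(lexicon.get(w, "???") for w in text.split())

# The Barayin examples from above (minus the one being predicted):
examples = [
    ("ki gor-e-ti na waajib ki ŋ asib-o-geti asib-o",
     "SBJ.2SG.M buy-PRF-OBJ.3SG.F BG truly SBJ.2SG.M PREP shepherd-INF-POSS.3SG.F shepherd-INF"),
    ("wala ki jekk-a-ti siidi do",
     "without SBJ.2SG.M leave-IPFV-OBJ.3SG.F home NEG"),
]
lexicon = learn_lexicon(examples)
print(gloss_line("mooso ki gor-e-ti na ki ŋ asib-o-geti asib-o", lexicon))
# -> ??? SBJ.2SG.M buy-PRF-OBJ.3SG.F BG SBJ.2SG.M PREP shepherd-INF-POSS.3SG.F shepherd-INF
```

Note that, unlike GPT-3, a lookup table can’t hallucinate a stray “truly”: since “waajib” doesn’t occur in the new sentence, its gloss never appears.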


I think it inserted “truly” because it was copying from the gloss in the previous example.

It’s important to note that these language models are not deterministic, and so running them a second time is likely to yield a different result. I just tried running it several more times—and remarkably, every time, it inserts “truly”! So it clearly hasn’t “understood” the nature of the task (which isn’t surprising). Oh well.
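(For anyone wondering why the output varies between runs: these models sample each next token from a probability distribution, typically scaled by a “temperature” setting, rather than always taking the single most likely token. Here’s a self-contained illustration of temperature sampling in plain Python — the function and the example logits are mine, not anything from the GPT-3 API.)

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
rng = random.Random(0)

# At temperature 1.0, different draws can pick different tokens...
samples = [sample_with_temperature(logits, 1.0, rng) for _ in range(10)]
# ...while as temperature -> 0, sampling approaches argmax (deterministic).
greedy = [sample_with_temperature(logits, 0.01, rng) for _ in range(10)]
```

So a “different result on every run” is expected behavior at nonzero temperature, not a bug.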


I guess it won’t be applying for grants anytime soon :sob: