Some work on IPA and input: features and transcription (with demos!)

What good is the IPA?

The IPA is more than an alphabet: it’s an alphabet plus a phonetic feature system. That is, when you are “in” IPA and you use the character «d», you “mean”, by default, at least these things:

| feature | value |
|---|---|
| place | alveolar |
| manner | plosive (or stop) |
| voicing | voiced |

You could say something similar about a vowel character such as «o»:

| feature | value |
|---|---|
| height | close-mid |
| rounding | rounded |
| backness | back |

You can think of diacritics as fitting into this system too. The diacritic «◌̪» “means” just one feature, dental place:

| feature | value |
|---|---|
| place | dental |

Hence if you “add” the features of «d» and «◌̪», you get «d̪». (Looks like that character isn’t rendering very well in this site’s font! Time to switch fonts…) The diacritic “overrides” the default place value, and you get a voiced dental plosive:

| feature | value |
|---|---|
| place | dental |
| manner | plosive (or stop) |
| voicing | voiced |
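As a rough illustration of this “adding features” idea, here is a minimal Python sketch. The tables `BASE_FEATURES` and `DIACRITIC_FEATURES` and the function `combine` are hypothetical names invented for this example, and they cover only a handful of characters; the one rule they encode is that a diacritic’s features override the base character’s defaults.

```python
from typing import Dict

# A feature bundle maps a feature name (e.g. "place") to a value (e.g. "alveolar").
Features = Dict[str, str]

# Hypothetical default feature values for a few base characters.
# A real system would need a table covering the whole IPA chart.
BASE_FEATURES: Dict[str, Features] = {
    "d": {"place": "alveolar", "manner": "plosive", "voicing": "voiced"},
    "t": {"place": "alveolar", "manner": "plosive", "voicing": "voiceless"},
    "o": {"height": "close-mid", "rounding": "rounded", "backness": "back"},
}

# Hypothetical table of combining diacritics: each contributes only the features it "means".
DIACRITIC_FEATURES: Dict[str, Features] = {
    "\u032A": {"place": "dental"},       # COMBINING BRIDGE BELOW, as in «d̪»
    "\u0325": {"voicing": "voiceless"},  # COMBINING RING BELOW, as in «d̥»
}

def combine(base: str, *diacritics: str) -> Features:
    """Start from the base character's defaults; each diacritic overrides what it specifies."""
    features = dict(BASE_FEATURES[base])  # copy, so the shared table is never mutated
    for mark in diacritics:
        features.update(DIACRITIC_FEATURES[mark])  # later values win
    return features

print(combine("d", "\u032A"))
# {'place': 'dental', 'manner': 'plosive', 'voicing': 'voiced'}
```

Using a plain dictionary update keeps the override rule explicit; messier cases (diacritics that interact, or that only make sense on certain bases) would need more than this.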

It gets a lot messier than these examples, but the point is that, as a documentary linguist, you just know this kind of thing. You took phonetics or otherwise acquired the relevant knowledge. This is bread and butter to us. We need this knowledge because we use it to do a bunch of other things: find minimal pairs, establish oppositions, describe allophony, etc.

But the thing is, this information is basically not in our software.

When you stop and think about it, it’s kind of weird.

What representing the phonetic feature system would buy us

Sooo many things that are currently a pain in the neck could be automated or semi-automated if we encoded this association between phonetic features and Unicode characters. For instance:

  • Phonetic charts: The IPA charts are arranged in terms of phonetic features. If the association is in place, you can render charts automatically, containing just the columns and rows a particular language needs.
  • Graphemes: The aspiration diacritic (well, “modifier letter” in Unicode lingo) doesn’t typically stand on its own; it’s used together with some other “base”, as in «pʰ» or «qʰ». Unicode’s character properties record that, which means you can automatically split transcriptions into “graphemes” that are pretty close to “phones”.
  • Minimal pairs: Once you have those grapheme/phone correspondences, things really open up. For instance, you can automatically generate lists of minimal and near-minimal pairs.
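To make the minimal-pair idea concrete, here is a minimal Python sketch, assuming each word has already been split into graphemes: two words count as a minimal pair if they have the same number of segments and differ in exactly one position. The `minimal_pairs` function and the toy lexicon are invented for illustration, and near-minimal pairs are not handled.

```python
from itertools import combinations
from typing import List, Tuple

def minimal_pairs(words: List[List[str]]) -> List[Tuple[str, str, str, str]]:
    """Return (word1, word2, segment1, segment2) for words differing in exactly one segment."""
    pairs = []
    for a, b in combinations(words, 2):
        if len(a) != len(b):
            continue  # this simple version ignores insertions and deletions
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            x, y = diffs[0]
            pairs.append(("".join(a), "".join(b), x, y))
    return pairs

# Invented toy lexicon, already segmented; note «pʰ» is a single segment.
lexicon = [["p", "a", "t"], ["pʰ", "a", "t"], ["b", "a", "t"], ["p", "a", "d"]]
for pair in minimal_pairs(lexicon):
    print(pair)
```

Working on segment lists rather than raw strings is what makes this work: as code points, «pʰat» is one character longer than «pat», but as segments the two form a minimal pair.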

I have been playing around with this stuff and have some half-baked demos you might want to poke around in.

Find characters via phonetic features

This demo helps you build up a little palette of characters interactively, which you can then use to insert characters into a transcription. The important part is that you can find characters via phonetic feature values: if you search for uvular, you find «qʁχɢɴʀ»; if you search for uvular voiceless, you find «qχ»; and so on. Clicking the character you need gives you a button below the transcription input. (Yes, buttons for this are clunky, and other UIs are possible and desirable.)
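The lookup itself can be sketched in a few lines of Python: a character matches if every search term is one of its feature values. The `CONSONANTS` table and `search` function are hypothetical; the table covers only the uvular consonants plus two alveolar stops, whereas a real palette would need the whole chart.

```python
from typing import Dict, List

# Hypothetical mini feature table (uvulars plus two non-matching alveolars).
CONSONANTS: Dict[str, Dict[str, str]] = {
    "q": {"place": "uvular", "manner": "plosive", "voicing": "voiceless"},
    "ʁ": {"place": "uvular", "manner": "fricative", "voicing": "voiced"},
    "χ": {"place": "uvular", "manner": "fricative", "voicing": "voiceless"},
    "ɢ": {"place": "uvular", "manner": "plosive", "voicing": "voiced"},
    "ɴ": {"place": "uvular", "manner": "nasal", "voicing": "voiced"},
    "ʀ": {"place": "uvular", "manner": "trill", "voicing": "voiced"},
    "t": {"place": "alveolar", "manner": "plosive", "voicing": "voiceless"},
    "d": {"place": "alveolar", "manner": "plosive", "voicing": "voiced"},
}

def search(query: str) -> List[str]:
    """Return characters whose feature values include every whitespace-separated term."""
    terms = query.lower().split()
    return [ch for ch, feats in CONSONANTS.items()
            if all(term in feats.values() for term in terms)]

print("".join(search("uvular")))            # qʁχɢɴʀ
print("".join(search("uvular voiceless")))  # qχ
```

Terms are compared against whole values, so “voiced” does not accidentally match “voiceless”.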

Automatically “graphemize”

Write something with some diacritics, or paste it into the box below, and you’ll see the characters broken up into “graphemes” automatically. For example, try pasting in: béle ɲɔ́ɔ pʼəʕáp qʷásqʷiˀj mir̠ar̠a
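Here is a minimal Python sketch of one way such a graphemizer could work. It is an assumed heuristic, not necessarily what the demo does: decompose the text with NFD, then attach every combining mark (Unicode general category Mn or Me) and every modifier letter (Lm) to the preceding character. Unicode’s standard extended grapheme clusters (UAX #29) keep combining marks with their base but do not attach modifier letters like «ʰ», which is why this sketch checks the general category instead. The `graphemize` function is a hypothetical name.

```python
import unicodedata
from typing import List

# Categories that attach to the preceding base:
# Mn/Me = combining marks (e.g. U+0301 acute, U+0320 minus sign below),
# Lm = modifier letters (e.g. ʰ U+02B0, ʷ U+02B7, ʼ U+02BC, ˀ U+02C0).
ATTACHING = {"Mn", "Me", "Lm"}

def graphemize(word: str) -> List[str]:
    """Split a transcription into base-plus-modifier clusters (a heuristic, not UAX #29)."""
    clusters: List[str] = []
    # NFD splits precomposed letters such as «á» into base + combining mark,
    # so they are treated like «ɔ́», which has no precomposed form.
    for ch in unicodedata.normalize("NFD", word):
        if clusters and unicodedata.category(ch) in ATTACHING:
            clusters[-1] += ch  # attach to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    # Recompose each cluster so «á» comes back as a single code point where possible.
    return [unicodedata.normalize("NFC", c) for c in clusters]

for word in "béle ɲɔ́ɔ pʼəʕáp qʷásqʷiˀj mir̠ar̠a".split():
    print(word, " | ".join(graphemize(word)))
```

Left-attachment is only a heuristic. In «qʷásqʷiˀj» it groups «ˀ» with the preceding «i», which may or may not be what the transcriber intended, and a prefix stress mark like «ˈ» (also a modifier letter) would end up on the wrong segment. A real tool would need per-character exceptions.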

Automatically generate a consonant chart

This is still a work in progress, but I’m kind of stoked about it. Note: this one definitely won’t fit on a mobile screen, so try it on your laptop. In fact, it doesn’t fit terribly well into this site either, so you might want to follow the link and check it out there:

What this does is parse your transcription, try to identify (known) IPA “graphemes”, and then plot them in a chart with just the necessary rows and columns. It’s very imperfect, but I think the idea is pretty compelling, and for some languages it’s probably already useful. I’ll add vowels and tones soon.
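The chart-building step can be sketched roughly as follows in Python, assuming the transcription has already been split into graphemes. The `FEATURES` table, `build_chart`, and `render` are hypothetical and cover only a few consonants; the row and column order follows the IPA pulmonic consonant chart, and only places and manners that actually occur are kept.

```python
from typing import Dict, List, Tuple

# Hypothetical feature table: character -> (place, manner, voicing).
FEATURES: Dict[str, Tuple[str, str, str]] = {
    "p": ("bilabial", "plosive", "voiceless"), "b": ("bilabial", "plosive", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "t": ("alveolar", "plosive", "voiceless"), "d": ("alveolar", "plosive", "voiced"),
    "n": ("alveolar", "nasal", "voiced"), "s": ("alveolar", "fricative", "voiceless"),
    "r": ("alveolar", "trill", "voiced"),
    "q": ("uvular", "plosive", "voiceless"),
    "χ": ("uvular", "fricative", "voiceless"), "ʁ": ("uvular", "fricative", "voiced"),
}
# Column and row order from the IPA pulmonic consonant chart.
PLACES = ["bilabial", "labiodental", "dental", "alveolar", "postalveolar",
          "retroflex", "palatal", "velar", "uvular", "pharyngeal", "glottal"]
MANNERS = ["plosive", "nasal", "trill", "tap or flap", "fricative",
           "lateral fricative", "approximant", "lateral approximant"]

def build_chart(segments: List[str]) -> List[List[str]]:
    """Return chart rows (header row first), keeping only places/manners that occur."""
    known = [s for s in dict.fromkeys(segments) if s in FEATURES]  # dedupe, skip unknowns
    places = [p for p in PLACES if any(FEATURES[s][0] == p for s in known)]
    manners = [m for m in MANNERS if any(FEATURES[s][1] == m for s in known)]
    rows = [[""] + places]
    for manner in manners:
        row = [manner]
        for place in places:
            cell = [s for s in known if FEATURES[s][:2] == (place, manner)]
            # IPA chart convention: voiceless symbol on the left, voiced on the right.
            cell.sort(key=lambda s: FEATURES[s][2] != "voiceless")
            row.append(" ".join(cell))
        rows.append(row)
    return rows

def render(rows: List[List[str]]) -> str:
    """Lay the rows out as fixed-width plain-text columns."""
    width = max(len(cell) for row in rows for cell in row) + 2
    return "\n".join("".join(cell.ljust(width) for cell in row).rstrip() for row in rows)

print(render(build_chart(list("pataqamaχa"))))
```

Here the vowel «a» is simply skipped because it is not in the consonant table, which mirrors the idea of plotting only (known) graphemes.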

Really interested to know what you all think about this stuff. Does it give you any ideas? Do you think it’s useful?


Really cool stuff, Pat! That automatic chart thing especially is way cool! These are all great pearls that could support a rich UX in web apps for linguists. Imagine using that chart in a documentation-oriented app that lets people quickly audit their consonant inventory, for instance, and perhaps catch mistranscriptions.