The <ruby> tag

What’s a <ruby>, you say?

:warning: This tag doesn’t work in all browsers.

Check out this rather lovely-ly formatted page about Japanese dictionaries:

If you take a look, you will see stuff like this:

The topic du jour is this stuff:

Kokugo 国語こくご

As you can see, there are three kinds of writing going on there, and this is a feature of Japanese. The three are:

  1. Romaji: Kokugo
  2. Kanji: 国語
  3. “Furigana”: こくご

Notice how the Furigana are above the Kanji. There are special HTML tags specifically for doing this. The term “ruby” itself has a rather interesting and circuitous history, but let’s

…stay on target.


Here’s what the HTML for 国語こくご looks like (I’ve simplified it a bit from the page above):

<ruby>国語
  <rp>(</rp>
  <rt>こくご</rt>
  <rp>)</rp>
</ruby>

As you can see there are three tags here:

  1. <ruby> — this wraps everything.
  2. <rp> — “ruby parenthesis”. This holds fallback parentheses that are shown around the “ruby text” (the top bit) only when the browser doesn’t support ruby. It’s weird. Honestly, you can kind of ignore this one.
  3. <rt> — “ruby text”. This is the actual annotation content.

国語こくご

So the <rp>-less version is quite simple:

<ruby>国語
  <rt>こくご</rt>
</ruby>
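If you wanted to generate either form from data, a minimal sketch might look like the following. The names `rubyMarkup` and `escapeHtml` are made up for illustration (they are not part of any library), and the escaping is deliberately basic.

```javascript
// Hypothetical helper: escape the characters that matter inside HTML text.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: build a <ruby> string from a base text and its annotation.
// With withFallback = true, the <rt> is wrapped in <rp> parentheses, which are
// only displayed by browsers that do not understand <ruby>.
function rubyMarkup(base, annotation, withFallback = true) {
  const rt = `<rt>${escapeHtml(annotation)}</rt>`;
  const body = withFallback ? `<rp>(</rp>${rt}<rp>)</rp>` : rt;
  return `<ruby>${escapeHtml(base)}${body}</ruby>`;
}

console.log(rubyMarkup("国語", "こくご"));
// <ruby>国語<rp>(</rp><rt>こくご</rt><rp>)</rp></ruby>
console.log(rubyMarkup("国語", "こくご", false));
// <ruby>国語<rt>こくご</rt></ruby>
```

The output is the same markup as above, just without the line breaks and indentation.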

So yeah, it’s basically like glossing. In fact, you can even use CSS to put the ruby below the glossed text, just like we do in linguistics, see here. Except, browser support is super duper bad for that.

This all sure looks like it should be usable for interlinear glossing on the web. And I guess it is, in a way. But personally I don’t think it’s the right match for us. There are several problems:

:warning: It implies a ‘one-gloss’ model.

In linguistics we have glosses with a ton of annotations, not just one. What if you want to use more than one orthography, for instance?

dzümle-si-ni
𝔊𝔢𝔰𝔞𝔪𝔱𝔥𝔢𝔦𝔱=𝔦𝔥𝔯𝔢=𝔡𝔦𝔢
Gesamtheit=ihre=die

Finck, Franz Nikolaus. 1909. Die Haupttypen des Sprachbaus. Leipzig: B.G. Teubner.

Which one gets the <rt>? You could imagine many more similar situations, with tone, say.

:warning: The annotated text doesn’t get its own node.

In the example above, what is presumably the “baseline” (dzümle-si-ni) doesn’t get its “own” tag at all — it’s just kind of floating there. Obviously you could wrap that in a <span lang=tr> or something, but at that point you’re creating markup to target via CSS anyway. So just… use your own markup.
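To make “your own markup” a bit more concrete, here is a rough sketch of one way it could go, using two of the lines from the Finck example. Everything in it is an assumption for illustration: the `tiers` structure, the `renderGloss` name, the class names, and the `lang` values are invented here, not taken from any standard or library.

```javascript
// Hypothetical structure: one interlinear example with any number of named tiers.
const example = {
  tiers: [
    { name: "transcription", lang: "tr", text: "dzümle-si-ni" },
    { name: "gloss", lang: "de", text: "Gesamtheit=ihre=die" },
  ],
};

// Render every tier as its own element, so each line (including the baseline)
// gets a node that CSS can target by class or by lang.
// Note: the text is inserted as-is; a real version would need HTML escaping.
function renderGloss(ex) {
  const lines = ex.tiers.map(
    (t) => `  <span class="tier ${t.name}" lang="${t.lang}">${t.text}</span>`
  );
  return ['<div class="gloss">', ...lines, "</div>"].join("\n");
}

console.log(renderGloss(example));
```

Adding a third orthography or a tone tier is then just another entry in `tiers`, which is exactly what a single <rt> can’t give you.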

:skull::warning: Browser support is lousy.

This is the real deal breaker; browser support for <ruby> and friends isn’t great. It’s in the HTML standard, but that doesn’t mean much if it’s not supported “in the wild”.

There are, by the way, other perfectly cromulent ways to format glosses in HTML, but that’s another topic.

Anyways. Some HTML blather for your consideration.

Also I just discovered that there is this thing. Good grief, someone else write that post.


I had no idea ruby tags existed until I did some work on a DH project my advisor runs: they use ruby tags for showing POS tags in context for digitized Sahidic Coptic texts: http://data.copticscriptorium.org/texts/ap/apophthegmata-patrum-sahidic-029/analytic
Maybe not a bad way to handle simple cases of IGT, come to think of it.

I didn’t know browser support was so spotty, that’s too bad, especially since <ruby> is not an eminently silly tag like <marquee> or <blink> (and <blink> was legitimately a joke: “It was a lot like Las Vegas, except it was on my screen, with no way of turning it off.”).


Hilariously, IE was ahead of the game on <ruby>.

https://caniuse.com/mdn-css_properties_display_ruby_values

Go figure.

This is a really cool project, and Coptic is such an awesome language. I will finally finish reading Ancient Egyptian (which, despite its title, is also about Coptic).

Anyways, how involved were you in this project? There’s a lot of interesting stuff going on. The “analytic” views in particular are neat. I was curious about the boxes — it seems there are three kinds of elements with entity_type attributes:

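// A NodeList has no .reduce, so copy it into an array first.
const types = Array.from(document.querySelectorAll('[entity_type]'))
  .reduce((et, el) => et.add(el.getAttribute("entity_type")), new Set())
[...types]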

That returns:

["person", "abstract", "object"]

So I guess this is some sort of named entity recognition? The Wikipedia links are cool too.

It occurs to me that we should make an Awesome Interlinears list to compile examples of interesting web-based interlinears…