A minimal example of interlinears with HTML and CSS

Thanks, @nikopartanen, for prompting me to write this!

One of the cool things about using the web for language documentation is that you can format interlinear text right.

What does “right” mean? It means:

  • Words wrap correctly
  • Every word has its own tag
  • Every “tier” of every word has its own tag
  • Sentence-level “tiers” like the transcription and one or more translations also have their own tags

In this post I’m going to share with you what I have come to believe is the simplest way to meet these requirements with HTML and CSS. We’re going to look at two sentences, one very short, and one very long. The short sentence has just three words, the long one many.

So here is a simple short sentence (this is from a text in Hiligaynon), presented in a tabular format:

transcription   pahigád kamo da!
translation     Get out of my way!
words
    form        gloss
    pa-higád    CAUS-move
    kamo        2PL.ABS
    da          there
(raw data)
{
  "transcription": "pahigád kamo da!",
  "translation": "Get out of my way! stay on the side",
  "words": [
    {
      "form": "pa-higád",
      "gloss": "CAUS-move"
    },
    {
      "form": "kamo",
      "gloss": "2PL.ABS"
    },
    {
      "form": "da",
      "gloss": "there"
    }
  ]
}

I won’t melt your eyes with the same presentation for the long sentence, but if you would like to see it, here it is:

long sentence
transcription   indí gustó si Juan nga mag-ininawáy kamó da’ magcomment kamó nga amo ní eskwelahán nyo amo ní amo ná
translation     Juan doesn’t like for you to fight in the comments about this school or that school
words
    form            gloss
    indí            NEG
    gustó           like
    si              PERS
    Juan            Juan
    nga             LINK
    mag-inin-awáy   NOM-fight-RECIP
    kamó            2PL.ABS
    da’             there
    mag-comment     NOM-comment
    kamó            2PL.ABS
    nga             LINK
    amo             same
    ní              this
    eskwelahán      school
    nyo             2PL.ERG2
    amo             same
    ní              this
    amo             same
    ná              already

Obviously these tabular representations are not what we want; we want standard interlinear notation. We can get there with HTML that looks like this:

<div class="sentence">
  <p class="transcription">pahigád kamo da</p>
  <div class="words">
    <p class="word">
      <span class="form">pa-higád</span>
      <span class="gloss">CAUS-move</span>
    </p>
    <p class="word">
      <span class="form">kamo</span>
      <span class="gloss">2PL.ABS</span>
    </p>
    <p class="word">
      <span class="form">da</span>
      <span class="gloss">there</span>
    </p>
  </div>
  <p class="translation">Get out of my way!</p>
</div>

If we just stick that into a page without any CSS on it, we get this rather weird-looking thing:

pahigád kamo da

pa-higád CAUS-move

kamo 2PL.ABS

da there

Get out of my way!

It’s not awful, but it’s not standard notation.

The little demo below demonstrates how the presentation can be fixed. Fixing the wrapping behavior comes down to applying a single CSS rule:

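.word {
  /* Each word flows inline and wraps like text;
     its form and gloss stack in a single column inside it. */
  display: inline-grid;
  margin-right: 1em;
}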

So, what the heck does that mean? If you don’t have experience with CSS, I would suggest that for now you simply ignore the question of what the rule “means”. Rather, focus on a different question: what the heck does that do? This one rule is enough to get sensible interlinear formatting that matches the way we are used to thinking about interlinears.
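
If you want to try this outside the post, here is a minimal sketch of a complete page that combines the markup and the rule shown above. The page wrapper (doctype, `<head>`, title, viewport tag) is my own scaffolding and not part of the original example; only the `.word` rule and the sentence markup come from the post.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- Assumed scaffolding: lets the page scale sensibly on phones -->
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Interlinear example</title>
  <style>
    /* The single rule from the post: each word sits inline,
       with its form and gloss stacked inside it. */
    .word {
      display: inline-grid;
      margin-right: 1em;
    }
  </style>
</head>
<body>
  <div class="sentence">
    <p class="transcription">pahigád kamo da</p>
    <div class="words">
      <p class="word">
        <span class="form">pa-higád</span>
        <span class="gloss">CAUS-move</span>
      </p>
      <p class="word">
        <span class="form">kamo</span>
        <span class="gloss">2PL.ABS</span>
      </p>
      <p class="word">
        <span class="form">da</span>
        <span class="gloss">there</span>
      </p>
    </div>
    <p class="translation">Get out of my way!</p>
  </div>
</body>
</html>
```

Saving this as a `.html` file and opening it in a browser should show the three words side by side, each with its gloss underneath.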

To help you get a feel for what the CSS is doing, I made this little demo for you to play with:

Here are some things to try. If you toggle option #1, you will see borders added to every tag in the markup. Notice:

  • Every “tier” for each word — the form and the gloss — has its own tag (a <span>, in this case).
  • There are boxes around each word — this is important! We need this tag to serve as the target for the rule setting the display and margin-right properties, as shown above.
  • There is a box around the whole list of words. This corresponds to the <div class="words"></div> tag in the markup.
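
If the demo is not available to you, the borders can be approximated by hand. The sketch below is an assumption about how such a toggle might work, not the demo's actual code: `show-boxes` is a hypothetical class name, and I use `outline` instead of `border` so that the boxes do not change the layout.

```html
<style>
  .word {
    display: inline-grid;
    margin-right: 1em;
  }

  /* Hypothetical helper class (not from the original demo):
     draws a box around the sentence and every tag inside it. */
  .show-boxes,
  .show-boxes * {
    outline: 1px solid gray;
  }
</style>

<!-- Remove "show-boxes" from the class list to hide the boxes again -->
<div class="sentence show-boxes">
  <p class="transcription">pahigád kamo da</p>
  <div class="words">
    <p class="word"><span class="form">pa-higád</span> <span class="gloss">CAUS-move</span></p>
    <p class="word"><span class="form">kamo</span> <span class="gloss">2PL.ABS</span></p>
    <p class="word"><span class="form">da</span> <span class="gloss">there</span></p>
  </div>
  <p class="translation">Get out of my way!</p>
</div>
```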

Which particular tags we’re using is not as important as the nesting pattern. In other words, what matters is less that we chose to wrap the “form” bits of the interlinear in <span> tags (as opposed to <p> tags or something) than that each form (and each gloss) sits inside something that corresponds to a word. Here’s the hierarchy:

  • sentence
    • transcription
    • words
      • word
        • form
        • gloss
      • word
        • form
        • gloss
      • word
        • form
        • gloss
    • translation

It’s easier to see the benefits of this pattern if we consider a (much) longer sentence, like the long sentence introduced earlier.
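
For reference, the long sentence could be marked up along these lines, following the same hierarchy. This is a sketch built from the data given earlier in the post; the forms “ní” and “ná” are taken from the transcription line, which is an assumption on my part about how those words should be filled in. Each word is written on one line only to keep the example short.

```html
<style>
  .word {
    display: inline-grid;
    margin-right: 1em;
  }
</style>

<div class="sentence">
  <p class="transcription">indí gustó si Juan nga mag-ininawáy kamó da’ magcomment kamó nga amo ní eskwelahán nyo amo ní amo ná</p>
  <div class="words">
    <p class="word"><span class="form">indí</span> <span class="gloss">NEG</span></p>
    <p class="word"><span class="form">gustó</span> <span class="gloss">like</span></p>
    <p class="word"><span class="form">si</span> <span class="gloss">PERS</span></p>
    <p class="word"><span class="form">Juan</span> <span class="gloss">Juan</span></p>
    <p class="word"><span class="form">nga</span> <span class="gloss">LINK</span></p>
    <p class="word"><span class="form">mag-inin-awáy</span> <span class="gloss">NOM-fight-RECIP</span></p>
    <p class="word"><span class="form">kamó</span> <span class="gloss">2PL.ABS</span></p>
    <p class="word"><span class="form">da’</span> <span class="gloss">there</span></p>
    <p class="word"><span class="form">mag-comment</span> <span class="gloss">NOM-comment</span></p>
    <p class="word"><span class="form">kamó</span> <span class="gloss">2PL.ABS</span></p>
    <p class="word"><span class="form">nga</span> <span class="gloss">LINK</span></p>
    <p class="word"><span class="form">amo</span> <span class="gloss">same</span></p>
    <!-- "ní" and "ná" below are assumed from the transcription line -->
    <p class="word"><span class="form">ní</span> <span class="gloss">this</span></p>
    <p class="word"><span class="form">eskwelahán</span> <span class="gloss">school</span></p>
    <p class="word"><span class="form">nyo</span> <span class="gloss">2PL.ERG2</span></p>
    <p class="word"><span class="form">amo</span> <span class="gloss">same</span></p>
    <p class="word"><span class="form">ní</span> <span class="gloss">this</span></p>
    <p class="word"><span class="form">amo</span> <span class="gloss">same</span></p>
    <p class="word"><span class="form">ná</span> <span class="gloss">already</span></p>
  </div>
  <p class="translation">Juan doesn’t like for you to fight in the comments about this school or that school</p>
</div>
```

Because every word is its own inline box, the browser is free to break the line between any two words, and each gloss travels with its form.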

Notice that if you resize your browser (or if you’re already reading this on a phone), the words wrap correctly. This is not a trivial feature: it’s very important, especially when we consider that many of the communities documentary linguists work with have limited access to large-screen devices, but stable access to handheld devices.

There’s more to be said about this, but in general, if we stick to this tag hierarchy pattern, we can do lots of cool stuff to modify it with CSS.
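
As one small illustration of that kind of modification, here is a sketch that styles each tier separately. The particular choices (italic forms, small-caps glosses, a bold transcription) are my own assumptions about what someone might want, not recommendations from the post; the point is only that each tier can be targeted by its class.

```html
<style>
  .word {
    display: inline-grid;
    margin-right: 1em;
  }

  /* Illustrative styling choices only; swap in whatever conventions you prefer */
  .transcription { font-weight: bold; }
  .form          { font-style: italic; }
  .gloss         { font-variant-caps: all-small-caps; }
</style>

<div class="sentence">
  <p class="transcription">pahigád kamo da</p>
  <div class="words">
    <p class="word"><span class="form">pa-higád</span> <span class="gloss">CAUS-move</span></p>
    <p class="word"><span class="form">kamo</span> <span class="gloss">2PL.ABS</span></p>
    <p class="word"><span class="form">da</span> <span class="gloss">there</span></p>
  </div>
  <p class="translation">Get out of my way!</p>
</div>
```

None of these rules touch the markup, so the same HTML can be restyled freely.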
