A Gallery of Parallel Text Formats

Ever since reading @ejk’s post on digitizing Tunen texts I have been thinking about the notion of the “parallel text format”. I thought I would start a “gallery” thread, to try to build up a collection of examples that might be useful as inspiration for creating digital parallel text formats.

First up is probably the minimal case:

Layout | Content
--- | ---
Verso page | Lingala prose transcription
Recto page | English prose translation

Woods, David R. & Fulbert Akouala. 2002. Lingala parallel texts. Dunwoody Press.

Worth noting that recreating this layout from a digitization would require annotating which sentences begin a new paragraph, and which are headings (LINDONGE / The Termite Heap).
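To make that concrete, here is a minimal sketch of what those annotations might look like. The class name `AlignedSentence`, its fields, and `render_page` are all hypothetical, and the angle-bracket sentences are placeholders rather than text from the book; only the heading pair LINDONGE / The Termite Heap comes from the page above.

```python
from dataclasses import dataclass


@dataclass
class AlignedSentence:
    """One sentence pair plus the layout flags needed to rebuild the pages."""
    source: str                     # Lingala (verso page)
    translation: str                # English (recto page)
    starts_paragraph: bool = False  # sentence opens a new paragraph
    is_heading: bool = False        # sentence is a heading, set on its own


def render_page(sentences, side):
    """Rebuild the prose for one page; side is 'source' or 'translation'."""
    blocks = []
    prev_heading = False
    for s in sentences:
        text = getattr(s, side)
        # Start a new block for headings, flagged paragraph starts,
        # the sentence right after a heading, and the very first sentence.
        if s.is_heading or s.starts_paragraph or prev_heading or not blocks:
            blocks.append(text)
        else:
            blocks[-1] += " " + text
        prev_heading = s.is_heading
    return "\n\n".join(blocks)


# Placeholder content: only the heading is taken from the scanned page.
sentences = [
    AlignedSentence("LINDONGE", "The Termite Heap", is_heading=True),
    AlignedSentence("<Lingala sentence 1>", "<English sentence 1>", starts_paragraph=True),
    AlignedSentence("<Lingala sentence 2>", "<English sentence 2>"),
    AlignedSentence("<Lingala sentence 3>", "<English sentence 3>", starts_paragraph=True),
]

print(render_page(sentences, "source"))
print(render_page(sentences, "translation"))
```

Storing the flags on the sentence pair, rather than storing paragraphs directly, keeps the sentence as the unit of alignment while still letting either page be regenerated as prose.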


A page from Kashaya Texts by Oswalt:

This is essentially the same layout as the Tunen text above, with the distinction that paragraphs are numbered inline.

Layout | Content
--- | ---
Verso page | Kashaya prose transcription
Recto page | English prose translation

It takes an awful lot of work to align these parallel texts, and you’re entirely on your own for morphology. Interestingly, Oswalt also produced an unpublished digital version of the book this screenshot is from (using an idiosyncratic orthography), which was aligned:

5. The Deer and the Bear
(Told by Herman James, August, 1958)

1. ma?al ?ama: dic'i':du duweni' bak^he ?aca? yacol dihqaw^.
This story from the old days was given to the Indians.

muli'do mi:li bu7aqa' q'o bih$e q'o' nohp^how- kulu: ?ama': tol^.
Bears and deer were living in the wilderness.

kulu: ?ama': tol- q^ho: no'hp^ho nohp^how^.
Two families lived in the wilderness.

menin hi?baya' c^hot' i'do q^ho: ?iy^.
Neither had men.

k'awada- ?ul- miya':daq^ha?yacol ?ul- cuhma qamu':muc'ba duhk^huy?^.
They were widowed; the husbands had been killed fighting the enemy.

2. mens'i':lido mul- kuma'ci- bah$a c^he'?e: ti bahnati':c'edu ba7aqa' ?em 
bih$e ?el^.
One day the bear asked the deer if she would go leach buckeye nuts with her.

mens'i:li bih$e ?em "hu':?^" nihcedu q^hama:ti' h$iyi?^.

kulu: mul bah$a c^he'?ep^hila q^hama:ti' h$iyi?^.

"hit'e:ti'm ?amadu'we^.

?amhu'l hit'e:ti'm" nihcedu bih$e ?el^.

"hu':?^" nihcedu mens'i:li bih$e ?i'ma:ta ?emu^.

3.  mens'iba ?ul mul ?amadu'we tubi'hciba ?ama: 7'i:- do?q'o'?diwac'ba- ?ul 
tiya':co?k^he c^he?e?k^he bak^he- buhq^ha'l li bawil?ba ?ul^ da:bi'c^hqa:^ 
bi?da ba'h7^hel tolhq^ha?^.

mens'iba ?ul bahci'l idom ma:ca? nohp^ho': to:- p^hila? mi'lhq^ha? ma:ca?^.

mens'iba mi': ?ul- ?ahq^ha ?i':li ?ul p^hima':c'i?- bu7aqa' ?em- la': he: 
bih$e ?e'mu hlaw^.

Okay, incompletely aligned. Anyway, I just point this out because the interlinear format in the Tunen text has some commonalities with the Kashaya one, and in fact it is probable that a tool for expediting the transcription of the Tunen text could have applications for other languages.
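As a rough illustration of what such a tool might start from, here is a sketch that reads a few units of the excerpt above and reports which ones still lack a translation. It rests on two assumptions inferred only from this excerpt, not from any documentation of Oswalt's file: units are separated by blank lines, and a Kashaya sentence ends with `^.`, so anything after the last `^.` in a unit is treated as English. The names `Unit` and `parse_units` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A few units copied from the excerpt above.
SAMPLE = """
1. ma?al ?ama: dic'i':du duweni' bak^he ?aca? yacol dihqaw^.
This story from the old days was given to the Indians.

menin hi?baya' c^hot' i'do q^ho: ?iy^.
Neither had men.

2. mens'i':lido mul- kuma'ci- bah$a c^he'?e: ti bahnati':c'edu ba7aqa' ?em
bih$e ?el^.
One day the bear asked the deer if she would go leach buckeye nuts with her.

mens'i:li bih$e ?em "hu':?^" nihcedu q^hama:ti' h$iyi?^.
"""


@dataclass
class Unit:
    paragraph: Optional[int]     # number of the enclosing paragraph, if seen
    transcription: str
    translation: Optional[str]   # None when the unit is not yet aligned


def parse_units(text: str) -> List[Unit]:
    units = []
    paragraph = None
    for block in re.split(r"\n\s*\n", text.strip()):
        # Rejoin wrapped lines into a single string.
        joined = " ".join(line.strip() for line in block.splitlines())
        numbered = re.match(r"(\d+)\.\s+(.*)", joined)
        if numbered:  # an inline paragraph number such as "2. "
            paragraph = int(numbered.group(1))
            joined = numbered.group(2)
        # Assumption: the last "^." closes the Kashaya sentence.
        end = joined.rfind("^.")
        if end == -1:
            transcription, translation = joined, ""
        else:
            transcription = joined[: end + 2]
            translation = joined[end + 2:].strip()
        units.append(Unit(paragraph, transcription, translation or None))
    return units


units = parse_units(SAMPLE)
unaligned = [u for u in units if u.translation is None]
print(len(units), "units,", len(unaligned), "without a translation")
```

The sentence-final heuristic is fragile, and a real tool would want an explicit marker for the language of each line, but it is enough to find the gaps in a file like this one.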

2 posts were split to a new topic: The Classical Text Editor

2 posts were split to a new topic: Setting parallel texts with LaTeX

How a Takelma House was Built

Edward Sapir, The Takelma Language of Southwestern Oregon, 1912

The whole volume is here:

It might be easier to read if you use the Internet Archive viewer.

I think this text is interesting for several reasons:

  • It’s from Sapir’s first book.
  • It’s a good example of the “phrasal” style of glossing used in the early 20th century.
  • The footnotes are… well, a little bonkers.

Layout | Content
--- | ---
Interlinear pages (first section) | Takelma prose transcription
Interlinear page upper section | Two-line interlinear
Interlinear page lower section | Footnotes with grammatical categories, cross-references, etc.
Translation section | English prose translation

Texts like this must have been real nightmares for typesetters back in the day, because you had to find the precise balance between the interlinearization and the length of the corresponding footnotes. Also, the footnotes are highly repetitive.

This particular text is in an old style, sometimes referred to as “word-by-word” glossing. This style is actually quasi-readable, and you can still get at least a vague feeling for how the language works just by reading through it. If we extract all the glosses from this example it reads like a kind of quasi-English:

People house they make it. Post they set it down, and here again they
set it down, yonder again they set it down, in four places they set it
down, in four places they set them down. Then also they place (beams)
across on top thereof in four places, and on top thereof just once
they place (beam) across. Then and just house its wall they make it;
then and on top thereof they put them house boards. Sugar-pine those
boards they make them.

This syntax is largely ungrammatical as English. But even from this literal and rather weird translation, you can get some feel for “where things go” in Takelma. For instance, there is clearly a similarity in structure between People house they make it and Sugar-pine those boards they make them. We can start to build up a little intuition about how clauses hang together.
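Extracting the glosses is easy to sketch if the two-line interlinear is stored as (word, gloss) pairs. In the sketch below the Takelma words are placeholders, since the transcription is not reproduced in this post, and the assumption that "they make it" glosses a single word is a guess based on footnote 1; `gloss_reading` is a hypothetical helper name.

```python
# Two-line interlinear for the first sentence, as (source word, gloss) pairs.
# The source words are placeholders, not real Takelma forms.
first_sentence = [
    ("<word 1>", "people"),
    ("<word 2>", "house"),
    ("<word 3>", "they make it"),  # phrasal gloss: several English words per source word
]


def gloss_reading(pairs):
    """Join the gloss line into a quasi-English sentence."""
    text = " ".join(gloss for _, gloss in pairs)
    return text[0].upper() + text[1:] + "."


print(gloss_reading(first_sentence))
```

Because the glosses are phrasal, the pairs have to be kept per word rather than split on spaces, otherwise "they make it" would look like three source words.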


Yikes, look at all these footnotes!

All the grammatical categories are spelled out in full, and they’re quite “far” from the content they’re describing in the text itself. (I can’t help but wonder if the difficulty of using footnotes like this might have led to modern glossing practices like Leipzig glossing, where abbreviations are opaque but adjacent to their referents.) Just a little excerpt here.

  1. Third personal subject, third personal object aorist of verb k!emēᵋn Type 3 I MAKE IT; §§ 63; 65.
  2. p!a-i- DOWN § 37, 13; dīⁱ- § 36, 10. lōʹᵘkᵉ third personal subject, third personal object aorist of verb lōʹᵘgwᵋn Type 6 I SET IT; §§ 63; 40, 6.
  3. han- ACROSS § 37, 1. -gili`p’ third personal subject, third personal object aorist of verb -gilibaᵉn
  4. Third personal subject, third personal object aorist of verb mats!aga’ʹᵋn Type 3 I PUT IT; §§ 63; 40, 3.
  5. da- § 36, 2 end; -t!aba`kᵉ third personal subject, third personal object aorist of verb -t!abagaʹᵋn Type 3

By way of comparison, this very text also appears in Sapir’s Takelma Texts, in a much simpler format akin to the previous example:

No word-level analysis here whatsoever, as is the case with the rest of that volume.

One more while I’m at it:

Here’s a brief story from another old volume:

Migeod, Frederick William Hugh. 1908. The Mende language, containing useful phrases, elementary grammar, short vocabularies, reading materials. London: K. Paul. Internet Archive (8 July, 2022).

Layout | Content
--- | ---
Metadata (first section) | Title
Mende/English, one line per paragraph | Parallel sentences

In this rather unusual layout for a narrative, each sentence is laid out as if it were its own paragraph, or a row in a table. Wrapped content is given a hanging indent.

I’ve seen that style of glossing in old editions of Latin texts for students! I’m pretty sure I used to have an edition of Virgil’s Aeneid in that format (probably this one). Also cf. the texts that seminary students call “ponies”.


Huh, yeah. I wonder if linguistics picked up the format from classicists? (Presuming you’re talking about the Takelma format?)


It seems very likely.


There are even earlier instances of glossing than this, of course, for instance the famous Old English glosses of Latin texts:

For more, see this interesting short article:

The “numbering” here is particularly interesting, and it’s not limited to old examples; sometimes modern(-ish) linguists number glosses like this too:

Cooke, J. R. 1968. Pronominal Reference in Thai, Burmese, and Vietnamese. University of California Press.


Oh, and there’s the interesting (and complicated!) practice of Kanbun, by which Japanese grammar was “written in” to glosses of Chinese.


I had forgotten about this important paper, from 2003:

Towards a general model of interlinear text, Charles Darwin University (pdf)

Abstract: The use of interlinear text has long been a valuable tool in linguistic description, and the development of a number of different software tools has facilitated the creation and processing of such texts. In this paper we survey a range of interlinear texts, focusing on issues such as grouping and alignment. Abstracting away from the presentation, we look specifically at the structure of the data, in an attempt to create a general purpose data model for interlinear text. Our findings are that a four level model — incorporating Text, Phrase, Word and Morpheme levels — is sufficient to represent a very wide range of practice. We present an XML format for representing data in this model, and describe stylesheets for converting such data into presentational formats. Because of its generality, and the way it abstracts away from presentation, we believe the model is a suitable basis for developing archival storage formats for interlinear text and delivering interlinear text to end-users and external software tools in a web environment.
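The four levels named in the abstract could be sketched roughly like this. It is only an illustration of the nesting: the class and field names are my own guesses, not the paper's XML schema, the toy English phrase is invented for the example, and `interlinear_lines` is a hypothetical stand-in for the stylesheet step that turns the data into a presentational format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Morpheme:
    form: str
    gloss: str


@dataclass
class Word:
    form: str
    morphemes: List[Morpheme] = field(default_factory=list)


@dataclass
class Phrase:
    words: List[Word]
    translation: str = ""


@dataclass
class Text:
    title: str
    phrases: List[Phrase] = field(default_factory=list)


def interlinear_lines(phrase: Phrase) -> List[str]:
    """Render one phrase as column-aligned lines: words, morphemes, glosses, translation."""
    forms = [w.form for w in phrase.words]
    segments = ["-".join(m.form for m in w.morphemes) for w in phrase.words]
    glosses = ["-".join(m.gloss for m in w.morphemes) for w in phrase.words]
    # Each column is as wide as its widest cell.
    widths = [max(len(a), len(b), len(c)) for a, b, c in zip(forms, segments, glosses)]

    def row(cells):
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    return [row(forms), row(segments), row(glosses), phrase.translation]


# Toy English example, purely to show the nesting.
toy = Text(
    title="Toy text",
    phrases=[
        Phrase(
            words=[
                Word("dogs", [Morpheme("dog", "dog"), Morpheme("s", "PL")]),
                Word("bark", [Morpheme("bark", "bark")]),
            ],
            translation="Dogs bark.",
        )
    ],
)

lines = interlinear_lines(toy.phrases[0])
print("\n".join(lines))
```

Keeping the presentation in a separate function is the point of the exercise: the same nested data could just as well be rendered as facing pages or sentence-per-row tables like the ones earlier in this thread.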

This paper does something very similar to the current thread, with the added bonus that original scans and XML versions of the texts are included.

Or should I say, were included, because it seems that the links have expired.

Posted a bit about this over on Twitter

And here’s a backup of the data on GitHub:


I’ve seen the “numbering” style in (a) a Latin translation of the works of Confucius, and (b) a learner’s edition of Phaedrus’s “Fables”, where the numbers serve to “untangle” the convoluted poetic syntax. It’s a shame that this is no longer used—I’ve had success in rendering Sanskrit texts more comprehensible by adding numbers (by hand) to both the original text and the English translation.
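For anyone who hasn't met the numbering style, here is a small sketch of how it works as data. The words are the opening of Ovid's Metamorphoses in verse order; the numbers and the rough English glosses are an illustrative assignment made for this example (one possible prose order), not copied from any of the editions mentioned, and `construe` and `link` are hypothetical helper names.

```python
# Verse order, each word tagged with its position in a plainer prose order.
source = [
    (6, "in"), (7, "nova"), (2, "fert"), (1, "animus"),
    (5, "mutatas"), (3, "dicere"), (4, "formas"), (8, "corpora"),
]

# Rough English glosses carrying the same numbers (illustrative only).
translation = {
    1: "my mind", 2: "moves me", 3: "to tell of", 4: "forms",
    5: "changed", 6: "into", 7: "new", 8: "bodies",
}


def construe(numbered_tokens):
    """Reorder the tokens by their numbers to get the untangled reading order."""
    return " ".join(tok for _, tok in sorted(numbered_tokens, key=lambda p: p[0]))


def link(numbered_tokens, glosses):
    """Pair each source word with the gloss that carries the same number."""
    return [(n, tok, glosses[n]) for n, tok in sorted(numbered_tokens, key=lambda p: p[0])]


print(construe(source))
for number, word, gloss in link(source, translation):
    print(number, word, gloss)
```

The nice property of the format is that the original word order is never lost: the numbers are just one more annotation on each word, so the verse line and the untangled reading can both be produced from the same data.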


Yes, Kanbun is great! Apparently something very similar was done in Korea as well.


I’d be interested in learning more about all of these examples!

I’ve added hyperlinks to my earlier post. I’d need to locate the Sanskrit book I hand-wrote numbers in, though.


Speaking of parallel texts, and untangling complex word orders, I can’t believe I haven’t shared this dependency-marked edition of the beginning of Ovid’s Metamorphoses that I typeset in LaTeX:
Metamorphoses.pdf (88.8 KB).

I’m not sure how helpful it is for the reader, but the process of coding all the dependency arrows definitely helped me as a learner! (Also, the paraphrase in the left-hand column is lifted straight from Crispinus’s Delphin edition of 1765.)