Using Transkribus for Tibetan OCR

An interesting article on using Transkribus to OCR 11th-13th C. Tibetan texts… in cursive manuscripts!

I’m not sure if this scan is from the collection in question, but it might suffice as an indication of the kinds of texts they’re dealing with:

Y’all. Transkribus is nuts.

The reported “CER” (Character Error Rate) figures are… a little amazing:

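| Model name | No. of pages checked | CER for Training Set | CER for Validation Set |
|------------|----------------------|----------------------|------------------------|
| Model A    | 40                   | 1.39%                | 4.28%                  |
| Model B    | 80                   | 1.35%                | 4.45%                  |
| Model C    | 120                  | 1.18%                | 4.73%                  |
| Model D    | 160                  | 1.15%                | 2.33%                  |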
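
For anyone wondering what those percentages mean: CER is usually defined as the number of character edits (insertions, deletions, substitutions) needed to turn the OCR output into the correct transcription, divided by the length of the correct transcription. Here is a minimal sketch of that usual definition in Python. I'm assuming plain Levenshtein distance over Unicode code points; I don't know exactly how Transkribus or the article's authors count characters, and the function names are just my own.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance between two strings, counted in Unicode code points."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i chars of ref into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a char from ref
                curr[j - 1] + 1,     # insert a char from hyp
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp against ref (assumed definition, see above)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# One wrong character out of four gives a CER of 25%.
print(f"{cer('abcd', 'abxd'):.2%}")
```

By that definition, a 2.33% CER works out to roughly two or three wrong characters per hundred. One caveat: a Tibetan stack is made up of several code points, so counting by code point (as this sketch does) and counting by stack or syllable could give different numbers.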

There’s a nice video about the project here:

