An interesting live-tweet Twitter thread from the Right-to-Left conference

Neat:

Quinn Dombrowski is live-tweeting this conference:

https://dhsi.org/dhsi-2021-online-edition/dhsi-2021-online-edition-aligned-conferences-and-events/dhsi-2021-right-to-left/

The Right to Left (RTL) conference focuses on research and pedagogy related to the past, present and future of languages which are written from right to left, as well as their multilingual, multiscript and multidirectional cultural contexts. While these languages have posed technical challenges to computing, they have also become the object of increasing attention in global digital culture. RTL aims to encourage digital research in and about right-to-left language cultures, providing a frame for thinking beyond the left-to-right-centric assumptions of contemporary computing.

The RTL conference welcomes contributions from researchers, developers and independent scholars working on research and pedagogy of any living or historical RTL language, including, but not limited to, Arabic, Azeri, Hebrew, Kurdish, Ottoman, Persian, Syriac or Urdu. We are particularly interested in engagement and dialogue with societies in which those are spoken or read today.

I look forward to seeing some of these talks when they are put online. I was interested in this tweet in particular:

There is a new(-ish?) thing in regular expressions for JavaScript that seems relevant here (not sure how standardized regular expressions are across programming languages…):

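This adds some more writing-system-aware capabilities to JavaScript. You have to pass in a “u” flag (for Unicode), and then you use this syntax: \p{} . (The lowercase p matters: an uppercase \P{} means the opposite, “anything that does not have this property”.)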

What goes inside the curly braces is where the magic happens.
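Here is a minimal sketch of the syntax, assuming a JavaScript engine recent enough to support Unicode property escapes. The sample word is taken from the Arabic text quoted further down; the variable names are just for illustration.

```javascript
// "الناس" ("the people"), a word from the Arabic sample of the UDHR first article.
const sample = "الناس";

// With the "u" flag, \p{Script=Arabic} matches one character in the Arabic script.
const arabicChar = /\p{Script=Arabic}/u;

// Uppercase \P negates: one character that is NOT in the Arabic script.
const nonArabicChar = /\P{Script=Arabic}/u;

// Without the "u" flag the escape is not understood as a property escape,
// so this pattern only matches the literal text "p{Script=Arabic}".
const withoutFlag = /\p{Script=Arabic}/;

const results = {
  arabic: arabicChar.test(sample),       // true
  nonArabic: nonArabicChar.test(sample), // false: every character is Arabic script
  noFlag: withoutFlag.test(sample),      // false
};

console.log(results);
```

Forgetting the “u” flag does not raise an error in a regex literal; the pattern just quietly fails to match, which is easy to miss.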

So for instance, this beautemous page from Unicode's Universal Declaration of Human Rights project displays the first article in a lot of the “scripts” (writing systems) in Unicode:

https://unicode.org/udhr/assemblies/first_article_subset.txt

Arabic, Standard
يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق. وقد وهبوا عقلاً وضميرًا وعليهم أن يعامل بعضهم بعضًا بروح الإخاء.

Armenian
Բոլոր մարդիկ ծնվում են ազատ ու հավասար իրենց արժանապատվությամբ ու իրավունքներով։ Նրանք ունեն բանականություն ու խիղճ և միմյանց պետք է եղբայրաբար վերաբերվեն։

Assyrian Neo-Aramaic
ܟܠ ܒܪܢܫܐ ܒܪܝܠܗ ܚܐܪܐ ܘܒܪܒܪ ܓܘ ܐܝܩܪܐ ܘܙܕܩܐ. ܘܦܝܫܝܠܗ ܝܗܒܐ ܗܘܢܐ ܘܐܢܝܬ. ܒܘܕ ܕܐܗܐ ܓܫܩܬܝ ܥܠ ܐܚܪܢܐ ܓܪܓ ܗܘܝܐ ܒܚܕ ܪܘܚܐ ܕܐܚܢܘܬܐ.

Bengali
সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত।

…and so forth…

You can write regular expressions to select text by script, like this:

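const text = "<the stuff above>"

// a regex literal with the g (all matches) and u (Unicode) flags
const arabicScriptRegexp = /\p{Script=Arabic}+/gu

// returns an array of every run of Arabic-script characters
text.match(arabicScriptRegexp)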

(Of course, this is going to match every language on that page that happens to be written in the Arabic writing system: Standard Arabic, Dari, Western Farsi, Malay (Arabic), Western Panjabi, Northern Pashto, Seraiki, Urdu, and Uyghur.)
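One wrinkle worth knowing about, sketched below under the assumption that the engine supports the Script_Extensions property: spaces and most punctuation belong to the “Common” script, so the matches come back word by word, and Arabic vowel marks such as the fathatan (U+064B) have the script “Inherited”, so \p{Script=Arabic}+ stops when it reaches one. Script_Extensions (short name scx) also counts characters that are shared with the Arabic script. The snippet is built from words in the samples above, and the variable names are hypothetical.

```javascript
// "عقلاً" spelled out with escapes so the final mark is visible: U+064B ARABIC FATHATAN.
const wordWithMark = "\u0639\u0642\u0644\u0627\u064B";

// Two Arabic words from the sample, then an Armenian word that should never match.
const snippet = "وهبوا " + wordWithMark + " Բոլոր";

// Script=Arabic: only characters whose primary script is Arabic.
const byScript = snippet.match(/\p{Script=Arabic}+/gu);

// Script_Extensions=Arabic: also accepts marks shared with the Arabic script.
const byScriptExtensions = snippet.match(/\p{Script_Extensions=Arabic}+/gu);

console.log(byScript);           // the second match stops before the fathatan
console.log(byScriptExtensions); // the second match keeps the fathatan
```

Either way the regex identifies a writing system, not a language, so telling Urdu from Persian from Uyghur still needs something else.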