First thoughts on Bloom

Continuing the discussion from :bulb: What are your projects and project ideas?:

_@jasnathanmartin mentioned Bloom and I thought it was worth moving it into its own discussion, so here we go. (@pathall)_

Bloom is a program and website where stories and other literacy materials in many languages can be shared with speakers of those languages. The website allows people to search by language and then read stories in that language. Some cool things the program does: stories can include audio, with each sentence highlighted as the reader hears it, and there are also options to add image descriptions for visually impaired people and sign language for people who use sign language.

Bloom is very easy to use. It was designed to be used by people with limited computer experience. It doesn’t have a ton of features and at first I was frustrated with that. Then I had friends actually record audio without me “running” the computer and it was all worth it. I like things that can be used by everyone working on a project, not just the people with formal education and literacy.

From the Bloom program I can export books as PDF, EPUB, and I think a couple of other formats, and I can also upload them to the Bloom website where people can read them online or download them. One of my favorite features, mentioned above, is the capacity for audio.

Here is a story that I transcribed and that my friend Paulino did the audio for:


Here’s a screenshot of the site.

(Worth noting that this is an SIL project, so there is a lot of religious content, and SIL is a missionary organization whose stated goals include “transforming” the communities they work with.)

I made a church hymnal for the Ende community with Bloom and it was a major success for the literacy project! People loved being able to read the songs as they sang them. Much lower barrier for entry than story books :slight_smile: One benefit to the religious nature of Bloom was that I had great clipart to pair with the church songs. From a linguistic perspective, this is a lovely corpus as all the songs were written in Ende, not translated from another language.

Here’s a link to the PDF:


Heck yeah! I checked them out. Obviously I don’t know the words or the songs but it looks like a really nice hymnal!

@pathall If I recall correctly, Bloom is a project of SIL LEAD, not to be confused with a different organization, SIL International.

I’m not sure I understand the religious texts warning. Maybe you can help me better understand what you mean by “(Worth noting that this is an SIL project, so there is a lot of religious content.)” The comment comes across in the same vein as Jeff Good’s statements in Dobrin and Good 2009, where (I summarize) he criticizes academic linguists for relying on SIL’s software and thereby, through their use of that open source software, in some way supporting the goals of Bible translation organizations. Your statement, though, has much more of the flair of a content warning found in a course syllabus (to which I ask: why is it needed?):

  • Dobrin, Lise M., and Jeff Good. 2009. “Practical Language Development: Whose Mission?” Language 85 (3): 619–29. doi:10.1353/lan.0.0132.

There should be no fear among scholars to engage with religious texts. Religion and texts have a long history. If it weren’t for the temples and crypts in Egypt we wouldn’t have the vast collection of hieroglyphic texts from both types of venues. These texts are deeply connected to religious beliefs and concepts of the afterlife.

Probably the place where the most research has been done involving both the Arabic script and linked data is around the Quranic text, which is a religious text. As examples of scholarship see also or also.

It is also religious belief systems which motivate many (but not all) of the community restrictions around language resource creation and use in the indigenous North American context.

Religious texts and their distribution have made great contributions to our understanding of Old High German, Old English, Old French, and Old Church Slavonic. They have bearing on our understanding of the social structures of the Greek world and literary practice within that culture.

Christian religious texts are often used in Machine Translation hypothesis testing and development. There is actually great cause to seek out religious texts. If I am not mistaken, many if not all of the materials on the Bloom Library come with some level of Creative Commons license, meaning that their use in a variety of tasks (such as evaluating MT/AI tools on real-world data) offers great potential. See the linked EMNLP 2022 paper. My understanding, then, is that researchers who build models using this data can then deploy those models commercially.

Today we present new "Bloom Library" datasets at #emnlp2022. These cover 363 languages with data for language modeling, image captioning, ASR, and visual storytelling.

Datasets 🤗:


— Daniel Whitenack (@dwhitena) December 10, 2022

Good point: it’s not the fact that the texts are religious, it’s the fact that Bloom is run by a missionary organization that needs to be pointed out. Thanks for your suggestion, very helpful. I’ll update the comment.


@pathall maybe this should become its own thread…

  1. Both the nonprofit and academic sectors have a long history in transformative activism. I keep a running list on the discussion of activist scholarship. I was inspired by Paul Newman’s categorization of language documentation efforts as “linguistic social work”. My initial review of the issue, as evidenced by the partial bibliography below, leads to an analysis that the academic community is just as invested in transformative results from its work as “Missionary Organizations” are. It raises the question, then: are academic institutions “missionary organizations”, and if so, what is their religion? If not, then how do we draw the distinction between these types of nonprofit organizations, given that many universities in the United States are registered nonprofits?

  2. I have it from SIL International insiders that SIL, strictly speaking, isn’t a missionary organization, as it is not a registered religious organization. Rather, it is the Wycliffe organizations, specifically WycliffeUSA in the United States, which are registered religious organizations (WycliffeUSA is a church, to be specific). Then, through a series of highly structured interagency labor contracts, Wycliffe employees do contract labor for SIL, making the organization not a “missionary organization” but an “organization full of missionaries”.

This collection of resources was put together as I sought greater context around Paul Newman’s statements and was asking: is it really “linguistic social work”? As a term, are there links to other ideas about social activism in the academy? In what other disciplines do the roles of “activist” and “researcher” coalesce? What other ideologies are there and what approaches are used?

Acar, Yasemin Gülsüm, and Canan Coşkan. 2020. “Academic Activism and Its Impact on Individual-Level Mobilization, Sources of Learning, and the Future of Academia in Turkey.” Journal of Community & Applied Social Psychology 30 (4): 388–404. doi:

Askins, Kye. 2009. “‘That’s Just What I Do’: Placing Emotion in Academic Activism.” Emotion, Space and Society, Activism and Emotional Sustainability, 2 (1): 4–13. doi:10.1016/j.emospa.2009.03.005.

Bayat, Asef. 2000. “Social Movements, Activism and Social Development in the Middle East.” 3. Civil Society and Social Movements Programme. Geneva, Switzerland: United Nations Research Institute for Social Development.$file/bayat.pdf.

Brossier, Marie. 2017. “Senegal’s Arabic Literates: From Transnational Education to National Linguistic and Political Activism.” Mediterranean Politics 22 (1): 155–75. doi:10.1080/13629395.2016.1230944.

Cancian, Francesca M. 1993. “Conflicts between Activist Research and Academic Success: Participatory Research and Alternative Strategies.” The American Sociologist 24 (1): 92–106. doi:10.1007/BF02691947.

Cox, Laurence. 2015. “Scholarship and Activism: A Social Movements Perspective.” Studies in Social Justice 9 (1): 34–53. doi:10.26522/ssj.v9i1.1153.

Drohan, Brian. 2017. Brutality in an Age of Human Rights: Activism and Counterinsurgency at the End of the British Empire. Cornell University Press. Available on JSTOR.

Flood, Michael, Brian Martin, and Tanja Dreher. 2013. “Combining Academia and Activism: Common Obstacles and Useful Tools.” Australian Universities’ Review 55 (1): 17–26. doi:10.3316/aeipt.196880.

Hales, Rob, Dianne Dredge, Freya Higgins-Desbiolles, and Tazim Jamal. 2018. “Academic Activism in Tourism Studies: Critical Narratives from Four Researchers.” Tourism Analysis 23 (2): 189–99. doi:10.3727/108354218X15210313504544.

Hawthorne-Steele, Isobel, Rosemary Moreland, and Eilish Rooney. 2015. “Transforming Communities through Academic Activism: An Emancipatory, Praxis-Led Approach.” Studies in Social Justice 9 (2): 197–214. doi:10.26522/ssj.v9i2.1152.

Newman, Paul. 1998. “We Has Seen the Enemy and It Is Us: The Endangered Languages Issue as a Hopeless Cause.” Studies in the Linguistic Sciences 28 (2): 11–20. Available via IDEALS.

———. 2003. “The Endangered Languages Issue as a Hopeless Cause.” In Language Death and Language Maintenance: Theoretical, Practical and Descriptive Approaches, edited by Mark Janse and Sijmen Tol, 1–13. Current Issues in Linguistic Theory 240. Amsterdam, Netherlands: John Benjamins. doi:10.1075/cilt.240.03new.

Piven, Frances Fox. 2010. “Reflections on Scholarship and Activism.” Antipode 42 (4): 806–10. doi:10.1111/j.1467-8330.2010.00776.x.

Smith, Andrea. 2007. “Social-Justice Activism in the Academic Industrial Complex.” Journal of Feminist Studies in Religion 23 (2): 140–45. Indiana University Press. Available via Project MUSE.


Hi again Hugh,

To be clear, I find this topic a bit tiresome. SIL is funded by Wycliffe, and Wycliffe is a missionary organization whose stated aims include goals that an academic institution would not pursue. You can try to argue that that isn’t the case but, I mean, both orgs’ financial statements are readily available. SIL is funded by Wycliffe.

So, I don’t think trying to muddy the waters with an argument that universities are somehow “religions” is convincing (let alone helpful), myself. :person_shrugging:

Anyway, I personally hope that there are alternatives to SIL software in the future. It’s free and it’s pretty good and it will be around for a while, but I don’t think it’s a future-proof set of tools, and in any case I have misgivings even about their online tools, because they assume that centralization is the way forward (this is a much bigger issue than SIL, though). On the other hand, for several tasks SIL stuff is the only game in town, and it’s true that they offer it for free.

But the whole linguistics/SIL relationship is weirdly defined and I find that it causes a lot of complicated, endless arguments exactly like the one we are currently having. And that’s time that could be better spent talking about new approaches, and documentation itself.

This is just my opinion, and keep in mind that I encouraged the Bloom post in the first place.


We all have our own motives as individuals and also as groups/communities. I think it is helpful when people are transparent about what those are not only with others but with themselves. Are any of us purely in the game to serve others? SIL linguists in Mexico would tell me about difficulties colleagues had encountered when a community would be visited by a linguist who would collect data and then leave. Community members felt that the linguist had advanced their career through this data collection, basically making themselves rich off other people’s language and culture. When SIL linguists, who planned to work with a community for many years, would approach the community, there was already distrust.


Pirahã certainly advanced Dan Everett’s career. So, I guess who we know and when we know them has an impact on our future. When I did LangDoc field work in Mexico I had the advantage of face-to-face introductions from ex-pats to local people. Those ex-pats had already lived in the area for 30+ years. They were as vital to my success as the local school teachers I interviewed about the challenges of teaching their minority language.

One thing I really enjoyed last fall at the Berlin Conference on LangDoc and Archiving was a presentation by some people who were originally funded by DoBeS. One of the takeaway points that I thought was interesting was that, after 20 years of scholarly research and activism related to language development, they were just starting to see results related to literacy. The time frame is interesting because I have rarely seen scholars engaged with a single community for that long without a missionary connection. Still more interesting is that many missionaries (at least the ones I have interviewed) take that long to see similar results in their language-development-related activism.

Sabine and Eliane also mention the fact that they had to look outside of the funding streams traditional to scholarly linguistics to make their activities possible; something nonprofit staff also must do. Maybe there is something in their approach that we can apply to software development. A long term challenge in academic software development is its sustainability—especially on mobile platforms where vendor APIs seem to be regularly changing.

Perhaps a closer look at the Bloom Library might reveal whether many of the authors are first-language speakers of the languages they are writing in. It might be that some of the material is created via Machine Translation or by second-language users. I like the fact that Bloom appears to be having an impact at least on writers/authors of content, if not also readers. I have often wondered why the content of Bloom is not indexed in aggregators like OLAC. It seems that the metadata OLAC requires is general enough that it would already be captured in the authoring process. One of my presentations at Berlin pointed at the need for indexing introductory literacy materials and educational resources for languages with small user populations. In one sense Bloom content is ephemeral, which makes me wonder what the relationship between Bloom content and archives ought to be…

@pathall I agree that a decentralized data sharing approach is useful. I know that FLEx does allow this, with the FLEx database essentially becoming a Mercurial object.

But centralization is the current business model for Living Tongues Dictionaries and The Language Conservancy’s dictionaries (Learning Resources - The Language Conservancy). In a way, the business model “strengthens” the dependency of the clients on the technology provider as a service provider.

With regard to FLEx, the whole FLEx tool set needs to be redone. In general I think the information architecture is fairly good, but the UI and the core codebase are in desperate need of a total redo. Not only are they showing their age in their UI, but they are also stuck on a single platform. FLEx is the work of 5 programmers over close to 20 years… Funding that sustainability through academic channels is a challenge. A modular, decentralized approach to development, such as what Automattic does with WordPress for core development, seems like it might allow various teams to work at the same time but on different feature sets. Maybe there are possibilities for dev time out of digital humanities centers at universities, but even these units are hard pressed to retain staff, because once one has the needed skills the money potential is elsewhere. I find it takes a strong, experienced, and wise project manager to create sustainable software. Inexperienced clients don’t know how to navigate technology sustainability.