What might a fieldwork interface for botanical research look like?

Continuing the discussion from The Rift Valley Network: learning how to collaborate:

Thanks for sharing this interesting talk @Andrew_Harvey. As usual, all I ever think about is user interfaces, but I believe there is a fairly obvious and interesting interface design question that arises in the context of this talk: a botanical survey interface.

Professor Lusekelo provides a description of the kinds of data he is working with (or will be working with) as he collects information with Hadza people:

  • Hadza name
  • Kiswahili name (optional?)
  • English name (optional?)
  • Isanzu source?
  • Datooga source?
  • Botanical (Linnaean) name (optional?)
  • “Utilities” - uses of the plant
  • Image(s) of the plant
  • Linguistic status (borrowed or native word)
  • Place in a folk taxonomy

Perhaps I missed some but those are the ones I heard.

Anyway, I’ll refrain from more thinking out loud, and instead ask you to use your imagination a bit.

Suppose you were involved in a project to design some sort of tool for collecting botanical data of this kind.

  • What kinds of questions would you ask of the participants?
  • What functionalities would you prioritize?
  • What sorts of devices would you expect to be most relevant for your interface? (remember, it’s perfectly reasonable to target one kind of device for collection and reserve the design of a “publication”-oriented interface for a separate project addressing different devices: maybe you collect with a tablet interface and target mobile devices for publication)

Just had a conversation with a colleague the other week about how nice it would be to have some kind of wiki/database on flora and fauna terms in Chadic languages that anyone could collaborate on. Zero idea how that would work though.


This is a great question, and talking through things could eventually end up in a useful blueprint for a data collection / documentation tool!
David Fleck wrote a really useful introductory article for linguists who might want to obtain scientific names for plant and animal vocabulary they collect in the field. (ResearchGate link here: https://www.researchgate.net/publication/249970933_Field_linguistics_meets_biology_How_to_obtain_scientific_designations_for_plant_and_animal_names). Obviously this is a slightly different goal, but some of the things mentioned in this paper are useful for the larger enterprise.
When I do plant/animal name stuff with the communities I work with, the kinds of stuff I try to get from participants include:
-name (and variants including singular/plural; male/female; special names for a big / little / young / old specimen, etc.)
-other names (synonyms or near-synonyms) for the same plant
-uses (and this can be expanded: for example – if it’s used as medicine, how is that medicine made?)
-description (how is it identified or understood by members of the speaker community? For an animal example, it’s common among the people I work with to differentiate a sheep from a goat by the fact that a sheep’s tail hangs and a goat’s tail stands up)
-natural history (what is the plant’s behaviour? where does it grow? when does it fruit? etc.)
-cultural connections (are there songs or stories (etc.) which mention or feature this plant?)
In terms of functionalities, I think it’d be great if there were good clear images already built into the tool (photos of the plant, close-ups of the leaves, fruit / flowers, etc.), but also a way to upload images of the plant (or products made from the plant) taken in the field.
In terms of devices, portability would be key: the ability to use it via smartphone would be easiest, I’d guess, but a tablet might be easier for viewing media.

I’d like to hear what other people think and have to say about this!

ping: Ahmed Sosal @Sosal, who I know just recently conducted some research on plant names with speakers of Iraqw


Chadic languages would sure be a nice addition to Tsammalex. So once the data curation part is worked out, you’d have the “publication outlet” already there :slight_smile:


Regarding data collection, it might be useful to come to the field “prepared”, e.g. with a list of species that occur in the area as can be obtained from GBIF. GBIF also often provides publicly available images and vernacular names in several languages for species which might help with elicitation.

Another advantage of such a list from GBIF is that Linnaean names are already there and, even better, GBIF classification IDs, which also solve the problem of alternative species names, conflicting classifications, etc.
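If you want to try this, here’s a minimal sketch of building such a request in Python. It assumes GBIF’s public occurrence search API (`https://api.gbif.org/v1/occurrence/search`) and that `6` is the backbone key for kingdom Plantae (worth double-checking); `limit=0` combined with a `speciesKey` facet should return per-species counts rather than individual occurrence records. The function name `species_checklist_url` is just illustrative, and the code only builds the URL; it doesn’t fetch anything.

```python
from urllib.parse import urlencode

# Base URL of GBIF's public occurrence search API (v1)
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"

# Assumption: 6 is the GBIF backbone key for kingdom Plantae; verify before use
PLANTAE_KEY = 6


def species_checklist_url(country_code: str, facet_limit: int = 500) -> str:
    """Build a query asking GBIF which plant species are recorded in a country.

    limit=0 suppresses individual occurrence records; the speciesKey facet
    returns counts per species instead, which can seed an elicitation list.
    """
    params = {
        "country": country_code,  # ISO 3166-1 alpha-2 code, e.g. "TZ"
        "taxonKey": PLANTAE_KEY,
        "limit": 0,
        "facet": "speciesKey",
        "facetLimit": facet_limit,
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


print(species_checklist_url("TZ"))
```

The species keys in the response could then be resolved to names (and, where GBIF has them, vernacular names and images) before heading out.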


Btw.: One of the original ideas of Tsammalex was as a tool to create field guides to help with further data collection. Thus, it included functionality to compile documents like this one for Afrikaans.


Thanks for all the interesting observations!

So below is a slightly fancied-up laundry list of relevant data types that have come up so far. (I hope it’s okay that I have re-used and in some cases slightly recast your list above, @Andrew_Harvey!) For several of these, I have tacked on an italicized example from Lusekelo’s video.

1. Word

  1. Form: Hadza
  2. Glosses/Translations: Kiswahili, English…
  3. Grammar: grammatical features (singular/plural; male/female…)
  4. Related words
    1. Special names (e.g., big / little / young / old specimen, etc.)
    2. Synonyms or near synonyms: alternate in-language names for the same plant
  5. Source (Lusekelo’s Linguistic status): native Hadza word or borrowed from Datooga, Isanzu…

2. Etymology/source words

  1. Source language: Isanzu, Datooga…
  2. Source form
  3. Source gloss
  4. Links (e.g., URL for Datooga documentation project, Tsammalex)

3. Botanical documentation

  1. Botanical (Linnaean) name
  2. Image(s) of the plant
  3. Natural history: What is the plant’s behaviour? Where does it grow? When does it fruit? etc.
  4. Links to botanical reference sources: GBIF, Wikipedia, Wikispecies…

4. Ethnobotany

  1. Uses (= Lusekelo’s Utilities)
    1. Medicinal: How is that medicine made?
    2. Food: Is the plant prepared or cooked in some way? Are there particular recipes that involve the plant?
  2. In-language taxonomy: Does the plant fit into a taxonomic system (or systems) in the language? What features are relevant?
  3. Description: How is it identified or understood by members of the speaker community?
  4. Cultural connections
    1. Songs which mention or feature this plant
    2. Stories, myths, etc. which feature the plant
  5. Location: Where is the plant found? (There could be lots of ethical issues here!)
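To make the grouping concrete, here’s one hypothetical way to model the list above as a nested record in Python rather than one flat row. All class and field names are made up for illustration. The only required field is the word form, so an entry can start with almost nothing in the field and be filled in later. `linked_recordings` is an assumed field for pointing to separately recorded narratives (stories, recipes, and so on).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Word:
    form: str                                                # in-language (e.g. Hadza) form
    glosses: dict[str, str] = field(default_factory=dict)    # language -> gloss
    grammar: dict[str, str] = field(default_factory=dict)    # e.g. {"number": "plural"}
    special_names: list[str] = field(default_factory=list)   # big/little/young/old...
    synonyms: list[str] = field(default_factory=list)
    status: str = "native"                                   # "native" or "borrowed"


@dataclass
class Etymology:
    source_language: str                                     # e.g. Isanzu, Datooga
    source_form: Optional[str] = None
    source_gloss: Optional[str] = None
    links: list[str] = field(default_factory=list)


@dataclass
class BotanicalDocumentation:
    linnaean_name: Optional[str] = None
    images: list[str] = field(default_factory=list)          # file paths or URLs
    natural_history: Optional[str] = None
    reference_links: list[str] = field(default_factory=list) # GBIF, Wikipedia...


@dataclass
class Ethnobotany:
    uses: dict[str, str] = field(default_factory=dict)       # category -> details
    taxonomy: list[str] = field(default_factory=list)        # in-language category path
    description: Optional[str] = None
    cultural_connections: list[str] = field(default_factory=list)
    location: Optional[str] = None                           # may need restricted access


@dataclass
class PlantEntry:
    word: Word
    etymology: Optional[Etymology] = None                    # only for borrowed words
    botany: BotanicalDocumentation = field(default_factory=BotanicalDocumentation)
    ethnobotany: Ethnobotany = field(default_factory=Ethnobotany)
    # Hypothetical: IDs of separately recorded narratives about this plant
    linked_recordings: list[str] = field(default_factory=list)
```

Nesting like this maps fairly naturally onto interface sections (word, etymology, botany, ethnobotany), which is one way of avoiding the input-per-field problem.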

Look at all this great stuff! And note, crucially, that almost any of these fields might generate documentation outside of the botanical documentation project itself. So for instance, when someone says “Oh yes, there is a story about this plant…”, or “You know, there is a complicated process we use to make this into medicine…” then there is the opportunity to record a narrative. Documentation is all about “tasks” like this. We should design interfaces accordingly.

Thinking about interface design

I think I have linked this article before (it now only survives in the Internet Archive :expressionless:):


I think it’s a great start in thinking about designing a user interface — very simple and approachable for just about anyone.

Our list above is the “list your bits” part of the 37signals article: we’ve approximated a list of all — okay, most of — the kinds of information that might come up in the task of documenting botanical information.

But a laundry list of data types is only the first step in designing an application — the next suggestion in the article is to consider grouping and prioritization. If you don’t, you end up with this:

:imp: Oh I know, just make an input for each field, like this!

Or even worse, a spreadsheet with a million columns that will leave you in a hellscape of eternal scrolling and tabbing. :scream:

So much no. That’s horrible! Starting with the idea that we “just” add a field for every possible datum is a recipe for disaster. Especially if you’re trying to use the thing in the field.

I’m gonna stop because this post is getting too long, but I do want to talk more about interface design for this project… hoping someone is interested :sweat_smile:


wow Tsammalex looks great! Another win for CLLD!


The input for Tsammalex is pretty much a CLDF Wordlist (e.g. Heath’s Dogon data), with the twist that parameters (aka concepts) are mapped to the GBIF backbone taxonomy; see the GBIF_ID here.
So data curation via spreadsheets might not be completely unfeasible.
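For instance, here’s a minimal Python sketch of pulling forms together by taxon. It assumes the usual CLDF Wordlist columns (`Language_ID`, `Parameter_ID`, `Form` in the FormTable; `ID` in the ParameterTable) plus a `GBIF_ID` column on the ParameterTable. It reads the CSV text directly and ignores the CLDF metadata file, and the sample rows are placeholders, not real data.

```python
import csv
import io
from collections import defaultdict


def forms_by_gbif_id(parameters_csv: str, forms_csv: str) -> dict[str, list[tuple[str, str]]]:
    """Group (Language_ID, Form) pairs by the GBIF ID of their concept."""
    # ParameterTable: one row per concept (taxon), with an extra GBIF_ID column
    gbif_for_param = {
        row["ID"]: row["GBIF_ID"]
        for row in csv.DictReader(io.StringIO(parameters_csv))
        if row.get("GBIF_ID")  # skip concepts not yet mapped to GBIF
    }
    grouped: dict[str, list[tuple[str, str]]] = defaultdict(list)
    # FormTable: one row per word form, pointing at a concept via Parameter_ID
    for row in csv.DictReader(io.StringIO(forms_csv)):
        gbif_id = gbif_for_param.get(row["Parameter_ID"])
        if gbif_id is not None:
            grouped[gbif_id].append((row["Language_ID"], row["Form"]))
    return dict(grouped)


# Placeholder data, not real entries
parameters = "ID,Name,GBIF_ID\np1,tree A,111\np2,shrub B,\n"
forms = (
    "ID,Language_ID,Parameter_ID,Form\n"
    "f1,lang1,p1,forma\n"
    "f2,lang2,p1,formb\n"
    "f3,lang1,p2,formc\n"
)
print(forms_by_gbif_id(parameters, forms))
```

Unmapped concepts (like `p2` here) simply drop out, which also gives a quick way to list what still needs a GBIF link.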

The first instantiation of Tsammalex was a wiki installation, though, which makes adding images to entries a bit simpler, as well as “data normalisation as you go”.


Jonathan Amith has a number of ethnobotanical documentation projects in southern Mexico. The interface is under development, I think, but you can get an idea at: Documenting Ethnobiology in Mexico and Central America Home
By clicking on various things you can access photos, linguistic info, audio recordings, etc.


Wow, this is super cool, thanks for sharing!

I love the fact that there are time-aligned texts alongside the database, e.g.:


There are some interesting screenshots that show what the entry page looks like in the User Guide:


There are some very detailed fields here.

Looks like they are building on top of another open-source tool called Symbiota:


The search interface is also very detailed:



By the way @yuni, I just noticed in your self-intro that you have an interest in botanical stuff yourself.

Now everyone can ask you questions :stuck_out_tongue: :potted_plant:

Ha, well spotted. The only thing I can think to add to the fieldwork interface discussion is that some field for higher-level classification could be useful, to enable users to pull up groups of related plants. Botanical family is almost definitely useful, and different languages might have culturally specific classifications.

For botanical documentation proper, i.e. to create a record that could be validated for entry into one of the feeder databases to GBIF as mentioned above, you’d have a lot more fields, but most of them wouldn’t really be directly relevant for language documentation purposes. Most such databases minimally want to know who recorded the occurrence of the plant, a GPS reading (typically, restricted access can be set in sensitive cases), and the date of the observation. There might be optional fields about life cycle stage, number of individuals, habitat, etc.
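As a rough illustration, a minimal occurrence record with those required fields plus the optional extras might look like this in Python. The class and field names are hypothetical; the export keys are Darwin Core terms (`recordedBy`, `decimalLatitude`, `eventDate`, `informationWithheld`, etc.), which is the vocabulary GBIF publishers generally use. Dropping coordinates for restricted records is a simplification made for this sketch; real workflows may generalise them instead.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical field names mapped to Darwin Core terms
DWC_TERMS = {
    "recorded_by": "recordedBy",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
    "observed_on": "eventDate",
    "life_stage": "lifeStage",
    "individual_count": "individualCount",
    "habitat": "habitat",
}


@dataclass
class Occurrence:
    recorded_by: str                        # who recorded the occurrence
    latitude: float                         # GPS reading
    longitude: float
    observed_on: date                       # date of the observation
    restricted: bool = False                # sensitive location?
    life_stage: Optional[str] = None        # optional extras
    individual_count: Optional[int] = None
    habitat: Optional[str] = None

    def to_darwin_core(self) -> dict:
        """Export as Darwin Core keys, withholding coordinates if restricted."""
        out = {}
        for name, term in DWC_TERMS.items():
            if self.restricted and name in ("latitude", "longitude"):
                continue  # simplification: drop rather than generalise
            value = getattr(self, name)
            if isinstance(value, date):
                value = value.isoformat()
            if value is not None:
                out[term] = value
        if self.restricted:
            out["informationWithheld"] = "precise coordinates withheld"
        return out
```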

As you (@pathall) allude to in the opener to this thread, I agree that it would be possible to create questionnaires to help stimulate people to talk about plants for documentation projects. Before I got into botany/ecology, I used to ask very general questions like, “What plants are there around here? Tell me about them. What do you use them for?” with mixed success. Later I found it was possible to extend these “texts” into longer interviews by following up with questions about specific parts of the plant, life cycle, associated (e.g. invertebrate) species, etc.: nothing specialist, just stuff that didn’t use to pop into my head in the moment when I hadn’t been spending time thinking about ecology.

Back to the database thing, I think botanical (Linnaean) IDs should, by default, be treated with caution. Common names often correspond to a genus rather than species. Some species-level IDs are pretty straightforward, like if there’s definitely only one in the genus in that part of the world and it’s very well known, etc. Then there is a continuum of species-level ambiguity that reaches down to cases where there are so many subtly different species that it is 0% worth it (IMO) to write anything other than “[Genus] agg.” So, often, previous sources (like dictionaries) may be full of educated guesses and approximations if the research was not primarily botanical in nature. So maybe a field for ID notes could be a place to record rationale, doubts, context, etc. in order to give future users an idea of the types/extent of ID effort that was made (e.g., “looks similar to photo on p.XX of Author (Year) so I’ve copied that scientific name”; or “longer sepal than petal suggests species X rather than species Y but there were no seed pods yet so I wasn’t able to dissect one to confirm”).


Yes, any comments on how something like an associated GBIF ID was determined should go into a separate field. I should note that GBIF has IDs for all taxonomic units in its backbone taxonomy. So if you want to link on the genus (or sub-species) level, that’s entirely possible. The worst compromise from a data reuse POV is stuff like “?” or “1234 genus” in ID fields :slight_smile:
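Combining this with the ID-notes idea above, a sketch of an identification field might keep the numeric GBIF key, the rank of the linked taxon, and free-text notes apart. Everything here is hypothetical (names, the short list of ranks, the migration helper); the uppercase rank labels follow GBIF’s convention but should be checked against the API. `from_legacy` shows how a messy cell like “1234 genus” or “?” could be split up rather than kept in the ID column.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed subset of GBIF backbone ranks (GBIF uses uppercase labels)
RANKS = {"FAMILY", "GENUS", "SPECIES", "SUBSPECIES", "VARIETY"}


@dataclass
class Identification:
    """Link from a vernacular name to a GBIF backbone taxon."""
    gbif_key: Optional[int]       # numeric ID only, never "?" or "1234 genus"
    rank: Optional[str] = None    # rank of the linked taxon, e.g. "GENUS"
    notes: str = ""               # rationale, doubts, sources consulted

    def __post_init__(self) -> None:
        key = self.gbif_key
        if key is not None and (isinstance(key, bool) or not isinstance(key, int)):
            raise TypeError("gbif_key must be an integer or None")
        if self.rank is not None and self.rank not in RANKS:
            raise ValueError(f"unknown rank: {self.rank}")
        if key is None and not self.notes:
            raise ValueError("an unlinked name should explain why in notes")


def from_legacy(raw: str) -> Identification:
    """Split a messy legacy ID cell into key, rank and notes."""
    parts = raw.split()
    if parts and parts[0].isdigit():
        rank = parts[1].upper() if len(parts) > 1 else None
        if rank not in RANKS:
            rank = None  # unrecognised rank words stay in the notes only
        return Identification(int(parts[0]), rank, f"migrated from legacy value {raw!r}")
    return Identification(None, None, f"unresolved legacy value {raw!r}")
```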


Thank you @Andrew_Harvey for tagging me in this interesting thread.

The information shared here is very valuable to me. I tried to collect data on plants from a number of languages spoken in the Rift Valley area in Tanzania earlier this year. I had many questions concerning (1) the collection method in the field, (2) the dilemma of information (linguistic, ethnobotanical, ecological, historical, etc.), (3) the appropriate multidisciplinary research method to handle such data, and (4) the data/results dissemination (publications, interactive responsive website, apps?).

Thanks to everyone for the suggestions made here. I will definitely refer to them when I start working on the data.