Writing Out Loud: Abkhaz Converbs and Getting Data to Do Stuff

Hello All out there on DocLing and beyond!

This is the abrupt beginning to my thread about adventures during my PhD project which will describe converbs in Abkhaz (Northwest Caucasian).

For now, the focus of this thread is specifically on developing my corpus of converb examples, which includes trying to develop a feature spreadsheet for examining the characteristics of Abkhaz converbs (and, really, converbs in a typological sense) and trying to get the most out of what software/coding can help me see about my data. Both are things I began thinking about in 2021, the first year of my PhD, and I’m renewing my focus on them now.

I did a bunch of things in 2021, but what I especially learned is that I need to guard the time I devote each day to reading and writing that is directly related to my project. The first two hours of my workday will now be devoted to this endeavor, and to keep myself accountable, I’ll post my musings here. And perhaps, occasionally, people will be interested in sharing their thoughts and reactions.

Thoughts for today, January 3rd, 2022:

  1. How do I break features down into discrete units? For instance, some of the examples I’ve gotten from the literature show things like “converbs formed with the -ны suffix and the stem of a static verb (and/or the pure stem of a static verb) to indicate additional state with a verbal predicate.” Not only does the example show morphological form (-ны suffix, verb stem) and then a more “syntactic” form (converb + predicate verb) but also semantic meaning (indicating additional state of the syntagma). There’s a lot in here, and all of it is relevant.

So far I’ve started experimenting with two breakdowns for features to describe one example (each item below = one column in Excel):

  • feature: verb stem

  • value1: affixation

  • value1_name: affixed vs. unaffixed

  • value1_as_in_source: unaffixed

  • value2: dynamicity

  • value2_name: dynamic vs. static

  • value2_as_in_source: static

  1. I think what I need a program for is to set filters so that I can search for “examples with values X and Y” and then it brings up all the examples in the corpus that fit those filters
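
A minimal sketch of what that kind of filter could look like in Python, assuming the spreadsheet has been loaded as one dictionary per example. The column names follow the breakdown above; the example IDs and the second and third rows are made up purely for illustration.

```python
# Hypothetical rows: each dict is one example (one spreadsheet row).
# Only the first row mirrors the breakdown above; the rest are invented.
examples = [
    {"id": "ex1", "feature": "verb stem",
     "value1_as_in_source": "unaffixed", "value2_as_in_source": "static"},
    {"id": "ex2", "feature": "verb stem",
     "value1_as_in_source": "affixed", "value2_as_in_source": "static"},
    {"id": "ex3", "feature": "verb stem",
     "value1_as_in_source": "affixed", "value2_as_in_source": "dynamic"},
]


def filter_examples(rows, **wanted):
    """Return the rows whose columns match every column=value pair in `wanted`."""
    return [row for row in rows
            if all(row.get(col) == val for col, val in wanted.items())]


# "Examples with values X and Y": unaffixed AND static.
hits = filter_examples(examples,
                       value1_as_in_source="unaffixed",
                       value2_as_in_source="static")
print([row["id"] for row in hits])  # ['ex1']
```

Adding or dropping a keyword argument widens or narrows the search, so the same function covers one-value and many-value filters.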

Even if it’s only ever me who reads my own musings, I think it’ll help, so thanks a lot to DocLing for existing and hosting :wink:

*The feature spreadsheet I want to make for this project and for other converb projects in the future (converbs are life) is based on the one developed by the Typological Atlas of the Languages of Daghestan which you can find here: Feature datasets

*Thanks to @pathall for suggesting I start a thread (you don’t have to comment, just wanted to credit you properly).


Thanks for sharing this! I love the idea of taking notes on docling, I hope you keep it up!

We aim to serve. :slight_smile:

I can assure you that at least I and perhaps others will read everything you write with interest!

(I have some thoughts on what you’ve already written but I’m in high-gear-finish-the-dissertation mode so they’ll have to wait!)

Also, I got you this.

:grin:


Winnie the Pooh… viːni pɔx?

:joy:


Thanks for the encouragement, Pat! (And for the Abkhaz video :smiley:). I’m determined to keep it up at least until I send in my potential paper (hopefully longer – but smaller goals first). Whenever you finish your dissertation, you can give me notes :spiral_notepad::+1:

I’ll have to see what I think is going on with Pooh’s name as well! xD


Thoughts for January 4, 2022

Today I remembered that every single action in trying to put together a feature spreadsheet brings more questions. During the first year of my PhD (late 2021), developing the feature spreadsheet was something I always had in the back of my mind but hadn’t started yet because, I think, it was exciting but overwhelming. This year it is still a big undertaking, but I’m feeling more excited than overwhelmed. (I’m feeling a little bit behind in my goals, though, since I feel like I should have plunged right into it last year, but I’m not indulging the negativity.)

Anyway, today I experimented with putting a non-NWC language example (of a converb) into my spreadsheet. I also tried to add a few more “features” that I have on a Big List of Stuff to Consider When Analyzing Converbs. This brought a ton of practical and theoretical questions which I have on, basically, a digital piece of paper. (Surprisingly useful though).

Some thoughts for today:

  1. I want the feature spreadsheet to be useful for studying converbs cross-linguistically, but in terms of the scope of my project, it might be more practical to make as detailed a spreadsheet as possible for my own purposes (examining Abkhaz converbs from written material and, eventually, data collected during fieldwork) because there were a bunch of questions that came up trying to put an example outside this scope, like “What do with different transcription orthographies (including IPA but also idiosyncratic ones)?”, “What do with people’s glossing conventions?”, “What do when people use the same terms with slightly different meanings in different works?” Not that I won’t come across this for sources on Abkhaz, but like, maybe one area at a time is good xD

This ties into the overarching question:

  1. What is my base – the thing I’m looking at? What data do I want to collect and what do I want to learn from it? This is something I’m still working out but at least I know the question is there.

And then there’s all these practical things that have to do with the spreadsheet design:

  1. If I use more than one “feature” in the spreadsheet (i.e. One spreadsheet – all things), what naming conventions should I use with the values? So far, I’m doing “feature 1” then “value1_1” (so value 1 of feature 1), “value1_2” (value 2 of feature 1). I’ve done this because I’m assuming I want every column to have a unique name because I have a vague sense this is important for the code to be able to distinguish the second value of feature 1 from the second value of feature 4.

  2. If I am trying to break features down into values that can have a unique, discrete answer (like “yes” or “affix” when the two possibilities are “affix” versus “no affix” or something), how do I distinguish between feature values that are possible/impossible versus features that are mandatory/optional? For example, today I worked with the feature “coreference.” In some situations, Same Subject coreference might be optional, while in other situations it may be impossible, which looks similar to optional but is not the same.

  • It would be really neat if I could make a program go through a set of options like on those maps titled “Are You a Cat?” where it goes to the first question, which could be “You purposely knock objects off tables,” and then depending on whether the answer is “yes” or “no” it takes you to another question or set of questions. I think this would be a useful way for a program to go through something like “Is SS coreference possible? Yes. Is it mandatory? No.” Or something along those lines.
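
One way the question-by-question idea might look in Python, as a rough sketch rather than a finished design. The three status labels and the function name `classify` are my own assumptions. The point is just that the second question only gets asked when the first answer is “yes,” so “impossible” and “optional” end up as different values.

```python
from enum import Enum


class Status(Enum):
    IMPOSSIBLE = "impossible"
    OPTIONAL = "optional"
    MANDATORY = "mandatory"


def classify(possible: bool, mandatory: bool = False) -> Status:
    """Walk the two yes/no questions in order.

    Question 1: is the value possible at all?
    Question 2 (only reached if the answer to 1 is yes): is it mandatory?
    """
    if not possible:
        return Status.IMPOSSIBLE
    return Status.MANDATORY if mandatory else Status.OPTIONAL


# "Is SS coreference possible? Yes. Is it mandatory? No."
print(classify(possible=True, mandatory=False).value)  # optional
# "Is SS coreference possible? No."
print(classify(possible=False).value)                  # impossible
```

Storing the resulting status in a single column (instead of two separate yes/no columns) would keep the “impossible” rows from looking like “optional” ones when filtering.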

I’ll finish off this already long post with converb specific things:

  1. Certain features, like coreference, assume the converb has its own clause. However, morphological converbs are also found in complex predicates and don’t necessarily have their own clause. How do I deal with the different possible functions of converbs in the dataset?

  2. How am I going to define “converb”? Good question.


Thoughts for January 5, 2022

So, quick post today. Two of the things I was thinking about today include:

  1. Maybe I could use some kind of software to “look through” all the different resources I have to help me locate examples with converbs more quickly. A lot of the resources I have are PDFs (most are probably computer readable?). I could have the program search for particular glosses (or maybe even particular strings of characters – that are not English – but I don’t know how well that would work). I have a whole bunch of texts like grammars and readers and legit text collections to look through. And maybe there could be a way to have it search some of the databases that have Abkhaz texts online as well?

  2. The dataset I think I want to generate will be bunches of example sentences with converbs in them that are analyzed for a bunch of the features that are important for converb analysis (vague, I know xD).

  3. The features I want to look at come at several levels (sentence-level, word-level, morpheme-level). I’m not sure if this is important for how I structure my spreadsheet or not… but I keep bumping up against it.
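
A small sketch of the “search for particular glosses” idea, assuming the text has already been pulled out of the PDFs by some other tool and is sitting in plain strings, one per page. The file name, the placeholder lines, and the gloss label `CVB` are assumptions for illustration only; individual sources may well use their own abbreviations.

```python
import re


def find_lines(pages, pattern):
    """Yield (source, page_number, line) for every line matching `pattern`.

    `pages` maps a source name to a list of page texts, i.e. text that has
    already been extracted from the PDF by some other tool.
    """
    regex = re.compile(pattern)
    for source, page_texts in pages.items():
        for page_number, text in enumerate(page_texts, start=1):
            for line in text.splitlines():
                if regex.search(line):
                    yield source, page_number, line.strip()


# Placeholder text, not real Abkhaz data.
pages = {
    "grammar.pdf": [
        "Some prose on page one.\nSTEM-CVB STEM-PST\n",
        "More prose.\nSTEM-PRS\nSTEM-CVB.NEG STEM-FUT\n",
    ],
}

# \bCVB\b matches the gloss label as a whole token, not inside a longer label.
for hit in find_lines(pages, r"\bCVB\b"):
    print(hit)
```

The same function would work for strings of non-English characters by swapping in a different pattern, although a bare suffix search is likely to over-generate and need manual checking.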


Hey would you like to share a handful of the kinds of example sentences you’re working with? Let’s talk about the shape of the data — converbs are particularly interesting because they are multi-word constructions, which can be tricky to handle in a spreadsheet.


@mjcarroll’s talk touches on searching through a library of pdfs. It’s well worth a watch if you haven’t yet!


Thanks! These talks actually inspired a lot of my thoughts about data, so I can definitely agree they’re well worth watching :wink:

100% yes about being multi-word constructions. And the other thing that’s interesting is that there are morphological considerations in terms of form but then also aspects related to function in a clause/sentence both syntactically and semantically, plus a bunch of other things. xD

So! Right now I’m focusing more specifically on Abkhaz data (because I want to use the sheet for my fieldwork), but the larger goal is to make it applicable for converb data cross-linguistically. Just to frame where I’m at.

I thought it would be more illustrative to show you what the data looks like in my spreadsheet (not my feature spreadsheet, but the spreadsheet I’m transcribing data out of written texts into).

Column A: written Abkhaz
Column B: Russian translation (as given in monograph)
Column C: English translation (right now with the help of DeepL)
Column D: my eventual (superior) English translation (:P)

Column H: Where I record what the example is given for in the manuscript.*

*The first example is “use of the verb stem as a converb without temporal or other special affixes,” which is a bit more straightforward. But then you get reasons like this, “converbs formed with the -ны suffix and the stem of a static verb (and/or the pure stem of a static verb) to indicate additional state with a verbal predicate,” which, as you can see, cover several things at once.

Anyway, this is where my data is currently living as I’m transcribing it out of PDFs and into the Excel sheet. (A lot of examples still live in PDFs and computer-generated texts, which is why I’m thinking it might help to use a program to search examples out going forward :thinking:)

Ultimately, my idea atm is: put examples in a searchable spreadsheet with relevant information, analyze examples according to my own spreadsheet, and have them be in a database where I can use different filters to search for examples which show, like, “static stem converb with SS coreference” or whatever. xD
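
Since the features sit at different levels (sentence, word, morpheme), one possible shape for that searchable store is a nested record instead of a single flat row. This is only a sketch under my own assumptions: the class names, the feature keys (`role`, `dynamicity`, `coreference`), and the placeholder entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Morpheme:
    form: str
    gloss: str


@dataclass
class Word:
    form: str
    morphemes: list = field(default_factory=list)  # morpheme level
    features: dict = field(default_factory=dict)   # word level


@dataclass
class Example:
    text: str
    words: list = field(default_factory=list)
    features: dict = field(default_factory=dict)   # sentence level


def search(examples, word=None, sentence=None):
    """Return examples whose sentence-level features match `sentence`
    and that contain at least one word matching all of `word`."""
    word, sentence = word or {}, sentence or {}
    hits = []
    for ex in examples:
        if any(ex.features.get(k) != v for k, v in sentence.items()):
            continue
        if any(all(w.features.get(k) == v for k, v in word.items())
               for w in ex.words):
            hits.append(ex)
    return hits


# Placeholder entries, not real Abkhaz data.
corpus = [
    Example("SENTENCE-1",
            words=[Word("WORD-A", [Morpheme("STEM", "stem")],
                        {"role": "converb", "dynamicity": "static"})],
            features={"coreference": "SS"}),
    Example("SENTENCE-2",
            words=[Word("WORD-B",
                        [Morpheme("STEM", "stem"), Morpheme("SUFFIX", "CVB")],
                        {"role": "converb", "dynamicity": "dynamic"})],
            features={"coreference": "SS"}),
]

# "Static stem converb with SS coreference"
hits = search(corpus,
              word={"role": "converb", "dynamicity": "static"},
              sentence={"coreference": "SS"})
print([ex.text for ex in hits])  # ['SENTENCE-1']
```

Keeping word-level and sentence-level features in separate dictionaries means a filter can say which level it is asking about, which a single flat row makes harder once a sentence contains more than one converb.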


Thoughts for January 10, 2022:

Today I was thinking about some potential paper topics and skimmed through at least one sketch grammar of Abkhaz (potentially more after this post) and took note (not for the first time – but in a different light) that “static vs. dynamic” verb stems and “transitive vs. intransitive” verb stems are aspects that come up a lot at least for linguists talking about converbs in Abkhaz (and probably other languages).

I also noted that they do use the term “converb” but they don’t necessarily define what they mean (not unusual). Some of the characteristics they give to describe them are morphological in nature but occasionally also functional (syntactic and semantic). There is at least one linguist that advocates for taking a functional approach to identifying converbs rather than a morphological one.
