Hello,
I am new to this forum and am looking for recommendations.
In 2010, I completed six years of fieldwork on Shangan Makhuwa, a dialect of Makhuwa spoken on the northeast coast of Mozambique. Since then, I have experienced many personal obstacles and interruptions, but I am now free to return to my work.
In addition to producing a dictionary and grammar of the Shangan dialect, I have also produced over 380 hours of transcribed data with the aid of six field assistants. I believe the best way to analyze all of this will be through corpus and computational linguistics. However, I have never worked in these fields before, so I thought I would ask whether anyone here could recommend a program or programs that I could set up to process data from a little-studied language. The methodology will evolve as I go, looking into such things as collocations, topic modeling, and semantic fields, so I would need something fairly open and flexible. I will also continue adding to my dictionary and grammar as I work through the transcripts. All of the transcripts were derived from high-quality WAV files, so I could also go back and reanalyze the recordings for, say, greater phonetic detail or time alignment, if necessary.
So far, I have come across this book:
“Natural Language Processing and Computational Linguistics: A Practical Guide to Text Analysis with Python, Gensim, spaCy, and Keras,” by Bhargav Srinivasa-Desikan.
I have also come across the program AntConc.
If anyone has any suggestions along these lines, or can recommend someone with experience in using these kinds of programs, that would be greatly appreciated! I would like to be confident of my options before committing much more time to them.
Thank you for your time and attention!
Erik