Where are we at with speech to text these days?

I got a query from someone I work with in Australia about audio searching. He has a large collection of songs recorded with family members and would like to be able to search it for particular words. As far as I know, there are no transcripts. I remember seeing some papers from early in the pandemic about transcription without pretraining, but I haven’t seen anything recently, and it wasn’t clear what was needed. (Presumably songs are going to be extra tricky.)


From what I understand, this is the state of the art in speech-to-text for low-resource languages: Elpis (see the Elpis ASR documentation, version 1.0.6).

It still requires transcripts, though (in the form of ELAN files).
