The Work Before the Work

Sam Berg, project, research

Before I start coding, there are a couple of things I need to iron out.

As a person who can get side-tracked, I need to define the scope of this project. What I intend to make is a program that takes in live audio and outputs text. This text can go to the console or to a text file, but I'm not interested in making a GUI, at least not at this point.

Additionally, it's worth noting that with machine learning applications, you define models and train them with data. Then you use the trained model to create content (e.g. text-to-speech) or - in my case - interpret content. I am trying to get this app up and running ASAP, so I'll use pre-trained models where possible. In my experience, finding the big datasets required to train models is a pain, and it's an exercise I don't enjoy.

Next, I need to answer some of the high-level questions about making a speech-to-text (hereafter STT) application:

Open Source Speech to Text Applications and Libraries

Thoughts

I'm gonna start with Mozilla DeepSpeech. DeepSpeech has great documentation and a Python API, and I know Python a hell of a lot better than C++. There are plenty of other differences between the applications I listed, and plenty of other applications that could be on that list. I just want something that's easy to use, and DeepSpeech seems to be the easiest.
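To make the scope concrete, here is a minimal sketch of the audio-in, text-out loop using DeepSpeech's Python package. It assumes the 0.7+ streaming API (`Model`, `createStream`, `feedAudioContent`, `finishStream`), a pre-trained `.pbmm` model file downloaded separately, and a 16 kHz, 16-bit mono WAV file standing in for the microphone. The helper names (`transcribe_chunks`, `wav_chunks`, `main`) are my own placeholders, not part of DeepSpeech.

```python
import argparse
import sys
import wave
from typing import Iterable, Iterator, TextIO


def transcribe_chunks(model, chunks: Iterable, out: TextIO = sys.stdout) -> str:
    """Feed audio chunks to a DeepSpeech stream and write the transcript to `out`."""
    stream = model.createStream()
    for chunk in chunks:
        # Each chunk should be a NumPy int16 array of mono samples.
        stream.feedAudioContent(chunk)
    text = stream.finishStream()
    out.write(text + "\n")
    return text


def wav_chunks(path: str, frames_per_chunk: int = 4096) -> Iterator:
    """Yield int16 sample arrays from a WAV file (a stand-in for live audio)."""
    import numpy as np  # imported lazily; only needed when real audio is read

    with wave.open(path, "rb") as wav:
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield np.frombuffer(data, dtype=np.int16)


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Transcribe a WAV file (sketch).")
    parser.add_argument("model", help="path to a pre-trained DeepSpeech .pbmm file")
    parser.add_argument("wav", help="16 kHz, 16-bit mono WAV file")
    parser.add_argument("--output", help="text file to write; default is the console")
    args = parser.parse_args(argv)

    from deepspeech import Model  # pip install deepspeech

    model = Model(args.model)
    if args.output:
        with open(args.output, "w", encoding="utf-8") as out:
            transcribe_chunks(model, wav_chunks(args.wav), out)
    else:
        transcribe_chunks(model, wav_chunks(args.wav))
```

Calling `main(["model.pbmm", "clip.wav", "--output", "transcript.txt"])` (both file names are hypothetical) would write the transcript to a text file, and leaving off `--output` prints it to the console. Swapping `wav_chunks` for a generator that reads from a microphone should leave `transcribe_chunks` unchanged.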

Post Research Discoveries

Finding datasets for speech recognition isn't as hard as I'd expected. The open source tools I found link to plenty, including:

Terms

During research, I found some terms I didn't understand.

© Sam Berg.