STT with little data

NLP - ML - Audio - Speech to text - Transcription

Background

Customer data is at the heart of almost every project. But whereas text and image data are typically easy to collect and label, audio data is often scarcer and trickier to work with.

This usually leads to a few rounds of “back and forth” at the start of a project, with questions such as:

  • How much audio data do you need?
  • How much of that should be labelled?
  • How should we label the audio data?
  • Etc.

In this internship, we aim to answer those questions as specifically as possible. To do so, you will focus on creating a speech transcription engine that can identify and accurately transcribe your colleagues in various contexts. Along the way, you’ll identify the key relationships between model accuracy and the quantity, quality and type of data. You’ll then package all of this into a demo, which can take various forms, for your fellow agents and the world to use!

This, dear ML6 Intern agent, is your mission.

Goal

The goals of this internship are as follows:

  • Identify the relationship between the quantity of data and the accuracy of transcriptions on new data.
  • Identify the data qualities (if any) that improve transcription accuracy the most.
  • Quantify the utility of hotwords in domain adaptation of speech transcription models.
  • Identify how the above relationships differ between English and a lower-resource language (in this case, Dutch).
  • Use all these learnings to create a small demo (web app, smart assistant, Chrome extension …) for ML6 and the world to use!
  • Give a kick-ass presentation to your fellow agents and look cool while doing it!
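
Several of the goals above depend on a concrete measure of transcription accuracy. This brief does not name a metric, so the following is only a minimal sketch under the assumption that word error rate (WER) is used: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the simple lowercase-and-split normalisation are illustrative choices, not part of the project definition.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalised only by lowercasing and whitespace
    splitting; a real evaluation would likely also handle punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Computing this score on a held-out set of recordings, once per training-set size, would give one point per size on a data-quantity versus accuracy curve. Lower is better, and the value can exceed 1.0 when the hypothesis contains many extra words.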