How to label your way to accurate Automatic Speech Recognition (ASR)
Introduction
This blog post explores the process of labelling speech data for Automatic Speech Recognition (ASR). ASR is the task of transcribing spoken language into text. ASR models are often pre-trained on large amounts of unlabelled or weakly labelled speech data, but relying on this type of data alone can introduce errors and biases, so carefully labelled speech data is crucial for accurate and robust performance. The post covers the different types of speech data, labelling methods, quality control techniques, and the annotation formats used to train ASR models. It also discusses data formats, speaker diarization, file duration, data augmentation, and the pros and cons of different labelling methods. Understanding the type of speech data you are working with and developing an appropriate labelling methodology are essential for building an accurate ASR model.