Blog

Thoughts on the latest in AI

This is where breakthrough ideas emerge and your inner innovator is awakened. Get inspired by the best of ML6's insights and the minds shaping the future of AI.



Selected:
  • The power of custom voices in AI agents

    AI is speaking. The question is: does it sound like you? TL;DR: Custom AI voice models help businesses create consistent, brand-aligned voice experiences across virtual assistants, chatbots, and customer support. This guide explores how voice branding drives trust, recognition, and differentiation, and how to implement and optimize synthetic voices ethically using voice cloning, TTS, and neural speech technologies. Voice is your brand, so own it.

  • Fine-tuning Whisper for Dutch Language: The Crucial Role of Size

    OpenAI claims that Whisper achieves human-level accuracy and robustness in English Automatic Speech Recognition (ASR), but its potential can be further amplified through fine-tuning. This blog post investigates how far fine-tuning Whisper specifically for Dutch can improve performance. We explore the impact of fine-tuning different sizes of Whisper models on varying amounts of audio data, namely 1 hour, 10 hours, and 50 hours.

  • Who spoke when: Choosing the right speaker diarization tool

    This blog post is derived from its interactive version on Hugging Face Spaces. You can continue reading there if you want to play around with multiple examples or test some diarization tools on your own audio samples.

  • How to label your way to accurate Automatic Speech Recognition (ASR)

    This blog post explores the process of labelling speech data for Automatic Speech Recognition (ASR). ASR is the process of transcribing spoken language into text, and training the models requires large amounts of unlabelled or weakly labelled speech data. However, pre-training ASR models on this type of data can lead to errors and biases, so labelling speech data is crucial for accurate and robust performance. The post covers the different types of speech data, labelling methods, quality control techniques, and annotation formats used to train ASR models. It also includes information on data formats, speaker diarization, file duration, data augmentation, and the pros and cons of different labelling methods. Understanding the type of speech data used and developing an appropriate labelling methodology are essential for building an accurate and robust ASR model.
