Dynamic E-language learning

Project Description

With globalisation increasing and more people moving abroad, learning a new language in an affordable and fun way is becoming more and more important. Parlangi is an e-learning provider that connects native speakers with learners of a specific language. Both are invited onto a video call platform to talk, so that the learner can improve their language skills.

As of now, Parlangi provides a new conversation topic every 10 minutes to engage the speakers. Although this approach helps keep the conversation going, it has a few potential drawbacks, as it can be either:

  1. Intrusive: The flow of conversation can be disrupted if both speakers are engaged in the current topic and a topic switch pops up.
  2. Infrequent: It might be preferable to suggest a topic switch early on if the topic is not engaging either speaker (the conversation is dominated by silence) or is engaging only the native speaker (which defeats the purpose of the video call: improving the non-native speaker's speaking skills).

Goal

The goal of this project is to enhance the conversation topic suggestion feature and make it more dynamic. This can help provide a more fun and satisfying experience for the users of the platform. 

To do so, both the frequency and duration of silences in the conversation and the speech frequency of the individual speakers need to be determined. This information can be used to quantify the speakers' overall level of engagement and to suggest a topic switch at an appropriate time.
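
As a rough illustration of how the quantities above could be turned into numbers, the sketch below computes a silence ratio, each speaker's share of the talking time, and the number of speech turns per speaker within a time window. It assumes that non-overlapping speech segments of the form (speaker, start, end) in seconds are already available. The function name, the labels "native" and "learner", and the choice of metrics are illustrative assumptions, not part of the Parlangi platform.

```python
from collections import defaultdict


def engagement_metrics(segments, window_start, window_end):
    """Summarise engagement inside [window_start, window_end] (seconds).

    `segments` is a list of (speaker, start, end) tuples, assumed not to
    overlap each other. All names here are hypothetical.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must have a positive length")

    talk_time = defaultdict(float)  # seconds spoken per speaker
    turns = defaultdict(int)        # number of speech segments per speaker
    for speaker, start, end in segments:
        # Only count the part of the segment that lies inside the window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            talk_time[speaker] += overlap
            turns[speaker] += 1

    spoken = sum(talk_time.values())
    # Fraction of the window in which nobody was speaking.
    silence_ratio = max(0.0, 1.0 - spoken / window)
    # Each speaker's share of the total speaking time.
    talk_share = {s: t / spoken for s, t in talk_time.items()} if spoken else {}
    return {
        "silence_ratio": silence_ratio,
        "talk_share": talk_share,
        "turns": dict(turns),
    }


metrics = engagement_metrics(
    [("native", 0.0, 30.0), ("learner", 30.0, 40.0), ("native", 55.0, 70.0)],
    window_start=0.0,
    window_end=60.0,
)
```

A low talk share for the learner or a high silence ratio would both hint that the current topic is not working, which are the two cases described under "Infrequent" above.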

One way to approach this problem is to apply speaker diarization techniques to the raw audio recording. Speaker diarization aims to answer the question “who spoke when”. With that, it is feasible to detect both the ‘speech’ moments of the individual speakers and the ‘silence’ segments, as illustrated below.
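
To make the link between “who spoke when” and silence concrete, here is a minimal sketch that derives silence segments from diarization output. It assumes the diarization step returns a list of (speaker, start, end) tuples in seconds, which is a simplification of what real libraries produce. The function name and the `min_gap` threshold are hypothetical.

```python
def silence_segments(speech, total_duration, min_gap=0.5):
    """Return the (start, end) gaps in which no speaker is active.

    `speech` is a list of (speaker, start, end) tuples; segments may
    overlap. Gaps shorter than `min_gap` seconds are ignored, since short
    pauses are a normal part of speech (assumed threshold).
    """
    gaps = []
    cursor = 0.0  # end of the speech seen so far
    for _, start, end in sorted(speech, key=lambda seg: seg[1]):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        # Overlapping speech must not move the cursor backwards.
        cursor = max(cursor, end)
    if total_duration - cursor >= min_gap:
        gaps.append((cursor, total_duration))
    return gaps


speech = [("A", 0.0, 4.0), ("B", 6.0, 9.0), ("A", 8.5, 12.0)]
gaps = silence_segments(speech, total_duration=15.0)
```

The number of gaps gives the frequency of silence, and their lengths give its duration.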

This internship is not only a great way to leverage your skills in audio and edge computing but also a chance to do good. You can round off your internship with a blog post in which you share your learnings and how you helped Parlangi and its users by improving their experience of learning a new language.

Methodology

You can get a head start on this project, as some work has already been done. Many libraries already implement pipelines for diarizing audio recordings, and ML6 has done some initial exploration of them. However, there is still much work to be done to put this tool into practice.

  • An appropriate diarization framework has to be chosen for this task. Different trade-offs in accuracy/detection speed/resources need to be considered.
  • An extension tool has to be implemented to access the audio stream from the open source video platform.
  • The results of the diarization algorithm will be used in a control loop algorithm that proposes conversational topics dynamically.
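
One possible shape for the decision step of such a control loop is sketched below. It assumes that a silence ratio and the learner's share of talking time are computed for each recent window of the call. The function name and both thresholds are hypothetical placeholders that would need to be tuned on real conversations.

```python
def decide(silence_ratio, learner_share, max_silence=0.5, min_learner_share=0.3):
    """Decide whether to propose a new topic for the current window.

    silence_ratio: fraction of the window without speech (0.0 to 1.0).
    learner_share: learner's fraction of the total speaking time.
    The thresholds are illustrative assumptions, not tuned values.
    """
    if silence_ratio > max_silence:
        # The topic engages neither speaker.
        return "suggest_topic"
    if learner_share < min_learner_share:
        # Only the native speaker is talking.
        return "suggest_topic"
    # Both speakers are engaged, so do not interrupt the conversation.
    return "keep_topic"


# Evaluate a few example windows as (silence_ratio, learner_share).
windows = [(0.7, 0.5), (0.1, 0.1), (0.1, 0.45)]
decisions = [decide(silence, share) for silence, share in windows]
```

Keeping the decision as a pure function of the window metrics makes it easy to test the thresholds offline on recorded calls before wiring it into the platform.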

During this internship you will:

  • Explore multiple state-of-the-art diarization algorithms.
  • Implement an extension tool to access the raw audio data from the video platform. 
  • Develop an end-to-end solution for dynamic topic suggestion and integrate it with the Parlangi platform.
  • Write a blog post summarising your work.
  • Do some good!

Profile / Required skills

  • Strong analytical abilities, knowledge of different statistical methods and a familiarity with research studies.
  • Working experience in Java development to build a tool that interfaces with the video platform.
  • Strong interest in Speech/Audio processing [preferred].
  • Familiarity with tools like Python.
  • Excellent verbal and written communication in English.
  • You are currently pursuing a degree in computer science or related field.

Internship Duration

The duration of the internship is flexible and depends on the candidate's preference and the project requirements. The estimated duration for this specific project is 6-8 weeks:

  • Week 1: Getting familiar with SoTA diarization algorithms and the open source video platform.
  • Weeks 2-3: Build a tool that interfaces with the video platform to get an audio stream.
  • Weeks 4-5: Integrate the diarization algorithm with the audio stream and build the control flow logic for the dynamic topic suggestions.
  • Week 6: Validate the results of the algorithm and write a blog post.

Chapters

Our internships and theses are linked to our chapters. A chapter is a cross-squad team of experts in a specific topic that enables knowledge building and sharing across projects. The chapters build knowledge by performing applied research and gathering learnings from projects. This internship falls under the Speech/Audio working group, which is part of the Natural Language Processing (NLP) chapter.

Supervisors

Thomas Dehaene: Chapter Lead

Lisa Becker: Machine Learning Engineer and Speech Working Group Lead (daily supervisor)

References

  • ML6 diarization algorithm exploration [link]
  • Diarization libraries [link]