Vlaamse Radio- en Televisieomroeporganisatie

Automating the creation of VRT news summaries by using Natural Language Processing

Impact

The AI-driven model is a powerful tool for assisting journalists in their writing: it speeds up summary creation and suggests alternative wordings. In that way, human creativity and AI can complement each other’s strengths and produce high-quality results.

Intro to the customer

The VRT is the national public-service broadcaster for the Flemish Community of Belgium. With its three television channels, five radio stations, and various digital channels, the VRT reaches up to 90% of all Flemish people every week. VRT NWS is the news service of the VRT and aims to keep Flemish people informed about national and international news through its various channels (such as the website, the app and live TV broadcasts). The VRT also has an innovation department, which continuously explores new technologies and applications for media purposes in close collaboration with its end users.

Challenge

News articles on the VRT NWS website generally consist of a short summary followed by the full article. The summary acts as a condensed version of the article and captures the main points of the story. Because writing summaries is a repetitive and time-consuming process, the VRT innovation department explored the possibility of using Natural Language Processing to automate it. Two approaches are available: extractive and abstractive summarization. Extractive summarization identifies the most important parts of the article and returns a set of sentences taken from the original text, while abstractive summarization produces new text based on an interpretation of the article. The VRT chose the abstractive method, which is the more state-of-the-art of the two and offers more promising results. The innovation department therefore set out to train models that generate news summaries automatically. Although they had already fine-tuned their models, they were not satisfied with the quality of the output. ML6 therefore provided in-depth technical advice on training and deploying the models to achieve the best possible performance.
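To make the difference between the two approaches concrete, here is a minimal sketch of the extractive idea only: score each sentence by how frequent its words are in the article, then return the top-scoring sentences unchanged and in their original order. This is an illustration under simple assumptions, not the approach the VRT used. The stop-word list, the sentence splitter and the function names are hypothetical, and the sample article is invented.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for", "was", "will"}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOP_WORDS]


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Return the highest-scoring sentences, verbatim and in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore reading order
    return " ".join(sentences[i] for i in chosen)


# Invented sample text, not a VRT article.
article = (
    "The council approved a new tram line. "
    "The tram line will connect the station to the harbour. "
    "Local shops near the harbour expect more visitors. "
    "The weather was sunny on the day of the vote."
)
summary = extractive_summary(article, max_sentences=2)
print(summary)
```

Every sentence in the output is copied from the input, which is the defining property of the extractive approach. An abstractive model, by contrast, generates new sentences and can therefore rephrase or merge information, which a scoring function like this one cannot do.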

"Working with ML6 is investing in our own people. We believe it’s important that we have internal knowledge, and through our collaboration we received a knowledge transfer in a very efficient way to bring our people to a higher level. We buy knowledge, we buy flexibility, we make an investment in our people towards the future."

Combine Automation Lead Engineer, CNHi

Solution

To improve both the training of the models and the quality of the final results, ML6 proposed a sequential way of working: start from a pretrained multilingual BART model (see paper), fine-tune it first on English news summaries translated into Dutch, and finally train it on VRT data. Complementing the client data with processed open-source data in this way leads to higher-quality generated summaries.
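The sequential set-up can be sketched as a small pipeline in which every stage starts from the checkpoint produced by the previous one. The sketch below shows only that ordering and performs no real training: the stage names, dataset identifiers, checkpoint labels and the `placeholder_fine_tune` function are all hypothetical stand-ins, not the code or data used in the project.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    name: str      # short label for the stage (hypothetical)
    dataset: str   # identifier of the training data (hypothetical)


def run_stages(
    checkpoint: str,
    stages: List[Stage],
    fine_tune: Callable[[str, Stage], str],
) -> List[str]:
    """Apply the stages in order; each one continues from the previous checkpoint."""
    history = [checkpoint]
    for stage in stages:
        checkpoint = fine_tune(checkpoint, stage)
        history.append(checkpoint)
    return history


def placeholder_fine_tune(checkpoint: str, stage: Stage) -> str:
    # Stand-in only: a real implementation would load `checkpoint`,
    # train on `stage.dataset`, and save the new weights.
    return f"{checkpoint}+{stage.name}"


STAGES = [
    # Stage 1: open-source English news summaries, translated into Dutch.
    Stage(name="translated-news", dataset="english_news_summaries_translated_to_dutch"),
    # Stage 2: the client's own articles and summaries.
    Stage(name="vrt", dataset="vrt_articles_and_summaries"),
]

history = run_stages("pretrained-multilingual-bart", STAGES, placeholder_fine_tune)
for step in history:
    print(step)
```

Keeping the stages in an ordered list makes the dependency explicit: the VRT-specific stage never starts from the raw pretrained model, only from the model that has already seen the translated news data.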

Results

This type of Transformer model is a powerful tool for assisting journalists in their writing. It can speed up summary creation and suggest alternative wordings. At the same time, it cannot replace the insight and creativity of journalists, which are human skills built on far broader experience than the (limited) training data presented to the NLP model. With this in mind, humans and AI can complement each other’s strengths and produce high-quality results. As a final test of this idea, and to evaluate the output quality of the tool, a VRT journalist tested some of the news summaries generated from existing VRT articles, which gave the VRT innovation department new insights and feedback.