The AI-driven model is a powerful tool for assisting journalists in their writing: it speeds up summary creation and suggests alternative wordings. In this way, human creativity and AI complement each other's strengths and produce high-quality results.
The VRT is the national public-service broadcaster for the Flemish Community of Belgium. With its three television channels, five radio stations, and various digital channels, the VRT reaches up to 90% of all Flemish people every week. VRT NWS is the news service of the VRT and aims to keep Flemish people informed about national and international news through its diverse channels (such as the website, the app and live TV broadcasts). The VRT also has an innovation department, which continuously explores new technologies and applications for media purposes in close collaboration with its end users.
News articles on the VRT NWS website generally consist of a short summary followed by the full article. The summary acts as a condensed version of the article and captures the main points of the story. Because writing summaries is a repetitive and time-consuming process, the VRT innovation department explored the possibility of using Natural Language Processing (NLP) to automate this activity. Two approaches are available: extractive and abstractive summarization. Extractive summarization identifies the most important parts of the article and returns a set of sentences taken verbatim from the original text, while abstractive summarization generates new text based on an interpretation of the article. The VRT chose the abstractive method, which is closer to the state of the art and gives more promising results. The innovation department therefore set out to train models that create news summaries automatically. Although the team had already fine-tuned its models, it was not satisfied with the quality of the output. ML6 therefore provided in-depth technical advice on training and deploying the models to achieve the highest possible performance.
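To make the extractive approach concrete, here is a minimal sketch in Python. It is not the method used by the VRT or ML6 (they chose abstractive summarization); it only illustrates the idea of selecting existing sentences. The function name `extractive_summary` and the word-frequency scoring are assumptions made for this illustration.

```python
import re
from collections import Counter


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Return the highest-scoring sentences of `text`, in their original order.

    Hypothetical illustration: each sentence is scored by the average
    frequency of its words across the whole text.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole article (lowercased).
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best ones, restore reading order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


article = (
    "The council approved the budget. "
    "The budget funds new schools. "
    "The weather was mild."
)
summary = extractive_summary(article, max_sentences=2)
```

Every sentence in `summary` is copied unchanged from `article`. An abstractive model would instead generate new wording, which is why it needs a trained sequence-to-sequence model rather than a scoring rule like this one.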
To improve the training of the models and the quality of the final results, ML6 proposed a sequential way of working: start from a pretrained multilingual BART model (see paper), run a first fine-tuning phase on English news summaries translated into Dutch, and finally fine-tune on VRT data. Complementing the client data with processed open-source data in this way leads to higher-quality generated summaries.
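The sequential approach described above can be sketched as a small pipeline in which each stage starts from the checkpoint produced by the previous one. This is only a structural sketch under stated assumptions: the checkpoint and dataset names are hypothetical placeholders, and `fake_train` stands in for a real fine-tuning routine, which is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    name: str     # label for the fine-tuning phase
    dataset: str  # hypothetical dataset identifier


def run_sequential_finetuning(
    base_checkpoint: str,
    stages: List[Stage],
    train_fn: Callable[[str, Stage], str],
) -> str:
    """Fine-tune stage by stage; each stage resumes from the previous checkpoint."""
    checkpoint = base_checkpoint
    for stage in stages:
        checkpoint = train_fn(checkpoint, stage)
    return checkpoint


# Order matters: broad translated data first, client-specific data last.
STAGES = [
    Stage("translated_english_news", "en_news_summaries_translated_to_nl"),
    Stage("vrt_articles", "vrt_nws_articles_and_summaries"),
]


def fake_train(checkpoint: str, stage: Stage) -> str:
    # Placeholder for a real training run: only records which stages were applied.
    return f"{checkpoint}+{stage.name}"


final_checkpoint = run_sequential_finetuning(
    "pretrained-multilingual-bart", STAGES, fake_train
)
```

Placing the VRT data last means the final weight updates come from the target domain, while the earlier stage supplies a larger volume of Dutch news summarization examples.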
ML6 also provided tailored insights and recommendations on the use of NLP for text summarization at the VRT and identified future use cases in a media context, such as podcast summarization or event detection. In addition, ML6 helped with the creation and management of the VRT's datasets and with possible areas for extension. ML6 also shared MLOps practices for model version management and data management with the VRT innovation department. Finally, ML6 advised on the use of Google Cloud for training such models.