Model performance has increased dramatically over the last few years due to an abundance of machine learning research. While these improved models open up new possibilities, they only start providing real value once they are deployed in production applications. Getting models into production is one of the main challenges the machine learning community faces today.
Deploying machine learning applications is generally more complex than deploying conventional software applications, because it introduces an extra dimension of change. While typical software applications can change in their code and data, machine learning applications also need to handle model updates. These updates can be frequent, as models need to be regularly retrained on the most recent data.
This article describes a general deployment pattern for one of the more complex kinds of machine learning systems to deploy: those built around embedding-based models. To understand why these systems are particularly hard to deploy, we’ll first take a look at how embedding-based models work.