Model performance has increased dramatically over the last few years due to an abundance of machine learning research. While these improved models open up new possibilities, they only start providing real value once they can be deployed in production applications. This is one of the main challenges the machine learning community is facing today.
Deploying machine learning applications is generally more complex than deploying conventional software applications, because an extra dimension of change is introduced. While conventional software applications change only in their code and data, machine learning applications must also handle model updates. The rate of model updates can be quite high, as models need to be regularly retrained on the most recent data.
This blog post is a follow-up to the article about a General Pattern for Deploying Embedding-Based Machine Learning Models. Embedding-based models are hard to deploy because every model update requires recalculating all the embeddings, all while ongoing traffic continues uninterrupted and is shifted smoothly over to the new model. In this article, we introduce a set of tools and frameworks — Kubernetes, Istio and Kubeflow Pipelines — that allow you to implement this general pattern. It should be noted that this is just one way of doing it. There are plenty of viable practical implementations; you just need to figure out what works best for your team and application.
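To make the traffic-shifting idea concrete, here is a minimal sketch of how Istio can split requests between an old and a new model version. All names (`embedding-model`, subsets `v1`/`v2`) are hypothetical placeholders, and the subsets are assumed to be defined in an accompanying Istio `DestinationRule`; your actual setup will differ.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: embedding-model        # hypothetical service name
spec:
  hosts:
    - embedding-model
  http:
    - route:
        - destination:
            host: embedding-model
            subset: v1         # current model version
          weight: 90
        - destination:
            host: embedding-model
            subset: v2         # newly trained model
          weight: 10           # gradually raised toward 100 as confidence grows
```

By adjusting the weights over successive rollout steps, traffic moves to the new model without ever being interrupted, which is exactly the property the general pattern requires.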