Kubeflow is an open-source project dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Kubeflow lets you investigate, develop, train and deploy machine learning models on a single scalable platform. The underlying resources are abstracted away, so the same deployments will work on your laptop, on on-premises hardware, and on your cloud cluster.
This blog post is the first in a series of blog posts.
In this first blog post, we will work through the exploration, training and serving of a machine learning model by leveraging Kubeflow’s main components. The image below illustrates how Kubeflow’s components cover the end-to-end lifecycle of an ML product. Note that Kubeflow has many other components in the ecosystem of an ML product (e.g. hyperparameter search with Katib), but in this blog post we will focus on the core parts.
As an example, we will use the Machine Learning with Financial Time Series Data use case. The premise of this use case is straightforward: financial markets are increasingly global, and if you follow the sun from Asia to Europe to the US and so on, you can use information from an earlier time zone to your advantage in a later time zone. The objective is to predict whether the S&P 500 index will close positive or negative, based on information from other stock markets. In reality, the situation is more complex because there are commissions and taxes to account for. But as a first approximation, we’ll assume an index closing up indicates a gain, and vice versa.
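To make the prediction target concrete, here is a minimal sketch of how the "close positive or negative" label could be derived from a series of closing values. The function names and the sample numbers are made up for illustration and are not taken from the example repository; the sketch also assumes that a flat day counts as negative.

```python
from typing import List


def daily_returns(closes: List[float]) -> List[float]:
    """Fractional change from each closing value to the next one."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def direction_labels(closes: List[float]) -> List[int]:
    """Binary target: 1 if the index closed above the previous close, else 0.

    Assumption: a day with no change is labeled 0 (not positive).
    """
    return [1 if r > 0 else 0 for r in daily_returns(closes)]


# Made-up closing values, for illustration only (not real market data).
example_closes = [100.0, 101.5, 101.0, 101.0, 103.2]
example_labels = direction_labels(example_closes)
print(example_labels)
```

Each label describes one trading day after the first. In the use case described above, the input features for a given day would come from markets that closed earlier on that day, which this sketch does not cover.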
The code used in this blog post is available in the Kubeflow examples repository. You can use a Google Cloud Shell to follow the steps outlined below, but make sure to install (1) ksonnet via these instructions and (2) uuid-runtime via the command sudo apt-get install uuid-runtime.
Alternatively, you can work from your local environment, but in that case you will have to install the requirements listed in the prerequisites section of the repository. Regardless of which machine you use, you will need access to a Google Cloud Project and its GKE resources.