This blog post is part of a series:
This post is a follow-up to the first part, where we saw how Kubeflow’s TF-jobs can be used to run several training jobs on Kubernetes to train different machine learning models. In this follow-up we take the same use case but run a TF-job on a GPU to reduce training time.
The Kubeflow deployment automatically creates a GPU pool on the Kubernetes cluster that scales based on demand, so you only pay for what you use. Another nice thing is that Kubeflow handles the NVIDIA driver installation for us, so we only need to worry about our machine learning model.
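To make this concrete, here is a minimal sketch of what a GPU-backed TFJob manifest could look like. The job name and container image below are hypothetical placeholders; the key part is the `nvidia.com/gpu` resource limit, which tells Kubernetes to schedule the worker pod onto a node in the GPU pool:

```yaml
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train-gpu   # hypothetical job name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 1
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/my-project/my-model:latest   # hypothetical image
              resources:
                limits:
                  nvidia.com/gpu: 1   # request one GPU for this worker
```

Because the GPU request is just a resource limit on the container, the cluster autoscaler can spin up a GPU node when the job is submitted and scale it back down once training finishes.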
Please note that completing the steps in the first blog post is not a prerequisite for following the steps outlined below. The code used for this blog post is available in the Kubeflow/examples repository.