August 31, 2021

Vertex Pipelines — Vertex AI vs AI Platform

Contributors
Liam Campbell
Data Engineer

Recently, Google unveiled its latest ML tooling offering on Google Cloud Platform: Vertex AI. In brief, the new platform seeks to combine the tools previously offered by separate GCP services, such as AI Platform and AutoML, into a single service. Integrating these previously separate services benefits users by letting them interact with all of these tools through a single set of APIs, SDKs, and clients.

For a more detailed overview of the aims and features of Vertex AI as a whole, check out this previous ML6 blog post. In this blog post, we will focus primarily on Vertex AI’s answer to AI Platform Pipelines: Vertex Pipelines. ML Pipeline technologies such as Vertex Pipelines are important to any MLOps Team as tools for orchestrating the many moving parts of complex training and prediction jobs at scale. They are a key piece of the infrastructure that brings Machine Learning capabilities into a production setting.

AI Platform Pipelines

AI Platform, Google’s former Machine Learning platform, offered AI Platform Pipelines: a service aimed at making it easy to deploy Kubeflow Pipelines, the MLOps pipeline toolkit from Kubeflow, onto Google Cloud Platform resources. The workflow for deploying a Kubeflow Pipeline with AI Platform looked something like the following:

Steps to deploying Kubeflow on AI Platform

1. Set Up Kubernetes Cluster with Google Kubernetes Engine

The first step in deploying Pipelines to AI Platform was setting up a cluster on which to host our Kubeflow Pipelines client. Although here at ML6 we would always caution our clients to automate their infrastructure with tools like Terraform, provisioning a cluster with GKE was made very easy via the neat user interfaces of the GKE console.

2. Deploy Kubeflow Client to GKE Cluster

With our cluster up and running, we could easily deploy Kubeflow Pipelines instances to it using the AI Platform Pipelines UI in the GCP console. Creating a new deployment was as simple as selecting your GKE cluster from a drop-down list and filling out a few pieces of configuration in a simple form. The default behavior was to use nodes in the Kubernetes cluster to host MySQL and MinIO services, Kubeflow’s defaults for Artifact and Metadata storage, but by providing connection details during setup, Cloud SQL and Cloud Storage could be used as more scalable and reliable alternatives.

3. Develop Pipelines with Notebooks

With the cluster set up and the Kubeflow instance created, we could use AI Platform Notebooks as secure development environments for working with the Kubeflow Pipelines SDK to develop our pipelines. In AI Platform we were simply using vanilla Kubeflow Pipelines tools on GCP resources, so all of the standard Kubeflow SDK features would work exactly as if you had spun up a Kubeflow Pipelines instance on an on-premises Kubernetes cluster.

4. Manage Pipelines and Runs in Kubeflow Client UI

The developed pipelines could then be uploaded to the Kubeflow client, where you could see all previously uploaded pipelines, launch runs of these pipelines, and view the DAG and outputs of ongoing and completed Pipeline runs. Pipelines could also be uploaded, and runs started and monitored, via the Kubeflow client API using functions defined in the Python SDK, as in the sketch below.
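As an illustration, here is a minimal sketch of that workflow using the v1 `kfp.Client`; the host URL, file names, experiment and pipeline names are all placeholders.

```python
import kfp

# Connect to the Kubeflow Pipelines instance behind the AI Platform deployment
client = kfp.Client(host="https://<your-deployment>.pipelines.googleusercontent.com")

# Upload a compiled pipeline package, then launch and track a run of it
pipeline = client.upload_pipeline("pipeline.yaml", pipeline_name="my-pipeline")
experiment = client.create_experiment("dev")
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name="my-first-run",
    pipeline_id=pipeline.id,
)
```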

This workflow made it very easy to work with Kubeflow Pipelines on Google resources, with deployment taking around five minutes (if you don’t include the time it takes for GCP to spin up the resources in the background). Thanks to GKE, Kubernetes cluster management was as easy as it had ever been, and thanks to AI Platform Pipelines, deploying Kubeflow instances to those clusters was even easier! Despite this, ML Teams still needed Kubernetes skills in order to make informed decisions, properly configure their cluster, and generally make the best use of AI Platform Pipelines and the GKE cluster it would be deployed to.

Vertex AI & Vertex Pipelines

One of the first things one might notice moving from AI Platform Pipelines to Vertex Pipelines is that this abstraction of resource management away from the user has continued, bringing with it the usual reduction in the day-to-day hassle of managing configuration files.

A big indicator of this is that users are no longer required to create a dedicated Kubernetes cluster via GKE on which to run their Pipelines. Vertex AI instead employs an apparently serverless approach to running Pipelines written with the Kubeflow Pipelines DSL: the Kubernetes clusters, and the pods running on them, are managed behind the scenes by Vertex AI.

In the screenshot below, which shows the Vertex Pipelines UI, you start to get a sense of this approach. Instead of a store of pipelines and historic runs, as you may be familiar with if you’ve used the Kubeflow Pipelines UI before, we simply have a list of historic runs. Runs can be started by uploading a job spec compiled from a pipeline script, either via the UI or the Python client. Here we start to get a feel for the ‘pipelines-as-a-service’ approach that Vertex Pipelines seems to be aiming for.

Vertex Pipelines UI

This also hints at another key conceptual difference between the two tools: Vertex AI isn’t running an instance of a Kubeflow client. Instead, Vertex Pipelines is its own implementation of the kind of infrastructure usually provided by Kubeflow Pipelines (i.e., container workflow orchestration) that can run pipelines specified using the Kubeflow SDK.

A key benefit of this new approach is that Vertex Pipelines natively uses GCS for Artifact storage, and even employs its own metadata server in the form of Vertex ML Metadata. Having these managed services in place by default is definitely welcome, as in our experience the default options in AI Platform Pipelines (Kubernetes nodes and PVCs hosting MySQL and MinIO services) don’t scale quite as well as their Google-managed counterparts.

Another benefit that users will welcome is the reduction in cost offered by the pay-as-you-go model that this ‘pipelines-as-a-service’ approach is able to deliver. Instead of paying for the continuous uptime of the necessary Kubernetes cluster, users now only pay $0.03 USD per pipeline run, plus whatever compute resources the pipeline consumes while it is running.

Kubeflow in Vertex Pipelines

Given this new approach to implementing Kubeflow Pipelines in Vertex AI, there are some differences to note when developing workflows with the KFP SDK.

The first is that Vertex AI requires an entirely new version of the Kubeflow SDK: SDK v2.0. This SDK comes bundled with Kubeflow Pipelines releases from v1.6 onwards, so with such a version installed you are ready to start building SDK v2.0 compliant pipelines.
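In practice, this just means installing a recent enough KFP release and importing the v2 flavor of the DSL; a minimal sketch (the version pin is indicative only):

```python
# pip install "kfp>=1.6"
from kfp.v2 import dsl, compiler  # the SDK v2.0 interfaces ship alongside the v1 ones
```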

This new version of the SDK is designed primarily to make use of the Pipeline Metadata and Artifact tracking tools of ML Metadata, an open-source metadata tracking tool developed by the TensorFlow Extended (TFX) team. Vertex AI implements its own version of this in Vertex ML Metadata, which builds on the base TFX ML Metadata tool.

Whilst developing with the new version of the SDK is largely the same as with the traditional Kubeflow SDK, there are a few differences that one needs to keep in mind when working with the new standard.

First, concerning building components, KFP SDK v2.0 mandates that all component parameters be annotated with their data type. In addition, an extra distinction is now made between component inputs that are parameters and those that are artifacts. Component Parameters are those that can be passed as string, integer, float, Boolean, dictionary or list types, and are therefore usually smaller pieces of data. Artifacts are larger pieces of data, for example datasets or models, and are instead passed as a path referencing the location of the Artifact. Parameter values and artifact metadata can be viewed in ML Metadata; the sketch below shows how the distinction looks in a Python function-based component.
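For instance, a function-based component under SDK v2.0 might look like the following; the component name, argument names, types, and base image are illustrative, not prescribed by Vertex Pipelines.

```python
from kfp.v2.dsl import component, Dataset, Model, Input, Output

@component(base_image="python:3.9")
def train_model(
    learning_rate: float,           # parameter: passed by value
    training_data: Input[Dataset],  # artifact: passed as a path reference
    model: Output[Model],           # artifact output, written to model.path
):
    with open(training_data.path) as f:
        data = f.read()
    # ... train a model on `data` (omitted) ...
    with open(model.path, "w") as f:
        f.write("serialized-model-placeholder")
```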

The difference between Artifacts and Parameters is ultimately expressed in the component specification, the component.yaml file of each component. Below is a sketch of a basic component.yaml file as it may have looked with the old version of the SDK, followed by the same specification as it would look under the new standard (the component name, container image, and variables are illustrative).

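Old style component specification component.yaml file:

```yaml
name: Train model
description: Trains a simple model on a dataset.
inputs:
  - {name: training_data, type: String}  # artifact, but passed by value in the old style
  - {name: learning_rate, type: Float}   # parameter
outputs:
  - {name: model, type: String}
implementation:
  container:
    image: gcr.io/my-project/train:latest
    command: [
      python, train.py,
      --training-data, {inputValue: training_data},
      --learning-rate, {inputValue: learning_rate},
      --model-path, {outputPath: model},
    ]
```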

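New style component specification component.yaml file:

```yaml
name: Train model
description: Trains a simple model on a dataset.
inputs:
  - {name: training_data, type: Dataset}  # artifact: now referenced by path
  - {name: learning_rate, type: Float}    # parameter: still passed by value
outputs:
  - {name: model, type: Model}
implementation:
  container:
    image: gcr.io/my-project/train:latest
    command: [
      python, train.py,
      --training-data, {inputPath: training_data},
      --learning-rate, {inputValue: learning_rate},
      --model-path, {outputPath: model},
    ]
```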

Inspecting these component specifications carefully, one will notice that for input values in the ‘command’ portion of the ‘implementation’, we previously used `{inputValue: variable_name}` for both Artifacts and Parameters. In the new version, we specify Artifacts with `{inputPath: variable_name}` and Parameters with `{inputValue: variable_name}`.

When building Pipelines, the new SDK version brings a couple of changes. The first is that, as with components, pipeline parameter definitions must be annotated with their data types. Second, pipelines must be decorated with the `@kfp.dsl.pipeline` decorator. Within the pipeline decorator we can specify the pipeline name (the ID used for querying ML Metadata for information about your runs), an optional description, and pipeline_root, which specifies the location in which to store pipeline outputs. The ‘pipeline_root’ parameter is optional in Kubeflow Pipelines, which falls back to MinIO Artifact storage if no root is defined. However, given that Vertex Pipelines uses GCS for Artifact storage, it requires that ‘pipeline_root’ be specified (either within the pipeline decorator, or when calling the create_run_from_job_spec method of the Python client), as shown in the sketch below.
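Putting this together, defining, compiling, and submitting a pipeline might look like the following sketch; the component, project, region, and bucket names are placeholders, and the AIPlatformClient shown is the one bundled with KFP releases of this era.

```python
from kfp.v2 import dsl, compiler
from kfp.v2.dsl import component
from kfp.v2.google.client import AIPlatformClient

@component(base_image="python:3.9")
def say_hello(name: str) -> str:
    return f"Hello, {name}!"

@dsl.pipeline(
    name="my-pipeline",  # the ID used when querying ML Metadata about runs
    description="A minimal example pipeline",  # optional
    pipeline_root="gs://my-bucket/pipeline-root",  # required by Vertex Pipelines
)
def my_pipeline(name: str = "Vertex"):  # pipeline parameters must be type-annotated
    hello_task = say_hello(name=name)

# Compile the pipeline into a job spec that Vertex Pipelines can execute
compiler.Compiler().compile(pipeline_func=my_pipeline, package_path="pipeline.json")

# Submit the compiled job spec as a run
api_client = AIPlatformClient(project_id="my-project", region="europe-west1")
api_client.create_run_from_job_spec("pipeline.json")
```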

Kubeflow SDK v2.0 Limitations

Beyond these SDK v2.0 considerations, there are some additional constraints arising from the practicalities of Vertex Pipelines’ implementation.

The first concerns caching of pipeline component executions. In Kubeflow Pipelines we could specify that the cache of a component execution would expire after a given amount of time; until then, components running with identical configurations would reuse the cached output of previous executions. In Vertex Pipelines, we can’t specify the time frame after which caches expire, but we can use the ‘enable_caching’ parameter of the client’s create_run_from_job_spec method to enable or disable the use of caches for a run.
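For example, reusing the placeholder client from the earlier sketch:

```python
# Disable caching for this run; per-component cache expiry times are not supported
api_client.create_run_from_job_spec("pipeline.json", enable_caching=False)
```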

In addition to caching, recursive component calls are another feature of Kubeflow Pipelines that Vertex Pipelines does not currently support. The Google documentation again uses the wording ‘Currently, Vertex Pipelines does not support…’, which suggests that this is something they are potentially looking to support in the future.

Another key difference between Kubeflow Pipelines and Vertex Pipelines is the push to use more Google-managed resources, such as GCS, within your pipelines. For example, in Vertex Pipelines users can access GCS directly, as though it were a mounted volume of storage, using Cloud Storage FUSE (see the sketch below). By contrast, in Kubeflow Pipelines users previously interacted with Kubernetes resources such as Persistent Volume Claims (PVCs). Another indicator of this push is the host of Google Cloud specific pre-built components that have been released to support interaction between pipelines and Google Cloud/Vertex AI resources.
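As an illustration of that FUSE-style access, a component can read a gs:// object through a local /gcs mount point, assuming the pipeline’s service account can read the bucket; the bucket and file names here are placeholders.

```python
from kfp.v2.dsl import component

@component(base_image="python:3.9")
def read_raw_data():
    # Under Vertex Pipelines, gs://my-bucket/data.csv is also visible at
    # /gcs/my-bucket/data.csv via Cloud Storage FUSE -- no PVC required
    with open("/gcs/my-bucket/data.csv") as f:
        print(f.readline())
```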

Conclusion

In summary, Vertex Pipelines introduces some nice changes over the previous AI Platform Pipelines implementation that will, overall, make the experience of developing and running MLOps workflows on GCP a lot easier. The move to make the underlying resources more managed than in the previous solution is a welcome one, simultaneously speeding up and simplifying the process of getting up and running with Pipelines on GCP. It is worth noting that the product is still in a preview phase; however, the key tools are already there to start using it, and it is certainly a promising improvement on what came before. For those still unsure whether to stick with AI Platform or jump straight in with Vertex Pipelines, I would recommend giving the new kid on the block a chance.
