What to Expect from Automated Machine Learning: Zooming in on Technical Aspects and Zooming Out to the Bigger Picture
The blog post discusses the performance of Automated Machine Learning (AutoML) tools that aim to automate the tasks of applying machine learning to real-world problems without expert knowledge. The blog argues that the tools available today do not deliver upon these promises and uses two different approaches to answer the question "What can you expect from Automated Machine Learning?" The first approach is zooming in on the technical aspects of AutoML, and the second approach is zooming out to show the bigger picture. The technical aspects show that there is no consensus on the exact components that should be automated, and different AutoML tools offer a wide variety of features. The bigger picture shows that AutoML is just a single step in a bigger problem-solving story. Without expertise, humans in this story risk making mistakes at the interfaces. Therefore, AutoML users need at least some expert knowledge to work with AutoML effectively.
First things first
Lots of parties claim to offer the AutoML solution that you are looking for. The recent AutoML Benchmark tries to give an objective view of the performance of some of these tools.
But what is AutoML? Most explanations include
something technical, like
“… automating the tasks of applying machine learning to real-world problems” — Wikipedia
as well as a people aspect:
“… off-the-shelf machine learning methods that can be used easily and without expert knowledge” — automl.org
Taken at face value, these are huge claims. In this blog post we will argue that the tools available today don’t deliver on these promises. Focusing on AutoML for structured (tabular) data in particular, we try to answer the question: “What can you expect from Automated Machine Learning?”
A romanticised version of what people expect from AutoML might look somewhat like this:
This blog post takes a good look at the schematic above, using two different approaches:
by zooming in on the technical aspects of AutoML: it’s not always clear which ML tasks this “blue block” should automate in order to convert a dataset into a model.
by zooming out to show the bigger picture: people need at least some expert knowledge to work with AutoML.
1. Zooming in
“What exactly should AutoML automate?”
There is no straightforward answer to this question that all frameworks / developers / users agree upon. This lack of clarity has allowed different toolmakers to focus on the automation of different subsets of tasks in ML projects. The result is that available tools often have overlapping, but different feature sets. For an end user, who wants to “automate ML tasks”, it can be very hard to compare tools.
Below are 4 examples that illustrate this. (This is not an exhaustive list — feel free to do this exercise for other tools yourself.)
Auto-sklearn automates the bare minimum. Note the need for manual cleaning of the dataset and feature preprocessing. (In these sketches, the coloured components are the ones covered by the tool in question):
TPOT (another classic) additionally includes feature preprocessing, feature construction and feature selection:
The creators of MLJAR (and similar tools) aim somewhat bigger. These open source packages include features for e.g. explainability, ensembling, documentation generation, experiment tracking and model validation. This means that users are not only creating an ML model — they’re also capturing lots of useful metadata.
In 2022, Google Cloud really pushed the development and adoption of “Vertex AI AutoML Tabular”. It does some data cleaning (in contrast to the previous tools) — meaning you could give it your raw dataset if you wanted to. Statistics generation & data drift detection are implemented via TFX. Trained models can be deployed as an endpoint: all these steps integrate nicely using Vertex AI pipelines. GCP also offers post-deployment explainability rather than explainability during model development / selection. No real documentation generation to be seen though…
Having zoomed in on the internals of a couple of popular AutoML tools, it’s becoming pretty clear:
There is no consensus on the exact components that should be automated. Different AutoML tools offer a wide variety of features.
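Despite those differences, most of these tools share the same skeleton: a search loop that evaluates candidate pipeline configurations and keeps the best one. The stdlib-only sketch below illustrates that loop; the search space and the scores are made-up stand-ins, not any specific tool's internals:

```python
import random

# Toy stand-in for an AutoML search space: candidate pipeline
# configurations (a model family plus one hyperparameter each).
search_space = [
    {"model": "logistic_regression", "C": c} for c in (0.1, 1.0, 10.0)
] + [
    {"model": "random_forest", "n_estimators": n} for n in (50, 200)
]

def evaluate(config):
    # Stand-in for "train the pipeline and cross-validate it".
    # Real tools spend almost their entire compute budget here.
    fake_scores = {"logistic_regression": 0.80, "random_forest": 0.85}
    return fake_scores[config["model"]] + random.uniform(-0.02, 0.02)

random.seed(0)
best = max(search_space, key=evaluate)
print(best["model"])  # random_forest
```

The tools above all implement some version of this loop; what they disagree on is what counts as a "configuration" inside `evaluate` (which preprocessing, ensembling, validation and documentation steps belong to the search).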
2. Zooming out
(Auto)ML is just a single step in a bigger problem-solving story. Without expertise, the humans in this story risk making mistakes at the interfaces:
AT THE INPUT: Getting data into any ML system requires some decision making.
Some things just can not be automated. Before a dataset can be handed to an AutoML system (or a bunch of ML engineers for that matter), someone with knowledge of the data needs to think hard & deep:
What other datasets can I leverage to solve this problem?
How do I aggregate the data to guarantee that the resulting model produces actionable outputs?
What assumptions does this ML tool make about the data I feed it in?
Is this dataset representative of the environment where the model will be deployed?
How can I avoid data leakage and/or target leakage?
Where are all these NULL values coming from? Is there anything I can do about that?
Pretty sure these aren’t the only issues — but you get the point.
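To make the data leakage point concrete, here is a minimal stdlib-only sketch, with made-up numbers, of a classic preprocessing mistake: computing scaling statistics on the full dataset before splitting, so the training features are centred using information from the test rows:

```python
# Five rows of a single (hypothetical) feature; the last row is held
# out as the test set and happens to be an outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = data[:4], data[4:]

full_mean = sum(data) / len(data)     # 22.0, dragged up by the test row
train_mean = sum(train) / len(train)  # 2.5, computed on train rows only

# Leaky pipeline: train features are centred using test information.
leaky_train = [x - full_mean for x in train]
# Correct pipeline: fit the scaler on train only, then apply it to test.
clean_train = [x - train_mean for x in train]

print(leaky_train)  # [-21.0, -20.0, -19.0, -18.0]
print(clean_train)  # [-1.5, -0.5, 0.5, 1.5]
```

An AutoML tool can run its own cross-validation correctly, but it cannot know that a statistic you baked into the dataset beforehand already leaked test information.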
AT THE OUTPUT: The interpretation of the results of ML training jobs ain’t easy.
Once you’ve trained a model — Yay! — you’ll probably want to have a look at its specs before you deploy it. How else would you know if it’s good enough? Does it need more training? More data? A different problem formulation? In order to answer these questions, you’ll probably need to dig a bit deeper into the results:
What do you know about the inner workings of this “best” model produced by the AutoML suite? Does it have 100 parameters? 100M? Is it a GBDT or an ensemble of MLPs? Is it deterministic?
Can you explain how it works to the business people?
Can you export it to ONNX or TFLite? What’s the inference latency like?
What do you think about the ROC AUC curve? Can you explain why the RMSE score should be preferred over the R2 measure in this case?
Can you interpret the plots with SHAP values without confusing correlation & causality?
We’re not just fans of throwing around abbreviations — these are all examples of questions that pop up during ML projects. Even if you can automate some or all technical parts of model creation, that does not mean your model is ready to be deployed once it is trained.
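To pick one of those abbreviations: the ROC AUC number an AutoML report hands you has a very concrete meaning, namely the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. A stdlib-only sketch on a made-up toy example:

```python
def roc_auc(labels, scores):
    """ROC AUC as a rank statistic: P(score_pos > score_neg), ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical model scores
print(roc_auc(labels, scores))  # 0.75: 3 of the 4 pos/neg pairs are ranked correctly
```

If a metric like this stays a black-box number on a leaderboard, "interpreting the results" reduces to picking the biggest number, which is exactly the trap this section warns about.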
By zooming out we’ve come to realise that:
It is unrealistic to “let non-experts deliver end-2-end ML projects” because in order for humans to interact with ML systems, expertise concerning the input data and the model’s evaluation is definitely required.
AutoML is yet another tool: a component in an iterative framework that allows you to search for the best possible data to make useful predictions, without spending too much time on model development.
Tools need to be used in a certain way. If you have the right expectations about what a tool can do, you are more likely to use the right tool for the right job. Don’t expect any single tool to magically solve your problems.
(bonus thought experiment)
AutoML can be costly because of all the hyperparams you’re tuning
ML projects are iterative
Would you like to spend hundreds of dollars on slow training jobs in an excessive grid search using the wrong data? Or would you rather use an efficient, simple baseline model and iterate fast to figure out which features are valuable? Maybe AutoML is better suited for “squeezing out the last few percentage points” than for experimenting quickly?
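A back-of-the-envelope version of that cost argument, with hypothetical grid sizes and per-job costs (stdlib only):

```python
from itertools import product

# A modest, hypothetical hyperparameter grid for a boosted-tree model.
grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [100, 300, 1000],
    "subsample": [0.6, 0.8, 1.0],
}
configs = list(product(*grid.values()))
print(len(configs))  # 3 * 4 * 3 * 3 = 108 configurations

minutes_per_job, cv_folds = 10, 5  # hypothetical training cost
total_hours = len(configs) * cv_folds * minutes_per_job / 60
print(total_hours)  # 90.0 hours of compute, per project iteration
```

If the data or the problem formulation turns out to be wrong, which is common in early iterations, all of those hours were spent searching in the wrong place. A cheap baseline surfaces that much sooner.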