The blog post discusses the performance of Automated Machine Learning (AutoML) tools that aim to automate the tasks of applying machine learning to real-world problems without expert knowledge. The blog argues that the tools available today do not deliver upon these promises and uses two different approaches to answer the question "What can you expect from Automated Machine Learning?" The first approach is zooming in on the technical aspects of AutoML, and the second approach is zooming out to show the bigger picture. The technical aspects show that there is no consensus on the exact components that should be automated, and different AutoML tools offer a wide variety of features. The bigger picture shows that AutoML is just a single step in a bigger problem-solving story. Without expertise, humans in this story risk making mistakes at the interfaces. Therefore, AutoML users need at least some expert knowledge to work with AutoML effectively.
Lots of parties claim they offer the AutoML solution that you are looking for. The recent AutoML Benchmark tries to give an objective view on the performance of some of these tools.
But what is AutoML? Most explanations include something technical, like
“… automating the tasks of applying machine learning to real-world problems” (Wikipedia)
as well as a people aspect:
“… off-the-shelf machine learning methods that can be used easily and without expert knowledge” (automl.org)
Taken at face value, these are huge claims. In this blogpost we will argue that the tools that are available today don’t deliver on these promises. Focusing on AutoML for structured (tabular) data in particular, we try to answer the question: “What can you expect from Automated Machine Learning?”
A romanticised version of what people expect from AutoML might look somewhat like this:
This blogpost takes a good look at the schematic above, using 2 different approaches: zooming in on the technical aspects of AutoML, and zooming out to show the bigger picture.
“What exactly should AutoML automate?”
There is no straightforward answer to this question that all frameworks / developers / users agree upon. This lack of clarity has allowed different toolmakers to focus on the automation of different subsets of tasks in ML projects. The result is that available tools often have overlapping, but different feature sets. For an end user, who wants to “automate ML tasks”, it can be very hard to compare tools.
Below are 4 examples that illustrate this. (This is not an exhaustive list — feel free to do this exercise for other tools yourself.)
Auto-sklearn automates the bare minimum. Note the need for manual cleaning of the dataset and feature preprocessing. (In these sketches, the coloured components are the ones covered by the tool in question):
TPOT (another classic) additionally covers feature preprocessing, feature construction and feature selection:
The creators of MLJAR (and similar tools) aim somewhat bigger. These open source packages include features for e.g. explainability, ensembling, documentation generation, experiment tracking and model validation. This means that users are not only creating an ML model; they’re also capturing lots of useful metadata.
In 2022, Google Cloud really pushed the development and adoption of “Vertex AI AutoML Tabular”. It does some data cleaning (in contrast to the previous tools), meaning you could give it your raw dataset if you wanted to. Statistics generation and data drift detection are implemented via TFX. Trained models can be deployed as an endpoint, and all these steps integrate nicely using Vertex AI pipelines. GCP also offers post-deployment explainability rather than explainability during model development / selection. No real documentation generation to be seen though…
Having zoomed in on the internals of a couple of popular AutoML tools, it’s becoming pretty clear:
There is no consensus on the exact components that should be automated. Different AutoML tools offer a wide variety of features.
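To make the comparison concrete, the feature sets described above can be written down as plain Python sets and compared directly. This is only a rough sketch: the step names are our own labels, the two "bare minimum" steps are an assumption (the post does not spell them out), and we assume every tool covers that minimum. It is not an authoritative feature matrix for any of these tools.

```python
# ASSUMPTION: these two steps stand in for the "bare minimum" that every
# tool is taken to cover; the names are illustrative labels, not official ones.
BARE_MINIMUM = {"model_selection", "hyperparameter_tuning"}

# Steps per tool, transcribed from the descriptions in this post only.
COVERAGE = {
    "auto-sklearn": set(BARE_MINIMUM),
    "TPOT": BARE_MINIMUM | {
        "feature_preprocessing", "feature_construction", "feature_selection",
    },
    "MLJAR": BARE_MINIMUM | {
        "explainability", "ensembling", "documentation_generation",
        "experiment_tracking", "model_validation",
    },
    "Vertex AI AutoML Tabular": BARE_MINIMUM | {
        "data_cleaning", "statistics_generation", "data_drift_detection",
        "deployment", "explainability",
    },
}


def shared_steps(coverage):
    """Return the steps that every tool in the mapping covers."""
    if not coverage:
        return set()
    return set.intersection(*coverage.values())


def unique_steps(coverage):
    """Return, per tool, the steps that no other tool in the mapping covers."""
    result = {}
    for tool, steps in coverage.items():
        others = set().union(*(s for t, s in coverage.items() if t != tool))
        result[tool] = steps - others
    return result


if __name__ == "__main__":
    print("Covered by all:", sorted(shared_steps(COVERAGE)))
    for tool, steps in unique_steps(COVERAGE).items():
        print(f"Only in {tool}:", sorted(steps))
```

Under these assumptions, the overlap between the tools is exactly the assumed minimum, and nearly everything else is specific to one or two tools. That is the comparison problem in a nutshell.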
(Auto)ML is just a single step in a bigger problem-solving story. Without expertise, the humans in this story risk making mistakes at the interfaces:
AT THE INPUT: Getting data into any ML system requires some decision making.
Some things just cannot be automated. Before a dataset can be handed to an AutoML system (or a bunch of ML engineers for that matter), someone with knowledge of the data needs to think long and hard:
Pretty sure these aren’t the only issues — but you get the point.
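Parts of that thinking can be supported by a quick scripted look at the data, even though the judgment itself stays with a human. Below is a minimal sketch, assuming the dataset is a list of dicts. The function name `preflight_report` and the three checks (missing values, constant columns, a column identical to the target) are hypothetical picks for illustration, not a checklist from any AutoML tool.

```python
def preflight_report(rows, target):
    """Flag columns that deserve a human look before AutoML sees the data.

    rows:   list of dicts, one per record (illustrative format, an assumption)
    target: name of the column to predict
    """
    columns = list(rows[0].keys()) if rows else []
    target_values = [row.get(target) for row in rows]
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        flags = []
        # Missing values: someone has to decide whether "missing" means something.
        if len(present) < len(values):
            flags.append(f"{len(values) - len(present)} missing")
        # Constant columns carry no signal for a model.
        if len(set(present)) <= 1:
            flags.append("constant")
        # A feature that copies the target is a hint of target leakage.
        if col != target and values == target_values:
            flags.append("identical to target (possible leakage)")
        if flags:
            report[col] = flags
    return report


# Made-up example records, purely for illustration.
rows = [
    {"age": 31, "country": "BE", "churned": 1, "cancel_flag": 1},
    {"age": None, "country": "BE", "churned": 0, "cancel_flag": 0},
    {"age": 45, "country": "BE", "churned": 1, "cancel_flag": 1},
]
report = preflight_report(rows, target="churned")
for column, flags in report.items():
    print(column, "->", ", ".join(flags))
```

The script only raises flags. Whether a missing age should be imputed, or whether `cancel_flag` would even be known at prediction time, is exactly the kind of question that needs someone who knows the data.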
AT THE OUTPUT: The interpretation of the results of ML training jobs ain’t easy.
Once you’ve trained a model — Yay! — you’ll probably want to have a look at its specs before you deploy it. How else would you know if it’s good enough? Does it need more training? More data? A different problem formulation? In order to answer these questions, you’ll probably need to dig a bit deeper into the results:
We’re not just fans of throwing around abbreviations: these are all examples of questions that pop up during ML projects. Even if you can automate some or all technical parts of the model creation, it does not mean your model is ready to be deployed once it is trained.
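One small illustration of why the results need interpretation: a single headline number can look great while the model is useless. The sketch below assumes a binary problem with labels 0 and 1 and uses made-up, heavily imbalanced data. `binary_metrics` is a hypothetical helper written for this example, not part of any AutoML tool.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted positive
        # or when there are no positives at all.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up data: 5 positives out of 100, and a "model" that always predicts 0.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here the accuracy is 0.95 while the recall is 0.0: the model never finds a single positive case. Deciding which of these numbers matters for your problem is not something the training job can do for you.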
By zooming out we’ve come to realise that:
It is unrealistic to “let non-experts deliver end-2-end ML projects” because in order for humans to interact with ML systems, expertise concerning the input data and the model’s evaluation is definitely required.
AutoML is yet another tool: a component in an iterative framework that allows you to search for the best possible data to make useful predictions, without spending too much time on model development.
Tools need to be used in a certain way. If you have the right expectations about what a tool can do, you are more likely to use the right tool for the right job. Don’t expect any single tool to magically solve your problems.