The 8-Year Wait: How ML6 Tackled Grid Congestion at Elia’s “Hack The Grid 2025”

Executive Summary
Join ML6 on our three-day journey at Elia’s “Hack The Grid 2025.” We confronted a critical challenge, an 8-year wait for new grid hardware, by developing a leaderboard-topping machine learning model that lets operators safely push existing infrastructure to its limits. Here’s how we did it.

Electrification: A Boon for the Climate, a Challenge for the Grid

The energy world is in the grip of a perfect storm. Across Europe, Transmission System Operators (TSOs) like Belgium’s Elia are navigating a set of simultaneous challenges that are straining our electrical infrastructure to its limits:

On one hand, a massive wave of electricity demand, growing up to 7% annually to 2030, is underway as society transitions away from fossil fuels to power everything from cars to industrial processes.
On the other, this surge in demand is straining an aging infrastructure, with critical assets like power transformers nearing the end of their operational lifespan.
To top it all off, a severe global supply chain crisis for essential grid components has rendered traditional reinforcement strategies dangerously slow. This crisis has given suppliers immense pricing power, driving up the cost of infrastructure, while simultaneously causing lead times for new large power transformers to skyrocket from a few months to up to 8 years.

The Unsung Heroes: Transformers and Their Thermal Tipping Point

At the heart of this challenge are the unsung workhorses of the electrical grid: power transformers. These devices are fundamental to the efficient transmission of electricity, “stepping up” voltage for long-distance transport to minimize energy losses and “stepping down” voltage at substations for distribution to industries and homes. Substations act as the critical nodes of the power grid, managing voltage, controlling power flow, and protecting the system.

However, the operational limit of a transformer isn’t defined by the electricity it carries, but by the heat it can withstand. As current flows, electrical losses in the windings and core generate immense heat. The single most critical factor determining a transformer’s lifespan is the health of its cellulose-based paper insulation. Deep within the transformer’s windings, there will be a single point that experiences the highest temperature: the “hotspot”.

This is where the real drama unfolds. The degradation of the insulation at this hotspot is an irreversible chemical process that follows an exponential relationship with temperature, famously known as the “6-degree rule”. For every 6°C increase in hotspot temperature, the rate of the insulation’s aging effectively doubles. Operating a transformer just slightly hotter than its design intended can slash its lifespan dramatically, turning one hour of emergency overload into the equivalent of days of normal operational life. This makes the accurate prediction and control of the hotspot temperature the single most important task in managing transformer health.
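The doubling rule is easy to make concrete. The sketch below assumes a reference hotspot temperature of 98°C (the standard limit mentioned later in this post) as the point where aging proceeds at its normal rate; the function name is ours and purely illustrative.

```python
def relative_aging_rate(hotspot_c: float, reference_c: float = 98.0) -> float:
    """Aging speed relative to operation at the reference hotspot temperature.

    Implements the 6-degree rule: the rate doubles for every 6 °C above the
    reference and halves for every 6 °C below it.
    """
    return 2.0 ** ((hotspot_c - reference_c) / 6.0)


# One hour at each hotspot temperature, expressed in hours of "normal" aging.
for temp in (98.0, 104.0, 110.0, 128.0):
    rate = relative_aging_rate(temp)
    print(f"{temp:.0f} °C: {rate:.0f} h of normal aging per hour of operation")
```

Under this assumption, one hour at 128°C consumes 32 hours of normal life, which is how a short emergency overload adds up to more than a day of aging.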

Enter the Stage: Elia Hack the Grid 2025

It’s within this context that Elia, one of Europe’s top five TSOs, hosted its “Hack the Grid” Hackathon 2025 in Brussels. The challenge wasn’t just an academic exercise; it was a direct call for innovative solutions to a real-world crisis:

“How can we reinvent substation management to safely exceed limits, handle more load, and delay costly upgrades when building new is no longer a timely option?”

To tackle this challenge, Elia tasked the participating teams with the following objectives:

Develop a predictive model for transformer hotspot temperatures based on historical load data, transformer parameters, and environmental conditions.
Leverage insights from the temperature model to propose a forward-thinking strategy to handle increasing loads on substations, prioritize transformer reinforcements, and extend their operational life without immediate costly infrastructure upgrades.

This set the stage perfectly for the ML6 Energy Transition Domain team. The more accurate the model developed in Part 1, the more confidently and less conservatively the strategy in Part 2 could be executed.

Part 1: The Model — From a “White-Box” Problem to a Data-Driven Revolution

For decades, TSOs have relied on “white-box” physical models, like those codified in IEC standards, to estimate this critical hotspot temperature. These models use a set of thermodynamic equations and nominal design parameters to calculate heat transfer. Elia themselves provided a baseline model for the hackathon based on this principle.

The problem? It’s widely acknowledged in the industry that these physical models are often “not very accurate”. They rely on generalized assumptions that don’t capture the unique characteristics of an individual, aging transformer, especially during the dynamic and rapid load changes that are becoming the new normal. This inaccuracy forces grid operators to apply large, conservative safety margins, effectively underutilizing their multi-million-euro assets simply to mitigate the risk posed by their own models.
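To illustrate what such a white-box calculation looks like, here is a minimal sketch of a steady-state hotspot estimate in the style of the IEC loading guide: ambient temperature, plus a top-oil rise, plus a winding hotspot gradient, both scaled by the load factor. This is not Elia’s baseline model, and every parameter value below is an illustrative placeholder rather than a figure from a real heat-run test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalParams:
    """Nominal design parameters (illustrative placeholder values)."""
    top_oil_rise_k: float = 52.0      # top-oil rise over ambient at rated load
    hotspot_gradient_k: float = 26.0  # hotspot rise over top oil at rated load
    loss_ratio: float = 6.0           # load losses / no-load losses at rated load
    oil_exponent: float = 0.8
    winding_exponent: float = 1.3


def steady_state_hotspot(load_factor: float, ambient_c: float,
                         p: ThermalParams = ThermalParams()) -> float:
    """Hotspot temperature once the transformer has settled at a constant load."""
    k, r = load_factor, p.loss_ratio
    top_oil_rise = p.top_oil_rise_k * ((1 + r * k**2) / (1 + r)) ** p.oil_exponent
    hotspot_rise = p.hotspot_gradient_k * k ** p.winding_exponent
    return ambient_c + top_oil_rise + hotspot_rise


print(steady_state_hotspot(1.0, 20.0))  # rated load, 20 °C ambient
print(steady_state_hotspot(1.2, 30.0))  # 20 % overload on a warm day
```

With these placeholder values, rated load at 20°C ambient yields a 98°C hotspot. The formula uses fixed nameplate parameters and ignores how quickly the transformer heats up or cools down, which hints at why such estimates struggle with an individual, aging unit under rapidly changing load.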

This is where our team got to work. Armed with nearly two years of historical data for 17 different transformers, our mission was to replace the rigid physical model with a dynamic, data-driven one.

The Data: A Digital Fingerprint for Every Transformer

Elia provided a rich dataset for each transformer, which formed the foundation of our model. This included:

Time-series Data: Quarter-hourly readings of the transformer’s load and the resulting hotspot temperature, along with hourly outside air temperature readings. This provided a granular view of how each transformer responded to different operational and environmental conditions.
Static Properties: A JSON file detailed the unique characteristics of each transformer, including its nominal load capacity, cooling type (e.g., ONAF), age, and the specific parameters from its factory heat-run test.

The first step was to combine these disparate sources into a single, unified dataset. But we didn’t stop there. Based on ML6’s experience, we know that the performance of a machine learning model often hinges more on the quality and richness of the data than on the complexity of the model itself. So, we focused heavily on feature engineering.
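As a rough sketch of the merging step, the snippet below attaches each hourly ambient reading to the quarter-hourly load rows that fall within that hour. The row layout, field names, and sample values are assumptions made for illustration; in practice a dataframe library would typically handle this join.

```python
from datetime import datetime, timedelta


def align_ambient_to_load(load_rows, ambient_by_hour):
    """Attach the hourly ambient reading to every quarter-hourly load row.

    load_rows:       list of (timestamp, load, hotspot) tuples, 15-minute steps
    ambient_by_hour: dict mapping an on-the-hour timestamp to ambient °C
    Rows whose hour has no ambient reading are dropped rather than guessed.
    """
    merged = []
    for ts, load, hotspot in load_rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour in ambient_by_hour:
            merged.append({"timestamp": ts, "load": load, "hotspot": hotspot,
                           "ambient": ambient_by_hour[hour]})
    return merged


# Two hours of dummy readings.
start = datetime(2024, 1, 1, 0, 0)
load_rows = [(start + timedelta(minutes=15 * i), 0.5 + 0.01 * i, 60.0 + i)
             for i in range(8)]
ambient_by_hour = {start: 4.0, start + timedelta(hours=1): 3.5}

merged = align_ambient_to_load(load_rows, ambient_by_hour)
for row in merged[:5]:
    print(row["timestamp"], row["load"], row["ambient"])
```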

We enriched the data by creating new, insightful features for the model to learn from. This included calculating the transformer’s age in months and converting categorical data like the cooling type into a numerical format. Most importantly, we engineered lagged and rolling average features. A transformer has significant thermal inertia; its current temperature is a result of not just the current load, but the load and ambient temperatures over the past several hours or even days. By creating features like “average load over the last 3 hours” or “temperature 12 hours ago,” we gave our model a sense of memory, allowing it to understand trends and thermal momentum, not just a single snapshot in time.
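The lagged and rolling features described above can be sketched in a few lines. The function and field names are hypothetical; with quarter-hourly data, a 3-hour rolling average spans 12 steps and a 12-hour lag is 48 steps. Dataframe libraries offer equivalent operations (for example `shift` and `rolling` in pandas).

```python
def add_memory_features(series, window_steps=12, lag_steps=48):
    """Return one feature dict per time step for a quarter-hourly series.

    window_steps=12 -> rolling mean over the last 3 hours (12 x 15 min)
    lag_steps=48    -> value observed 12 hours earlier (48 x 15 min)
    Features are None until enough history is available.
    """
    features = []
    for i, value in enumerate(series):
        window = series[max(0, i - window_steps + 1): i + 1]
        rolling = sum(window) / window_steps if len(window) == window_steps else None
        lagged = series[i - lag_steps] if i >= lag_steps else None
        features.append({"value": value, "rolling_mean": rolling, "lagged": lagged})
    return features


# Tiny dummy series with short windows so the effect is visible.
load = [0.4, 0.5, 0.6, 0.8, 0.7, 0.9]
feats = add_memory_features(load, window_steps=3, lag_steps=2)
for row in feats:
    print(row)
```

The first rows carry `None` for both features because the history is not long enough yet; such rows are usually dropped before training.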

The Model: Simplicity, Speed, and a Clear Winner

With the enriched dataset ready, we experimented with gradient-boosted tree models like XGBoost and LightGBM. These models are industry workhorses, known for their high performance and efficiency, making them perfect for a time-sensitive hackathon environment.

The experiments included:

One Model Per Transformer: Training a unique model for each of the 17 transformers to see if it could capture its individual thermal signature.
One Model For All: Training a single, global model on the data from all transformers, allowing it to learn from a much larger and more diverse dataset.
The Lagged Model: Our primary hypothesis, a global model trained on the dataset enriched with our lagged and rolling average features.

The results spoke for themselves. While all models significantly outperformed the baseline physical model, the XGBoost model using our engineered lagged features was the undisputed champion.

Transformer hotspot temperature prediction ML model development results.

The lagged features allowed the model to capture the crucial context of thermal inertia, reducing the Mean Absolute Error (MAE) by over 65% compared to the baseline. This final model was our submission to the jury, and we are proud to announce that it secured first place in Part 1 of the hackathon. Thorough data modelling and feature engineering allowed it to outperform the more complex models proposed by competing teams.
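For readers less familiar with the metric, the sketch below shows how MAE and a relative reduction against a baseline are computed. The numbers are dummy values chosen for easy arithmetic, not hackathon results.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between measured and predicted temperatures (°C)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_mae, model_mae):
    """Fraction by which the model lowers the baseline's MAE."""
    return 1.0 - model_mae / baseline_mae


# Dummy numbers for illustration only.
measured = [70.0, 75.0, 82.0, 78.0]
baseline = [64.0, 81.0, 76.0, 84.0]  # physical-model style estimate
model = [69.0, 77.0, 80.0, 79.0]     # data-driven estimate

mae_base = mean_absolute_error(measured, baseline)
mae_model = mean_absolute_error(measured, model)
print(mae_base, mae_model, relative_reduction(mae_base, mae_model))
```

In this toy case the baseline is off by 6°C on average and the model by 1.5°C, a 75% reduction.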

With this accurate, winning model in hand, we had the intelligence required for the next phase: building a strategy to use these insights to actively manage the grid.

Part 2: The Strategy — From Physical Limits to Digital Levers

With the 8-year lead time for new transformers as a stark reality, our strategy for Part 2 of the hackathon pivoted away from simply waiting for new hardware. We had to find solutions that worked with the existing infrastructure. Our goal was clear: develop a forward-thinking approach to leverage the grid we have, pushing it to its limits safely and intelligently.

Exploring Options: What Can and Can’t Be Done

Our initial brainstorming sessions focused on physical modifications to the substations. We noticed from the hackathon data that transformers with ONAF (Oil Natural Air Forced) cooling, which use fans to actively move air, were more performant than those with ONAN (Oil Natural Air Natural) cooling, which rely on passive convection. A simple, short-term solution would be to upgrade existing transformers to ONAF where possible.

However, more ambitious physical changes quickly ran into real-world constraints, as confirmed by Elia’s experts on-site:

Removing Walls: While removing the walls surrounding transformers would improve natural wind cooling, these structures are essential for noise reduction and even protection against physical attacks.
Active Noise Cancelling: This was suggested as an alternative to the sound-dampening walls, but we learned that this technology had been tried in the past and proven ineffective for this use case.
Offloading to Reserve Transformers: Transformers often have a backup for redundancy. While it’s tempting to offload the main transformer during peaks by shifting the load to the reserve, this is already partially done and cannot be extended further due to strict regulatory redundancy constraints.
Active Cooling: Using power to actively cool a transformer is a self-defeating loop, as the power would come from the transformer itself, increasing its load and thus its temperature. However, the idea of pre-cooling showed promise. Using a load forecast, we could cool the transformer before a predicted load spike, giving it a thermal buffer to handle the peak.

The Pivot: If You Can’t Change the Hardware, Change the Strategy

These hurdles made it clear that a purely physical approach was a dead end. Instead, we shifted our focus to a smarter, data-driven approach, one that empowers operators to devise their own strategies. Rather than prescribing a single solution, we built a system that provides deep insight into transformer behavior, grid stress, and future scenarios. The core idea is to reduce transformer loads during peak hours, since even a slight temperature increase can dramatically shorten a transformer’s lifespan. From this emerged our guiding principle: enabling intelligent peak shaving through informed decision-making.

To achieve this, we developed a strategic tool called Optimus. Built rapidly using a UI development platform called Lovable (note: the screenshots below contain dummy data), Optimus classifies transformers into three risk categories: low, medium, and high priority based on the load forecast and our accurate temperature model from Part 1. This allows an operator to instantly see where the biggest problems lie and take strategic action.

Optimus was built during the hackathon under tight time constraints, so the data shown in this post is purely illustrative. The goal isn’t to focus on the exact numbers, but rather to showcase the kind of scenario modelling Optimus makes possible, and how it empowers operators to make smarter, more informed decisions about their grid.

Let’s walk through an operator’s journey with Optimus. In the main dashboard, the operator sees a “Priority Assessment” list and a map. The transformer at “Schaerbeek Centraal” is flagged as a high priority. It has only 7.5 years of lifespan left. With a new transformer taking 8 years to arrive and none ordered yet, a replacement would not arrive in time.

Figure 1: Optimus landing page showing Belgium’s transformers and their priority assessment
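One simple way to express this kind of ranking is to compare remaining lifespan against the replacement lead time. The sketch below is a hypothetical simplification: the margin and the three-way split are our assumptions for illustration, and the actual tool also draws on the load forecast and the temperature model.

```python
LEAD_TIME_YEARS = 8.0  # replacement lead time quoted in this post


def priority(remaining_life_years: float, margin_years: float = 4.0) -> str:
    """Rank a transformer by how its remaining life compares to the lead time.

    margin_years is a hypothetical buffer, not a value used by Optimus.
    """
    if remaining_life_years < LEAD_TIME_YEARS:
        return "high"    # a unit ordered today would arrive too late
    if remaining_life_years < LEAD_TIME_YEARS + margin_years:
        return "medium"  # still in time, but the ordering window is closing
    return "low"


print(priority(7.5))   # the Schaerbeek Centraal example
print(priority(10.0))
print(priority(20.0))
```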

Drilling down into Schaerbeek Centraal, the operator sees the “Long-Term Assessment.” The forecast shows that future peak loads will exceed the transformer’s capacity, resulting in 5.6 MW of “Uncovered Load.”

Figure 2: Optimus substation analysis dashboard for the Schaerbeek Centraal substation before operational or strategic interventions.

The operator’s first lever is Temperature Adjustment. By increasing the maximum allowable temperature from the standard 98°C to 110°C, they can cover a much larger part of the forecasted load, saving 64% of disconnections. This is possible only because our model from Part 1 accurately maps this new temperature limit to the allowable load. But this action has a significant trade-off: the transformer’s lifespan is now reduced by 1.1 years, dropping to just 6.4 years.

Figure 3: Optimus substation analysis dashboard for the Schaerbeek Centraal substation after increasing the transformer maximum allowable temperature.
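Mapping a temperature limit to an allowable load amounts to inverting the temperature model. The sketch below does this with a bisection search, using a made-up stand-in curve in place of the trained model; the function names, the curve, and the forecast peak are all hypothetical.

```python
def predicted_hotspot(load_mw: float, ambient_c: float = 25.0) -> float:
    """Stand-in for the trained model: any function that rises with load works."""
    return ambient_c + 0.03 * load_mw ** 2  # hypothetical curve, not fitted to data


def max_load_for_limit(limit_c: float, lo: float = 0.0, hi: float = 200.0,
                       tol: float = 1e-3) -> float:
    """Bisection search for the highest load whose predicted hotspot stays within limit_c."""
    if predicted_hotspot(lo) > limit_c:
        return 0.0
    if predicted_hotspot(hi) <= limit_c:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predicted_hotspot(mid) <= limit_c:
            lo = mid
        else:
            hi = mid
    return lo


def uncovered_load(peak_forecast_mw: float, limit_c: float) -> float:
    """Forecast peak that cannot be served without breaching the temperature limit."""
    return max(0.0, peak_forecast_mw - max_load_for_limit(limit_c))


peak = 55.0  # dummy forecast peak in MW
for limit in (98.0, 110.0):
    print(limit, round(max_load_for_limit(limit), 1), round(uncovered_load(peak, limit), 1))
```

Raising the limit increases the allowable load and shrinks the uncovered part of the peak. A model that relies on lagged features would need a simulated load trajectory rather than a single load value, so treat this as a simplification of the idea only.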

This is where the second lever, **Battery Support**, comes in. By utilizing battery storage during peak loads, the operator can offload the transformer. In our simulation, adding 1.5 MW of battery capacity extends the transformer’s lifespan back to 7.0 years and increases the number of saved disconnections to 80%.
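The peak-shaving idea behind this lever can be sketched as a simple threshold rule: whenever load exceeds a chosen threshold, the battery covers the excess up to its power and energy limits. The threshold, the energy capacity, and the load profile below are assumptions for illustration and not values from the Optimus simulation.

```python
def shave_peaks(load_mw, threshold_mw, battery_power_mw, battery_energy_mwh,
                step_hours=0.25):
    """Discharge a battery whenever load exceeds the threshold.

    Returns the load seen by the transformer at each quarter-hourly step.
    Recharging is left out to keep the sketch short.
    """
    remaining = battery_energy_mwh
    shaved = []
    for load in load_mw:
        excess = max(0.0, load - threshold_mw)
        # Limited by the excess itself, the power rating, and the energy left.
        discharge = min(excess, battery_power_mw, remaining / step_hours)
        remaining -= discharge * step_hours
        shaved.append(load - discharge)
    return shaved


load = [38.0, 41.0, 42.0, 41.5, 39.0]  # dummy quarter-hourly load in MW
shaved = shave_peaks(load, threshold_mw=40.0, battery_power_mw=1.5,
                     battery_energy_mwh=3.0)
print(shaved)  # [38.0, 40.0, 40.5, 40.0, 39.0]
```

The 42 MW peak is only reduced to 40.5 MW because the battery’s 1.5 MW power rating caps how much it can take over in a single step.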

If the operator wants the predicted end of life to stretch beyond the transformer lead time, the maximum temperature can be lowered from 110°C to 100°C. This reduces the share of saved disconnections to 32%, but it preserves enough lifespan for a newly ordered transformer to arrive in time.

Balancing the number of saved disconnections versus the costs of battery deployment and transformer lifetime is ultimately a decision that needs to be made by the operator. Optimus serves to empower operators to simulate trade-offs and explore strategic actions, enabling data-driven decisions that balance grid resilience, transformer lifespan, and investment cost. Finally, the operator can generate an in-depth report of their actions and simulations. This keeps all stakeholders in the loop and allows for expert iteration on the best path forward, ensuring the grid can handle future demand by leveraging its existing infrastructure to the absolute fullest.

The Future is Decentralized: Overcoming the Battery Bottleneck

Of course, installing large battery parks isn’t a simple fix. They are expensive, and finding private investors can be difficult, especially since Elia, as a TSO, is not allowed to build them directly. Our solution looks to the future: **aggregating consumer home batteries**. As home batteries become more common, they represent a massive, decentralized energy resource. Through aggregator parties, consumers could agree to let their batteries be used to support the grid when needed, creating a flexible and scalable solution. Battery support is just one of many strategic actions that could be integrated into Optimus.

Wrapping Up

From Intelligence to Action: A Blueprint for the Modern TSO

The journey through Elia’s “Hack the Grid” was a microcosm of the immense challenge facing the entire energy sector. We began with a grid caught in a perfect storm: surging demand from electrification, an aging infrastructure, and a supply chain crisis that makes building new assets a multi-year waiting game. The central question was stark: how do we do more with what we already have?

Our two-part solution provides a comprehensive answer. It starts with a foundation of pure intelligence. By developing a machine learning model that won first place, we replaced inaccurate, conservative physical estimates with a precise, data-driven understanding of each transformer’s true thermal limits. This model became our crystal ball, giving us the confidence to know exactly how far we could safely push the existing hardware.

But insight without action is merely an academic exercise. That’s where our strategic tool, Optimus, came into play. It transforms our predictive model’s insights into a dynamic playbook for grid operators. By simulating the trade-offs between increasing a transformer’s temperature to meet demand and leveraging innovative solutions like aggregated battery support to preserve its lifespan, we provided a tangible way to navigate the future. Our approach doesn’t just manage the problem; it optimizes it, turning a crisis of scarcity into an opportunity for intelligence.

Ultimately, this project demonstrates a fundamental paradigm shift required for the energy transition. The future of a stable, affordable, and sustainable grid won’t be secured by concrete and steel alone. It will be built on a digital layer of data, analytics, and intelligent control. By proving we can unlock significant capacity from the grid we already have, we offer a pathway to bridge the critical years until new infrastructure comes online. For ML6, this wasn’t just about winning a hackathon; it was about proving that the right data-driven strategy can help tame the perfect storm and power a cleaner future for everyone.

*Source article available on our Medium blog: The 8-Year Wait – How ML6 Tackled Grid Congestion at Elias Hack the Grid 2025*

Casper Kanaar
