ML6 • Blog

The Environmental Impact of Generative AI: Why It’s Hard to Measure

Written by Iris Luden | May 8, 2026 1:15:18 PM

Executive Summary

Generative AI is often described as “virtual.” In reality, it runs on a deeply physical infrastructure of data centers, servers, and cooling systems that consume significant resources. As adoption accelerates, so does its environmental footprint — yet measuring that impact remains highly complex.

The environmental impact of artificial intelligence goes beyond a single metric. It spans electricity use, carbon emissions, water consumption, and hardware production, all influenced by model size, infrastructure, energy source, and location.

While training large AI models is energy-intensive, the real driver of impact is usage at scale. GPT-4’s training emissions are estimated at ~5,000 tons of CO₂, yet ongoing inference can quickly match or exceed this. Generative AI tasks can also consume up to 1,000× more energy than traditional AI workloads.

Despite growing attention, transparency remains limited, making impact difficult to measure and compare.

The key takeaway: GenAI’s impact is real but largely invisible — and improving how we measure and optimize it is the first step toward more sustainable AI.

10 things we know about the environmental impact of GenAI

There is so much talk about the ‘cloud’ nowadays. Downloading from the cloud, uploading to the cloud, saving to the cloud, looking at things in the cloud. What many people do not realise, however, is that the cloud is a very real and tangible place. Despite its name, the cloud is not abstract: it is a physical network of data centers, fiber-optic cables, servers, and cooling systems that must be kept running around the clock.

This was also the case before GenAI came into play. Storing data in the cloud, streaming videos on Netflix or YouTube, running Google searches, sending emails and joining video calls… Even though these actions seem to magically appear on your laptop screen, all these rely on physical resources. The internet doesn’t come for free. As Marietje Schaake explains in the book The Tech Coup, our digital world is not weightless. It is built from physical materials that require resources and continuous energy to function.

Now, GenAI adds another layer to cloud consumption. Over the past year, this has led to a surge in media coverage, often citing conflicting numbers about its impact. Headlines range from negligible impact to claims such as “one prompt uses 500 ml of water.” The issue is: we talk about AI’s environmental impact, but we don’t measure it in a consistent or comparable way. The cloud is… well… a cloudy business. In this blog series, we break down what AI actually consumes, and why.

What drives the environmental impact of generative AI?

Before diving into why it is difficult to measure, it’s important to understand what actually contributes to the environmental impact of AI systems.

The environmental impact of AI models is often expressed in terms of water usage, carbon footprint, or energy use in kilowatt-hours. However, environmental impact is a much broader concept.

Think back to the physical foundations: cutting down trees, extracting raw materials, and transporting them to build data centres. Then come construction, hardware installation, and ongoing operations. This is only the start of the lifecycle. Model training and inference require computation, code efficiency affects energy use, and hardware production carries its own footprint. In other words, the environmental impact of AI is the sum of everything required to build, run, and use it.

We grouped this impact into six key dimensions: water usage, carbon emissions, (other) greenhouse gases, heat emissions (e.g. local temperature effects), (rare) mineral use, and electricity use. Each dimension reflects a different part of the system, meaning no single metric tells the full story.

Known knowns and known unknowns

Most providers of large Generative AI models do not actively report the emissions of their models, nor do they publish detailed per-model energy disclosures for the models they have developed or host. This lack of transparency is one of the main reasons why public estimates vary so widely.

Mistral was one of the first, and still one of the few, to publish a truly comprehensive disclosure on the topic, releasing a full lifecycle analysis covering training, inference, hardware, and infrastructure using standardised methodologies. Google published research on the carbon footprint of training large models and more recently shared estimates on inference energy use. Microsoft and Amazon (AWS) mostly report at the infrastructure level without model-level disclosures, whilst OpenAI, Anthropic (Claude), and others have not published detailed model-level environmental metrics beyond high-level statements.

So, for the specific model that you might be using, we cannot know exactly what the environmental impact is. Moreover, the impact depends on location, data centre, model type, code efficiency, input/output size, and much more. Even if you want a precise number, we often simply don’t have one. So what do we know?

Ten “knowns” about the environmental impact of AI

  1. Model size and carbon emission are closely related (but that is not the full story).
    There is a direct and well-established correlation between the size of an AI model and its carbon emissions. As model size and training compute have grown exponentially over the past decade, so has the energy required to train and operate them (Patterson et al., 2021).

Stanford AI Index Report 2025

Larger models require more compute during both training and inference, increasing total energy demand. But, as we’ll see in the following knowns, model size is definitely not the full story.

EpochAI

2. Adoption and inference costs eventually exceed pre-training costs by orders of magnitude.

For most models, we have a rough estimate of how much energy it cost to train them. For instance, the figure below shows that GPT-4 is estimated to have emitted over 5,000 tons of CO₂ (Stanford AI Index Report 2025). Even though this is a lot (roughly 270 years’ worth of an average American’s annual carbon footprint), the number is quickly dwarfed by user adoption: GPT-4 is estimated to have over 900 million daily users. If each user has just one short conversation with GPT-4 per week, this comes down to a weekly footprint of (900 × 10⁶ × 3.83) / 10⁶ ≈ 3,447 tons of CO₂*, roughly two-thirds of the total pre-training emissions, every single week.
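The back-of-envelope arithmetic above can be sketched in a few lines. The figures are the rough estimates cited in this post (training emissions from the Stanford AI Index Report, per-chat emissions from EcoLogits) and carry large error margins:

```python
# Back-of-envelope comparison of training vs. inference emissions,
# using the rough figures cited above (large error margins apply).
TRAINING_EMISSIONS_T = 5_000        # estimated GPT-4 training emissions, tons CO2
DAILY_USERS = 900_000_000           # cited estimate of daily users
G_CO2_PER_CHAT = 3.83               # EcoLogits estimate: grams CO2 per short chat

# One short chat per user per week, converted from grams to tons:
weekly_inference_t = DAILY_USERS * G_CO2_PER_CHAT / 1_000_000
weeks_to_match_training = TRAINING_EMISSIONS_T / weekly_inference_t

print(f"Weekly inference footprint: {weekly_inference_t:.0f} t CO2")
print(f"Weeks until inference matches training: {weeks_to_match_training:.1f}")
```

Even if the per-chat estimate were off by a large factor, the conclusion survives: at this scale of adoption, inference catches up with training within weeks.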

So: don’t focus too much on the pre-training costs; it’s the scale of user adoption and the inference costs that truly make the difference in the long run.

Stanford AI Index Report 2025


3. Embodied emissions as a hidden cost

The impact isn’t just about electricity. It starts with the machines that consume it. Hardware is relevant in two ways. The first is “embodied emissions”: the total footprint of creating the hardware itself, from mining raw materials, through manufacturing and transportation, to the eventual disposal of the GPUs, CPUs, and entire data center infrastructure. This is extremely hard to trace and measure, often overlooked, and thus often excluded from model-level reporting (Green Software Foundation). The second way, operational efficiency, is the subject of the next point.

4. Hardware efficiency largely determines AI model emissions

Hardware is becoming more efficient, meaning newer generations of chips can perform the same computations using less energy.

The energy efficiency of the hardware used for AI models also keeps increasing. Stanford AI Index Report 2025.

5. Data centre design and hardware efficiency determine water usage.

Just like your laptop heating up when you use it a lot, data centre hardware heats up. Servers are running non-stop, so they need cooling systems to keep everything from overheating. This is mostly done with water cooling systems. Depending on how the cooling system is designed, this can really add up (Google Environmental Report; Microsoft Sustainability Report). And again, the more efficient the hardware, the less cooling the server requires.

6. Energy source matters more than model size.

Patterson et al. (2021) argue that the carbon intensity of the electricity used to power the deployment of models matters more for the CO₂ emitted than efficiency gained by optimising AI model architectures.

For example, a model hosted in a data center powered by renewable (e.g. Sweden) or nuclear energy (e.g. France) will result in significantly lower emissions than the same model running on a fossil-fuel-based grid. That said, total energy consumption still matters, as energy used for AI competes with other societal electricity demands (IEA Global Energy Review).
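The point can be made concrete with a toy calculation: the same workload, on grids with different carbon intensities. The intensity figures below are illustrative assumptions, not measured values for any real grid:

```python
# Illustration of point 6: emissions = energy used x grid carbon intensity.
# The intensity figures are illustrative assumptions, not live grid data.
GRID_G_CO2_PER_KWH = {
    "low_carbon_hydro_nuclear": 40,   # assumed Sweden/France-style grid
    "mixed": 300,                     # assumed mixed grid
    "fossil_heavy": 800,              # assumed coal/gas-dominated grid
}

def workload_emissions_kg(energy_kwh: float, region: str) -> float:
    """Estimated emissions in kg CO2 for a workload drawing `energy_kwh`."""
    return energy_kwh * GRID_G_CO2_PER_KWH[region] / 1_000

# The same 1,000 kWh job, hosted on different grids:
for region in GRID_G_CO2_PER_KWH:
    print(f"{region}: {workload_emissions_kg(1_000, region):.0f} kg CO2")
```

Under these assumptions, the identical workload emits twenty times more CO₂ on the fossil-heavy grid than on the low-carbon one, without changing a single line of model code.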

7. The geographical location of the data center deploying the model is an important factor in assessing impact.

A data center in a region powered by nuclear or renewable energy (like Sweden or France) will have a much lower carbon intensity than one in a region reliant on fossil fuels, even if they consume the same amount of total energy in terms of kilowatt hours.

Likewise, the impact of water usage is highly location-dependent. In water-scarce regions, data center cooling can place additional stress on local resources, whereas in water-abundant regions the relative impact is lower.

Cooling can also affect ecosystems, for example when discharged water changes temperatures in nearby lakes or rivers. This is called thermal pollution.

At the same time, there are examples where excess heat from data centers is reused for district heating, reducing overall system-level impact.

You can imagine why cooler climates are popular locations for data centres: the cooler the ambient temperature, the less work the cooling systems need to do to prevent hardware from overheating.

World’s hottest data centers

8. Energy consumption varies between AI tasks. 

Generative AI is in a different league of energy consumption: compared to more traditional AI tasks like classification or time-series prediction, generative models can consume up to 1,000 times more energy. This difference is not marginal; it fundamentally changes the scale at which AI systems consume resources.

To make this more concrete: In the figure below, we find that simple classification tasks, such as sentiment analysis, can be tens to hundreds of times cheaper than generating text. The same holds for images, where classification is far less compute-intensive than generation.

Simple text classification tasks like sentiment analysis can be tens to hundreds of times cheaper in terms of compute compared to text generation. Likewise, image classification is about 100 times cheaper than image generation (Luccioni, Jernite, & Strubell, 2024).

 

Model emissions per AI task (Luccioni, Jernite, & Strubell, 2024)

The reason is straightforward. Generative AI models are generally much larger, and they generate outputs step by step (e.g., token-by-token or pixel-by-pixel). This requires many repeated forward passes instead of the single pass a classifier needs. They also produce much larger and more complex outputs (long texts, high-resolution images), which means substantially more computation.

From: EpochAI

9. Input vs. Output.

In the world of Large Language Models (LLMs), not all processing is equal. Processing the user’s input (the prompt) is relatively efficient because the model can analyse the text in parallel. Generating the output, however, is a sequential, step-by-step process. This makes GPU utilisation less efficient and output tokens significantly more expensive in terms of energy, often costing 2 to 6 times more than input tokens. This leads us to the next and final dimension: model pricing.

Typical Cost Multipliers for Reasoning
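The input/output asymmetry can be sketched in a simple per-request estimator. The per-token energy figure and the 4× output multiplier are assumptions chosen for illustration (within the 2–6× range mentioned above), not measured values:

```python
# Hypothetical sketch: output tokens weigh several times more than input
# tokens in a per-request energy estimate. The per-token figure and the
# multiplier are illustrative assumptions, not measured values.
def estimate_request_energy_wh(input_tokens: int, output_tokens: int,
                               wh_per_input_token: float = 0.0002,
                               output_multiplier: float = 4.0) -> float:
    """Rough per-request energy estimate in watt-hours."""
    return (input_tokens + output_multiplier * output_tokens) * wh_per_input_token

# A prompt-heavy request vs. a generation-heavy request of equal total size:
print(estimate_request_energy_wh(input_tokens=2_000, output_tokens=100))
print(estimate_request_energy_wh(input_tokens=100, output_tokens=2_000))
```

With these assumptions, the generation-heavy request costs several times more energy than the prompt-heavy one, even though both process the same number of tokens in total.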

10. Model price and energy consumption are closely related.

Recently, I saw an article stating that “saying please and thank you to ChatGPT costs OpenAI millions of dollars”. Perhaps the statement is exaggerated, but there is truth to it: more tokens processed means more computation, and therefore higher costs… and, at the same time, more energy consumption. Model providers generally base their pricing on compute and infrastructure costs (this is also why output tokens are so much more expensive than input tokens). This makes cost another rough proxy for environmental impact.

Instagram post by “commonsearth”.

What can we actually do? Making the invisible… visible.

Understanding impact drivers is useful, but on its own it doesn’t get us very far in daily practice.

At ML6, we drafted a long list of all the potential actions we could take to reduce the impact of our existing solutions, based on our investigations.

These range from data centre locations, cloud infrastructure, and AI model choice to data storage architecture, code efficiency, and alerting and monitoring tools. However, we hit a wall: many of these measures are very high-effort, and it’s difficult to assess which efforts are worthwhile.

Therefore we also investigated the tools that exist out there to measure the environmental impact of our solutions. These are packages to measure the impact of the models that you host, develop and deploy yourself. Examples include tools like CodeCarbon, which estimate emissions during training or experimentation workflows. This will be the topic of our next blogpost.
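As a taste of what such tooling looks like, here is a minimal sketch of wrapping a workload with CodeCarbon (assuming `pip install codecarbon`); the workload function is a stand-in for your own training or inference code, and the sketch falls back gracefully if the package is not installed:

```python
# Minimal sketch of measuring a workload with CodeCarbon (assumes
# `pip install codecarbon`); falls back gracefully if it isn't installed.
try:
    from codecarbon import EmissionsTracker
except ImportError:
    EmissionsTracker = None

def measured_workload() -> int:
    # Stand-in for your actual training or inference code.
    return sum(i * i for i in range(1_000_000))

if EmissionsTracker is not None:
    tracker = EmissionsTracker(project_name="demo-run")
    tracker.start()
    measured_workload()
    emissions_kg = tracker.stop()  # estimated kg CO2eq for the run
    print(f"Estimated emissions: {emissions_kg} kg CO2eq")
else:
    measured_workload()
    print("codecarbon not installed; workload ran unmeasured")
```

Keep in mind that such tools estimate rather than measure exactly: they combine observed hardware utilisation with assumptions about the local grid's carbon intensity.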

Understanding the main impact factors, from model size and type to data center location and the very nature of token processing, is the first step toward building and using AI more responsibly. There is a visibility gap: for many real-world GenAI applications, the largest share of impact remains outside direct measurement.

Therefore, the first step is to make the invisible more visible and create awareness. When we see a running tap, we close it. But we cannot close a running tap that we don’t see. The first challenge is improving how we transparently measure, explain, and compare impact across systems.

Stick around for our next blogpost: we’ll move from theory to practice and share numbers and experiences from our experiments with different tools and measures.

 

Summary

In this blogpost, we’ve explained 10 reasons why GenAI models have an environmental impact, and why it’s so hard to measure precisely. In sum:

  1. Model size and carbon emission are closely related (but not the full story).
  2. Pre-training costs are quickly overshadowed by the scale of user adoption and inference costs.
  3. Embodied emissions are a hidden cost and almost impossible to measure.
  4. Hardware efficiency largely determines AI model GHG emissions.
  5. Data centre design and hardware efficiency determine water usage.
  6. Energy sources of data centres matter more than model size for GHG emissions.
  7. The geographical location of the data center deploying the model is an important factor.
  8. Energy consumption varies between AI tasks.
  9. Output tokens are far more costly than input tokens, both in price and in energy consumption.
  10. Model price and energy consumption are closely related.

Over the past year, we at ML6 have investigated the environmental impact of AI. We have gained a deeper understanding of why AI models consume energy and other resources, investigated tooling to measure impact, and collected and experimented with potential actions to reduce environmental impact. This is part 1 of a blogpost series, where we define the core problem. Stick around for parts 2 and 3.

*The estimated carbon emissions for a single chat with GPT-4 are taken from EcoLogits. As you will hopefully understand by the end of this blogpost, EcoLogits uses rough estimates and the error margin may be large. Notice, however, that even if the CO₂ emissions per chat were overestimated by 10×, inference costs would still outweigh the pre-training costs within a matter of weeks.