Artificial intelligence often feels weightless: models live in “the cloud”, responses arrive instantly, and intelligence appears to scale with little friction. In reality, AI is firmly grounded by a very physical constraint: energy.
As models grow larger and AI usage expands into everyday products, electricity consumption is emerging as one of the most serious limits on how fast AI can roll out. This isn’t a distant concern; it’s already shaping where AI systems can be built, how they’re trained, and how often they can be used.
This article looks at where AI’s energy goes today, why it’s becoming a bottleneck, and how the industry might work around it in the near term, before looking briefly at longer-term possibilities.
Where the Energy Goes
Training: the upfront energy cliff
Training large AI models is currently the single biggest energy sink. Modern foundation models are trained on thousands of GPUs running continuously for weeks or months. Each GPU may draw several hundred watts; multiplied across thousands of machines, the result is an energy footprint comparable to that of a small town.
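The arithmetic behind that comparison is simple to sketch. The GPU count, per-GPU draw, run length, and overhead factor below are illustrative assumptions, not figures for any real training run:

```python
# Rough estimate of training energy. All inputs are illustrative
# assumptions, not measurements of any actual model.

def training_energy_mwh(num_gpus: int, watts_per_gpu: float,
                        days: int, overhead: float = 1.5) -> float:
    """Total energy in MWh, including an assumed overhead multiplier
    for cooling and power delivery on top of the GPUs themselves."""
    hours = days * 24
    gpu_energy_wh = num_gpus * watts_per_gpu * hours
    return gpu_energy_wh * overhead / 1e6

# Example: 10,000 GPUs at 700 W each, running for 90 days
energy = training_energy_mwh(10_000, 700.0, 90)
print(f"{energy:,.0f} MWh")
```

Under these assumptions the run works out to around 22,700 MWh, on the order of the annual electricity use of a couple of thousand typical households.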
The challenge isn’t just scale; it’s inefficiency. Much of the energy consumed during training goes not into “thinking” but into moving data between memory and processors, and then into removing the resulting heat. Large models amplify this problem because their parameters no longer fit neatly into fast on-chip memory.
While training is a one-off cost per model generation, the trend toward ever-larger models means this cliff keeps getting steeper.
Inference: death by a billion prompts
If training is a cliff, inference is erosion.
Each individual AI query uses relatively little energy, but at scale it adds up quickly. As generative AI becomes embedded in search, productivity tools, customer support, and devices, inference workloads run continuously. Unlike training, inference never really finishes.
As usage scales, inference energy is expected to rival, or even exceed, training energy. This is especially true for large, general-purpose models that are used for tasks where a much smaller model would suffice.
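The crossover point is easy to sketch. Assuming a hypothetical 20,000 MWh training run and a service handling 100 million queries a day at roughly 1 Wh per query (all illustrative figures):

```python
# How long until cumulative inference energy matches one training run?
# The training cost, traffic, and per-query energy are assumptions.

def days_to_match_training(training_mwh: float,
                           queries_per_day: float,
                           wh_per_query: float) -> float:
    """Days of inference traffic needed to equal one training run."""
    daily_inference_mwh = queries_per_day * wh_per_query / 1e6
    return training_mwh / daily_inference_mwh

# 20,000 MWh training run vs 100M queries/day at ~1 Wh each
print(days_to_match_training(20_000, 100e6, 1.0))
```

Under those assumptions, inference matches the entire training cost in about 200 days; a busier service, or a heavier model per query, crosses over far sooner.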
Data Centres: Power and Heat
AI workloads live inside data centres, and AI has fundamentally changed their design constraints.
AI servers consume far more power than traditional servers, often 5–10× more per rack. This creates three interrelated problems:
- Power delivery – local grids may simply not be able to supply enough electricity.
- Cooling – almost all consumed energy becomes heat, which must be removed.
- Density limits – packing more compute into the same space increases both issues.
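The 5–10× figure is easy to make concrete, and since almost every watt drawn ends up as heat, the same numbers set the cooling load. A minimal sketch, with illustrative per-rack figures:

```python
# Back-of-envelope hall power draw: power in is (almost all) heat out,
# so cooling must remove roughly this load. Per-rack figures are
# illustrative assumptions.

def hall_load_kw(racks: int, kw_per_rack: float) -> float:
    """Total IT power draw of a hall in kW."""
    return racks * kw_per_rack

conventional = hall_load_kw(500, 8.0)   # traditional servers, ~8 kW/rack
ai = hall_load_kw(500, 60.0)            # AI accelerators, ~60 kW/rack
print(conventional, ai, ai / conventional)
```

Same floor space, same rack count, but the AI hall draws 7.5× the power and sheds 7.5× the heat, which is why power delivery, cooling, and density stress the facility together.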
In some regions, data centre expansion has already been paused because grids cannot cope with demand. Elsewhere, operators are keeping fossil-fuel power plants online purely to support data centre growth, undermining climate targets.
Near-Term Ways Out of the Bottleneck
1. Better data centre efficiency
Incremental improvements matter at scale. Liquid cooling, immersion cooling, and improved airflow designs can significantly reduce the energy spent on cooling. Location choice also matters: cooler climates and regions with abundant clean energy reduce both heat and emissions.
These changes don’t eliminate AI’s energy appetite, but they stretch each kilowatt further.
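The “stretch each kilowatt” effect can be quantified with PUE (power usage effectiveness), the ratio of total facility energy to IT energy. A sketch with assumed PUE values for air versus liquid cooling:

```python
# Energy spent on overhead (mostly cooling) per year for a given PUE.
# The IT load and PUE values below are illustrative assumptions.

def annual_overhead_mwh(it_load_mw: float, pue: float) -> float:
    """Non-IT facility energy per year in MWh for a given PUE."""
    hours_per_year = 8760
    return it_load_mw * (pue - 1.0) * hours_per_year

# 20 MW IT load: air cooling (PUE ~1.5) vs liquid cooling (PUE ~1.1)
saved = annual_overhead_mwh(20, 1.5) - annual_overhead_mwh(20, 1.1)
print(f"{saved:,.0f} MWh/year saved")
```

Under these assumptions, moving a 20 MW hall from air to liquid cooling saves on the order of 70,000 MWh a year without touching a single model.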
2. More efficient hardware
Specialised AI chips are steadily improving performance per watt. Newer GPUs, TPUs, and custom accelerators deliver more computation for the same energy, while emerging designs aim to reduce the costly movement of data between memory and processors.
This is one of the most reliable near-term levers: each hardware generation quietly makes AI cheaper to run, even as models grow.
3. Smarter software and smaller models
Not every task needs a massive model.
Techniques such as quantisation, pruning, and model distillation can reduce energy usage dramatically with minimal loss in quality. In many real-world applications, replacing a single giant model with multiple smaller, task-specific models can cut energy use by orders of magnitude.
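As one concrete example, post-training quantisation stores weights as 8-bit integers instead of 32-bit floats, cutting memory and memory traffic by 4×. A toy symmetric, per-tensor version (a sketch, not a production quantisation scheme):

```python
import numpy as np

# Toy post-training int8 quantisation: symmetric, per-tensor scaling.

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with a single shared scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

# Storage drops 4x; the worst-case reconstruction error stays below
# one quantisation step.
err = np.abs(dequantize(q, scale) - w).max()
print(q.nbytes, w.nbytes, err < scale)
```

Real deployments layer refinements on top (per-channel scales, calibration data, quantisation-aware training), but this is the core trade: a small, bounded loss of precision for a large, guaranteed drop in memory movement, which is where much of the energy goes.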
This is arguably the biggest low-hanging fruit: efficiency gains here reduce both training and inference costs immediately.
4. Energy-aware scheduling
AI workloads are unusually flexible. Training jobs don’t need to run at a specific time or in a specific place.
Cloud providers are beginning to schedule AI tasks based on where renewable energy is currently abundant or where grid carbon intensity is lowest. Over time, this could allow AI workloads to “follow the sun and wind”, smoothing demand and reducing emissions without slowing progress.
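The core of such a scheduler is simple: given a snapshot of grid carbon intensity per region, send deferrable work to the cleanest one. A toy sketch, with made-up region names and intensity figures:

```python
# Toy carbon-aware placement: route a deferrable training job to the
# region with the lowest current grid carbon intensity.
# Region names and gCO2/kWh figures are invented for the example.

def pick_region(carbon_intensity: dict) -> str:
    """Return the region whose grid is currently cleanest."""
    return min(carbon_intensity, key=carbon_intensity.get)

snapshot = {
    "us-midwest": 450.0,  # coal-heavy hour
    "nordics": 30.0,      # hydro/wind surplus
    "us-west": 220.0,
}
print(pick_region(snapshot))
```

A real system would also weigh data-transfer costs, capacity, and latency, but even this greedy rule captures the idea of work following clean power rather than the other way around.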
The Bigger Picture
Even with aggressive efficiency gains, AI’s total energy use is likely to keep rising in the short term. The key question is whether efficiency improves faster than demand grows.
If it does, AI can scale sustainably. If it doesn’t, energy availability, not innovation, becomes the limiting factor.
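That race can be expressed as a single compounding product; the growth rates below are illustrative assumptions, not forecasts:

```python
# Does efficiency outrun demand? Total energy is (demand growth) times
# (energy per unit of work), compounded. Rates are illustrative.

def total_energy_after(years: int, demand_growth: float,
                       efficiency_gain: float) -> float:
    """Relative total energy after N years (1.0 = today's level)."""
    return ((1 + demand_growth) * (1 - efficiency_gain)) ** years

# 40%/yr more AI usage vs 30%/yr less energy per unit of work
print(round(total_energy_after(5, 0.40, 0.30), 2))
```

With usage growing 40% a year and energy per unit of work falling 30% a year, total energy stays roughly flat over five years; tilt either rate by a few points and the picture changes quickly in either direction.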
Looking Forward
Beyond today’s tools, several longer-term developments could reshape the landscape:
- Photonic computing, using light instead of electricity, could slash energy use for certain AI operations.
- Quantum computing may eventually solve specific AI problems far more efficiently than classical hardware.
- Fusion energy, if commercialised, could remove energy scarcity from the equation altogether.
- Edge AI, running models closer to users on efficient local hardware, could reduce reliance on massive centralised data centres.
None of these are silver bullets, but together they hint at a future where intelligence scales without overwhelming the planet that hosts it.
AI may feel abstract and digital, but its limits are grounded in physics, infrastructure, and energy. How we navigate those constraints will shape not just the future of AI, but the future of computing itself.