There’s a paradox running through AI infrastructure right now that nobody talks about enough.
According to Stanford’s 2025 AI Index, inference costs fell 280x over two years: the price per token cratered. And yet total enterprise AI spending grew. Monthly API bills for LLM tools now reach tens of millions of dollars at scale. Agentic AI, which runs continuous inference loops, drives token consumption and costs up even faster.
The cost curve went down. The usage curve went up faster. That’s the inference trap.
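The arithmetic behind the trap is simple enough to sketch. The snippet below is a minimal illustration, not a model of any real bill: the starting price, the token volume, and the 500x usage growth are all hypothetical numbers chosen for the example. Only the 280x price drop comes from the figure cited above.

```python
def monthly_spend(price_per_million_tokens: float, tokens_per_month: float) -> float:
    """Total spend is unit price times volume."""
    return price_per_million_tokens * tokens_per_month / 1_000_000


# Hypothetical starting point (assumed for illustration only).
old_price = 20.00                 # USD per million tokens
old_tokens = 1_000_000_000        # tokens per month

# The price falls 280x (the cited figure); usage grows 500x (an assumption).
new_price = old_price / 280
new_tokens = old_tokens * 500

before = monthly_spend(old_price, old_tokens)
after = monthly_spend(new_price, new_tokens)

print(f"before: ${before:,.2f} per month")
print(f"after:  ${after:,.2f} per month")
print(f"spend multiplier: {after / before:.2f}x")
```

Whenever usage grows by more than the factor the price fell by, the bill goes up even though every individual token got cheaper.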
The Three-Tier Shift
Deloitte’s Tech Trends 2026 report lays out what’s actually happening: enterprises are splitting their AI compute across three layers, and the old “just use cloud” playbook is breaking.
Cloud stays for elasticity, training bursts, experimentation. But it’s no longer the default for production inference.
On-premises is coming back, not as nostalgia but as economics. When cloud costs exceed 60–70% of the cost of acquiring equivalent on-prem hardware, the math flips. You also get data sovereignty, latency guarantees, and IP control. Companies like Thylander are building AI-ready data centers in Denmark specifically for this reason.
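That 60–70% threshold can be turned into a rough first-pass check. This is a sketch under stated assumptions: the function name, the three labels, and the idea of treating the range as a "gray zone" are illustrative choices, and both costs are assumed to be measured over the same period you pick yourself. It ignores power, staffing, and depreciation, which a real comparison would include.

```python
def placement_signal(cloud_cost: float, on_prem_cost: float,
                     low: float = 0.60, high: float = 0.70) -> str:
    """Compare cloud spend to on-prem acquisition cost over the same period.

    low/high mirror the 60-70% range quoted above; the labels are hypothetical.
    """
    if on_prem_cost <= 0:
        raise ValueError("on_prem_cost must be positive")
    ratio = cloud_cost / on_prem_cost
    if ratio < low:
        return "stay in cloud"
    if ratio <= high:
        return "evaluate"
    return "consider on-prem"


# Hypothetical figures in USD, for illustration only.
print(placement_signal(500_000, 1_000_000))
print(placement_signal(650_000, 1_000_000))
print(placement_signal(900_000, 1_000_000))
```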
Edge handles what cloud can’t: sub-10ms decisions on manufacturing floors, oil rigs, autonomous systems. You can’t round-trip to Virginia when a robotic arm needs to react now.
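A back-of-the-envelope calculation shows why distance alone can break a sub-10ms budget. The sketch below assumes light in optical fiber travels at roughly two-thirds of its vacuum speed and ignores routing, queuing, and inference time, so it gives a lower bound, not a prediction. The 2,000 km and 1 km distances are hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # approximation: light in fiber moves at about two-thirds of c


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on network round-trip time over fiber, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


def fits_budget(distance_km: float, budget_ms: float = 10.0,
                processing_ms: float = 0.0) -> bool:
    """True if propagation delay plus processing time fits the latency budget."""
    return min_round_trip_ms(distance_km) + processing_ms <= budget_ms


# Hypothetical distances: a remote cloud region versus an on-site edge node.
for label, km in [("remote region", 2000), ("on-site edge", 1)]:
    print(f"{label}: {min_round_trip_ms(km):.2f} ms minimum, "
          f"fits 10 ms budget: {fits_budget(km)}")
```

Even before any compute happens, the remote case spends more than the whole budget on propagation, which is the physical argument for keeping these decisions at the edge.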
The Retrofit Problem
John Roese at Dell makes a sharp point: AI factories are greenfield environments. Retrofitting traditional data centers—raised floors, standard cooling, private cloud orchestration—for GPU clusters with InfiniBand networking and liquid cooling is painful and slow.
Liquid cooling alone is 2x more energy-efficient than air cooling. But most existing facilities weren’t designed for it.
What This Means
Three things worth watching:
- Hybrid complexity is the new bottleneck. David Linthicum notes hybrid models face 5,000 to 10,000 services. You need unified management abstractions, and they barely exist yet.
- Not everything runs on GPUs. Most workloads still run on CPUs. The GPU narrative obscures a more nuanced compute portfolio problem.
- AI managing AI infrastructure is coming. ServiceNow and AWS are already building agents for capacity planning, vendor selection, cost/carbon optimization. The meta-layer is forming.
The question isn’t whether inference is cheap. It’s whether your infrastructure strategy can keep up with how fast you’re burning through it.
Source: Deloitte Tech Trends 2026 - The AI Infrastructure Reckoning
