In the race to incorporate AI across industries, GPUs have become one of the most sought-after resources in enterprise computing. They are expensive, hard to find, and increasingly seen as essential to doing business. Yet according to new data from Cast AI, the vast majority of GPU capacity is sitting idle, with average utilization hovering around 5%.
Cast AI offers an AI-driven Kubernetes optimization platform, and in its “2026 State of Kubernetes Optimization Report,” the company analyzed operational data from tens of thousands of clusters across major cloud providers to assess how AI infrastructure is actually used in production environments.
At that 5% level of utilization, organizations are effectively provisioning far more GPU capacity than their workloads consume at any given moment. The report estimates enterprises are running with an average of roughly 20 times the GPU capacity they actively use. Unlike idle CPU capacity, which carries a relatively low cost, unused GPUs represent a significant financial inefficiency, especially as prices continue to rise and supply remains limited.
Laurent Gil, Cast AI co-founder and president, said the 5% utilization figure was unexpectedly low, particularly given the cost and scarcity of GPU resources. While underutilization is common with CPUs, he said he expected organizations to manage GPUs more carefully because of their higher price and strategic importance.
“That number really surprised me,” Gil told AIwire in an interview. “For something as expensive and hard to find as a GPU, I expected the number to be higher.”
The 5% figure reflects baseline conditions before any optimization is applied and includes typical fluctuations in demand, such as lower usage during nights and weekends. But even accounting for those patterns, Gil said the gap is still far larger than expected.
The report also shows a wide gap between current usage and what is achievable. In optimized environments, GPU utilization can reach around 50% on average, a level Gil described as a reasonable benchmark for production systems. Moving from 5% to that range can dramatically change the economics of AI, lowering the cost of running these workloads by several multiples, depending on the level of optimization.
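The arithmetic behind those multiples is straightforward: the effective price of each hour of useful GPU work is the hourly rate divided by utilization. A minimal back-of-the-envelope sketch, using an illustrative hourly GPU price (not a figure from the report):

```python
# Back-of-the-envelope: effective cost of useful GPU compute at different
# utilization levels. The hourly price is illustrative, not from the report.
HOURLY_GPU_PRICE = 4.00  # assumed $/GPU-hour

def cost_per_useful_hour(utilization: float) -> float:
    """Effective price paid per hour of GPU time actually doing work."""
    return HOURLY_GPU_PRICE / utilization

baseline = cost_per_useful_hour(0.05)   # 5% utilization
optimized = cost_per_useful_hour(0.50)  # 50% utilization
print(f"baseline: ${baseline:.2f}/useful hour, "
      f"optimized: ${optimized:.2f}/useful hour, "
      f"savings factor: {baseline / optimized:.0f}x")
# -> baseline: $80.00/useful hour, optimized: $8.00/useful hour, savings factor: 10x
```

Whatever the actual hourly rate, moving from 5% to 50% utilization cuts the effective cost of useful compute by a factor of ten.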
Gil said multiple factors are contributing to the low GPU utilization rates. One major aspect is how companies are acquiring and managing GPU capacity. In addition to being expensive and scarce, GPUs often require long-term commitments. That dynamic encourages companies to provision for peak demand or future needs rather than actual usage. Once that capacity is locked in, it is not easily scaled down, even during periods of low demand.
Gil described this as a behavioral response to scarcity. Faced with limited availability and long lead times, organizations are acquiring as much GPU capacity as they can, even if they do not yet have the workloads to fully utilize it.
“It’s expensive, it’s impossible to find, and you really need it,” Gil said. “When these three things come together, what do we do as humans? We buy everything we can find.”
This tendency to hoard GPUs is reinforced by how AI workloads behave in practice. Inference demand is often variable, with bursts of activity mixed with idle periods, which drives overprovisioning. Many companies are also not set up to optimize GPU resources across workloads. And in many deployments, GPUs are still assigned to individual models or applications, even when those workloads don't fully utilize the GPU's capacity.
These constraints are driving interest in automated approaches to infrastructure management, particularly for highly variable inference workloads. The report suggests that closing the utilization gap will require treating efficiency as an ongoing, automated process rather than a fixed configuration set at deployment. Gil also noted that the report challenges the idea that teams must overprovision to maintain reliability. In practice, he said, static allocations can still leave some workloads without enough resources while others sit idle. With automated systems that continuously adjust capacity in real time, businesses can reduce waste while also preventing failures, eliminating what he described as the traditional tradeoff between cost and reliability.
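The continuous-adjustment idea can be sketched in a few lines. This is not Cast AI's implementation, just an illustration of the kind of control loop such systems run: poll observed utilization and resize the GPU replica count toward a target, rather than pinning a fixed allocation at deploy time. The 50% target and the sample numbers are illustrative; the scaling rule mirrors the one Kubernetes' Horizontal Pod Autoscaler uses.

```python
# Minimal sketch of continuous right-sizing: scale the number of GPU
# replicas so observed load lands near a target utilization level.
import math

TARGET_UTILIZATION = 0.5  # aim for ~50% average GPU utilization

def desired_replicas(current_replicas: int, observed_utilization: float,
                     min_replicas: int = 1) -> int:
    """Return the replica count that would bring utilization to the target."""
    if observed_utilization <= 0:
        return min_replicas  # fully idle: scale in to the floor
    scaled = current_replicas * observed_utilization / TARGET_UTILIZATION
    return max(min_replicas, math.ceil(scaled))

# A burst pushes utilization to 90% on 4 GPUs -> scale out to 8.
print(desired_replicas(4, 0.90))  # 4 * 0.9 / 0.5 = 7.2 -> 8
# An overnight lull drops utilization to 5% -> scale in to 1.
print(desired_replicas(4, 0.05))  # 4 * 0.05 / 0.5 = 0.4 -> 1
```

Run repeatedly, a loop like this scales out ahead of failures during bursts and reclaims idle GPUs during lulls, which is the cost-versus-reliability tradeoff Gil argues automation eliminates.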
For Gil, an important takeaway from the report is that companies need to better understand how their AI infrastructure resources are being used. As GPU spending rises and becomes a larger share of overall AI costs, he said, utilization is becoming a key factor in determining return on investment.
“Before you buy GPUs, look at how much you use them. And if it’s that existential, use them more,” Gil said.
That ROI concern is already being raised at the executive level, according to Gil, as companies move from experimenting with AI to scaling it across the business. While demand for compute continues to grow, the cost of supporting those workloads is becoming harder to ignore.
“Just check your return on investment from time to time, and look at your utilization, and you’ll be surprised.”
Download a copy of Cast AI’s report at this link.

