Why Your Million-Dollar GPUs Are Sleeping on the Job
Building a GPU cluster is an expensive undertaking. This is true whether you are an enterprise building on-prem infrastructure to run your AI workloads, a neocloud offering GPU-as-a-Service, or a hyperscaler.
Because this is such a large investment, you need to ensure a reasonable ROI. One of the main parameters affecting this ROI is cost per million tokens, or CPMT. The number of tokens the infrastructure can process usually correlates with the revenue (or benefit) the infrastructure generates, while the cost of those tokens is a mixture of CAPEX and OPEX, derived from the size of the infrastructure but also from its efficiency, or utilization.
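To make the relationship between utilization and CPMT concrete, here is a small illustrative calculation. All the figures (hourly cost, token throughput) are hypothetical, chosen only to show how idle capacity flows straight into cost per token:

```python
# Illustrative cost-per-million-tokens (CPMT) calculation.
# All numbers below are hypothetical examples, not real cluster figures.

def cpmt(hourly_cost_usd: float, tokens_per_sec_peak: float, utilization: float) -> float:
    """Cost per million tokens for a cluster running at a given utilization."""
    tokens_per_hour = tokens_per_sec_peak * utilization * 3600
    return hourly_cost_usd / (tokens_per_hour / 1e6)

# Example: a cluster costing $400/hour (amortized CAPEX + OPEX),
# capable of 500k tokens/sec at full utilization.
print(cpmt(400.0, 500_000, 1.0))   # ~ $0.22 per million tokens, fully utilized
print(cpmt(400.0, 500_000, 0.5))   # ~ $0.44 -- 50% idle doubles the CPMT
```

The hourly cost is fixed regardless of output, so halving utilization exactly doubles CPMT, which is the dynamic the rest of this article is about.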
The Gap
There is a gap, however, between the theoretical performance of the cluster and the actual performance. The theoretical performance is simply the product of a single processor's (e.g., GPU's) performance (in terms of FLOPS) and the number of GPUs in the cluster. The actual performance introduces another factor: the fraction of GPU cycles spent idle during the job run.
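That relationship can be written as a one-liner. The GPU count and per-GPU FLOPS below are hypothetical round numbers, used only to show how the idle fraction scales down the theoretical peak:

```python
# Theoretical vs. actual cluster performance, per the definitions above.
# 1e15 FLOPS per GPU and 1,024 GPUs are hypothetical example figures.

def effective_flops(per_gpu_flops: float, n_gpus: int, idle_ratio: float) -> float:
    """Actual throughput = theoretical peak scaled by the non-idle fraction."""
    theoretical = per_gpu_flops * n_gpus
    return theoretical * (1.0 - idle_ratio)

peak = effective_flops(1e15, 1024, 0.0)   # theoretical: 1.024e18 FLOPS
real = effective_flops(1e15, 1024, 0.5)   # actual, with half the cycles idle
print(peak, real)
```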
Yes, your GPUs are sleeping on the job. The reason derives from the heart of parallel computing. The fact that all the GPUs in the cluster (or at least a large number of them) are working on the same workload and the same dataset (regardless of the parallelism type: data, pipeline, tensor, and/or expert) means that a significant element of the process involves data synchronization between those GPUs. This calls for collective operations (or collective communications), which are handled by libraries like NVIDIA's NCCL (NVIDIA Collective Communications Library) that utilize the network between the GPUs (i.e., the backend network) to distribute data across GPUs.
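The workhorse collective in training is all-reduce, commonly implemented over a ring. The sketch below simulates the ring all-reduce data movement in a single Python process, with plain lists standing in for GPU buffers; it illustrates the communication pattern, not NCCL's actual implementation:

```python
# Single-process sketch of ring all-reduce, the communication pattern
# libraries like NCCL use to synchronize data across GPUs. Each "rank"
# is a plain Python list; every rank ends with the elementwise sum.

def ring_allreduce(ranks: list[list[float]]) -> list[list[float]]:
    n = len(ranks)
    data = [list(r) for r in ranks]
    assert all(len(r) == n for r in data), "one chunk per rank, for simplicity"

    # Phase 1: reduce-scatter. At each step, rank i passes a partial sum
    # of one chunk to its right-hand neighbor, which adds it to its own.
    for s in range(n - 1):
        for i in range(n):
            c = (i - s) % n                       # chunk rank i forwards
            data[(i + 1) % n][c] += data[i][c]    # "send" + reduce

    # After n-1 steps, rank i holds the complete sum for chunk (i+1) % n.

    # Phase 2: all-gather. Each rank forwards a fully-reduced chunk to its
    # neighbor, which simply copies it, until every rank has every chunk.
    for s in range(n - 1):
        for i in range(n):
            c = (i + 1 - s) % n
            data[(i + 1) % n][c] = data[i][c]
    return data

print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# every rank ends with [12, 15, 18]
```

Note that every element crosses the ring links 2(n-1) times in total, which is why the backend network sits directly on the critical path of each training step.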
The Network Bottleneck
Here lies the problem. Networks have not evolved at the pace of compute. While the amount of data compute elements such as GPUs can process has increased significantly, the networking infrastructure, at all levels, has failed to keep pace, creating a bottleneck in which GPUs wait idle for data to travel through the backend network and feed their next step of computation. In fact, in some cases hyperscalers have found that this time spent on networking can reach 50% or more of total GPU cycles. This means twice the CPMT of the theoretical performance, and twice the time to ROI.
This backend networking infrastructure is, in fact, a mixture of three types of networking use cases: scale-up, scale-out, and scale-across, as illustrated in the following diagram:

- Scale-up: Protocols originally used inside a computer or server to connect compute and storage elements are now used to connect those resources across an entire rack. Up to 72 GPUs (in the current generation) can be connected with a high-capacity, low-latency protocol like NVLink.
- Scale-out: A much larger networking infrastructure, connecting GPUs across the entire datacenter.
- Scale-across: Connectivity across multiple datacenters over which a single GPU cluster is spread. This is usually the case when the origin datacenter has run out of power capacity and further growth is needed.
This networking bottleneck is relevant to all of these use cases, but it is most noticeable in scale-out scenarios, in which classic datacenter connectivity protocols like Ethernet are not suited to the performance requirements of backend connectivity.
The Ethernet Evolution
The reason the "classic" Ethernet protocol could not handle backend connectivity for large AI clusters is that, at its base, Ethernet is a lossy protocol. When the utilization of Ethernet links gets higher, connectivity performance drops sharply, and parameters like packet loss, jitter (delay variance), and tail latency (the time it takes the slowest packet in a session to cross the network) spiral out of control. This results, as mentioned, in idle GPU cycles and, in severe cases, in a complete restart of the workload, both of which can lead to 30%-50% degradation in workload performance (often measured in job completion time, or JCT).
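Tail latency matters so much because a collective step finishes only when the slowest message arrives: every GPU waits at the barrier for the worst-case flow, not the average one. The toy simulation below makes that visible; the flow count, base latency, loss rate, and retransmit penalty are all hypothetical figures chosen for illustration:

```python
# Why tail latency dominates collectives: the step completes at the MAX
# over all flows, not the mean. All numbers here are hypothetical.
import random

random.seed(0)

def collective_step_time(n_flows: int, base_us: float, loss_rate: float,
                         retransmit_penalty_us: float) -> float:
    """Completion time (us) of one synchronization step across n_flows messages."""
    times = []
    for _ in range(n_flows):
        t = base_us
        if random.random() < loss_rate:   # a dropped packet forces a slow retransmit
            t += retransmit_penalty_us
        times.append(t)
    return max(times)                     # barrier: everyone waits for the slowest

# 1,024 flows, 10 us base latency, 1% loss, 1 ms retransmit penalty:
lossy = collective_step_time(1024, 10.0, 0.01, 1000.0)
lossless = collective_step_time(1024, 10.0, 0.0, 1000.0)
print(lossy, lossless)
```

Even at 1% loss, with a thousand flows it is near-certain that at least one packet is dropped, so the whole step runs at retransmit speed while the GPUs sit idle.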
There are non-Ethernet solutions for scale-up and scale-out networking that perform better. The industry called for an Ethernet-based solution that would close this performance gap and enable infrastructure owners to fully utilize their investment in GPUs and shorten ROI time, reducing CPMT.
The solution was the evolution of Ethernet itself. The main quality missing from the original Ethernet architecture was scheduling. Adding a mechanism that controls the way Ethernet frames cross the backend fabric and ensures better load balancing across fabric links results in lower (to zero) packet loss, very low (to zero) jitter, and significantly lower tail latency. These are key parameters for reducing job completion time and significantly cutting the number of idle GPU cycles.
Two methods were introduced to add a scheduling mechanism to Ethernet. One is fabric scheduling, in which the scheduling is done within the fabric itself: traffic between the top-of-rack and spine switches is carried over a cell-based fabric, which allows near-perfect load balancing and all of the derived performance benefits. The second approach is endpoint scheduling, in which the scheduling (or packet spraying) is done by the network endpoints, i.e., the network interface cards (NICs) within the servers, which send packets into the network in a manner that avoids or bypasses congestion points within the fabric.
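The difference between classic per-flow load balancing and the packet spraying described above can be sketched in a few lines. The link count, flow sizes, and hash values below are contrived examples: per-flow hashing can pile several large flows onto one link, while spraying spreads every flow across all links, so the most-loaded link (which gates completion) carries far less:

```python
# Contrast between classic per-flow ECMP-style hashing and per-packet
# spraying by endpoint-scheduled NICs. Four equal-cost links and eight
# equal flows are hypothetical illustration values.
NUM_LINKS = 4
flows = [100] * 8                     # eight flows of 100 traffic units each

def max_link_load_hashing(flows, hashes):
    """Per-flow hashing: each flow sticks to one hash-chosen link."""
    loads = [0.0] * NUM_LINKS
    for size, h in zip(flows, hashes):
        loads[h % NUM_LINKS] += size
    return max(loads)

def max_link_load_spraying(flows):
    """Per-packet spraying: every flow's packets spread over all links."""
    loads = [0.0] * NUM_LINKS
    for size in flows:
        for link in range(NUM_LINKS):
            loads[link] += size / NUM_LINKS
    return max(loads)

unlucky_hashes = [0, 0, 0, 1, 1, 2, 2, 3]   # hash collisions happen
print(max_link_load_hashing(flows, unlucky_hashes))  # 300.0: three flows share link 0
print(max_link_load_spraying(flows))                 # 200.0: perfectly balanced
```

Since the job waits on the busiest link, the hashed case finishes 50% later than the sprayed case here, despite identical total traffic and identical hardware.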
A Happy Ending
This evolution has resulted in a major uptick in GPU cluster performance. It has also changed the way GPU clusters are designed. While in the past the main focus of this design was on compute resources, today, an equal weight in the design is given to networking aspects, acknowledging the importance of this element on the overall business case of a cluster.
About the Author
Dudy Cohen serves as the VP of AI solutions at DriveNets. He has over 30 years of experience in the networking infrastructure world, accumulated from his service in various vendors and service providers, including Ceragon and Alvarion. He holds a BSc.EE. and an MSc.EE. from Tel Aviv University.