Cerebras Lands Major OpenAI Deal to Scale AI Inference
OpenAI has announced a partnership with chipmaker Cerebras to add high-speed inference capacity to its computing infrastructure, marking one of the most significant deployments to date of Cerebras’ wafer-scale systems for commercial AI services.
Under the agreement, OpenAI will integrate up to 750 megawatts of Cerebras computing capacity into its inference stack over several years, with deployment beginning in early 2026 and continuing in phases through 2028. The companies said the systems will be used to support latency-sensitive workloads, including agentic AI applications and services.
“Cerebras is the high-speed solution for AI. Whether running coding agents or voice chat, large language models on Cerebras deliver responses up to 15x faster than GPU-based systems,” wrote Cerebras CEO Andrew Feldman in a blog announcement.
A Cerebras wafer-scale engine, designed to combine compute, memory, and interconnects on a single chip (Credit: Cerebras)
OpenAI says the partnership is part of a strategy to diversify its compute portfolio and better match hardware to specific workloads. Rather than relying on a single architecture, the company has increasingly emphasized a mix of systems optimized for training, batch inference, and real-time response. Cerebras’ hardware, which integrates compute, memory, and interconnect on a single wafer-scale chip, is designed to reduce data movement and improve response times for large model outputs.
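The reason data movement matters so much for latency is that during autoregressive decoding, a model’s weights are streamed from memory for every generated token, so per-token speed is roughly bounded by memory bandwidth divided by model size. The back-of-envelope sketch below makes that concrete; every figure in it (model size, precision, bandwidth numbers) is an illustrative assumption, not a specification from OpenAI or Cerebras.

```python
# Rough sketch: why data movement bounds interactive LLM inference.
# All numbers are illustrative assumptions, not vendor specifications.

def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: int,
                          bandwidth_tb_per_sec: float) -> float:
    """Autoregressive decoding streams the full weight set once per token,
    so throughput is roughly memory bandwidth / model size in bytes."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    bandwidth_bytes = bandwidth_tb_per_sec * 1e12
    return bandwidth_bytes / model_bytes

# Hypothetical 70B-parameter model at 16-bit precision (~140 GB of weights).
offchip = decode_tokens_per_sec(70, 2, 3.0)   # ~3 TB/s, typical of off-chip HBM
onwafer = decode_tokens_per_sec(70, 2, 40.0)  # assumed much higher on-wafer bandwidth

print(f"off-chip: ~{offchip:.0f} tok/s   on-wafer: ~{onwafer:.0f} tok/s")
```

Under these toy assumptions, the bandwidth-rich design comes out more than an order of magnitude faster per token; real systems add batching, parallelism, and precision tricks, so the numbers here are indicative only.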
“When AI responds in real time, users do more with it, stay longer, and run higher-value workloads,” OpenAI wrote in a blog post. The company said it will roll out the new capacity incrementally across workloads as integration progresses.
Feldman described the agreement as the culmination of years of technical alignment between the two companies, saying OpenAI and Cerebras began discussions as early as 2017, driven by a shared view that growing model scale would eventually require new hardware architectures to sustain performance.
Financial terms were not disclosed, but Reuters reported that the deal could be worth more than $10 billion over the life of the contract, citing a source familiar with the matter. According to Reuters, OpenAI plans to use the Cerebras systems to help power its ChatGPT service, adding to a series of large infrastructure agreements as demand for OpenAI’s services continues to grow.
The partnership also has implications for Cerebras’ business. The company has historically relied on a small number of large customers, including UAE-based technology firm G42. Reuters noted that the OpenAI agreement could help Cerebras diversify its revenue base as it competes with established AI hardware vendors such as Nvidia and other specialized chipmakers.
Inference latency is an increasingly important constraint as AI applications move from demos to production. Training large models remains computationally intensive, but it is the speed and cost of inference that increasingly shape user experience and operating expenses. The OpenAI agreement builds on Cerebras’ recent push to scale its inference business beyond research and niche deployments. Over the past year, the company has expanded its inference footprint through partnerships with developer platforms such as Hugging Face and by bringing new inference datacenters online across North America and Europe.
For OpenAI, the deal reflects a pattern of sourcing compute from multiple hardware vendors to keep up with its inference needs. In addition to its long-standing reliance on Nvidia GPUs, OpenAI has committed to large future purchases of accelerators from AMD and has entered agreements to design custom chips with other partners. For Cerebras, the agreement represents a transition from targeted inference deployments to operating infrastructure at the scale of a top-tier AI platform.