Nvidia Says Rubin Will Deliver 5x AI Inference Boost Over Blackwell
When it ships later this year, Nvidia’s upcoming Rubin GPU will sport 5x the NVFP4 inference performance and 3.5x the NVFP4 training performance of Blackwell, Nvidia CEO Jensen Huang said Monday at CES 2026, where Nvidia officially unveiled the Vera Rubin platform.
The AI revolution so far has run through Nvidia, the GPU chipmaker that has gobbled up 90% of the market for AI chips. Its current Blackwell GPU and Grace Blackwell CPU-GPU superchips have sold exceptionally well, and the company is looking to take that success to the next level with Vera Rubin.
“Vera Rubin arrives just in time for the next frontier of AI,” Huang said during his 1.5-hour CES keynote address. “I can tell you that Vera Rubin is in full production.”
Monday’s announcement was a mixture of old and new. Nvidia has been talking about its Vera Rubin superchip since June 2024, when it also started talking about NVLink-6, the scale-up interconnect used to build NVL72 systems. The company announced its Spectrum-X co-packaged optics chip (slated to ship in 2026) earlier this year, and launched its BlueField-4 data processing units (DPUs) in October.
Nvidia shared specs of its upcoming Rubin GPU
What we did not know were the performance marks of the new Rubin GPU, which have been kept under wraps up to this point. Huang also shared some color and context on what went into the “extreme co-design” behind the upcoming Vera Rubin NVL72 server, and why it was necessary.
The AI performance specs for Rubin are impressive. According to Nvidia, the new chip will deliver 50 petaflops of NVFP4 inference performance, 5 times more than Blackwell, and 35 petaflops of NVFP4 training performance, 3.5 times more. It will offer 22 TB per second of HBM4 memory bandwidth, a 2.8x improvement over Blackwell, and 3.6 TB per second of NVLink bandwidth per GPU, a 2x increase.
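Dividing each Rubin figure by its claimed speedup yields the Blackwell baselines those claims imply. A quick sketch, using only the numbers quoted above (the derived baselines are simple arithmetic, not Nvidia-published specs):

```python
# Rubin figures and speedup multiples as stated in the article.
rubin = {
    "nvfp4_inference_pflops": 50.0,  # 5x Blackwell
    "nvfp4_training_pflops": 35.0,   # 3.5x Blackwell
    "hbm4_bw_tbps": 22.0,            # 2.8x Blackwell
    "nvlink_bw_tbps": 3.6,           # 2x Blackwell
}
speedups = {
    "nvfp4_inference_pflops": 5.0,
    "nvfp4_training_pflops": 3.5,
    "hbm4_bw_tbps": 2.8,
    "nvlink_bw_tbps": 2.0,
}

# Implied Blackwell baseline = Rubin figure / claimed speedup.
implied_blackwell = {k: rubin[k] / speedups[k] for k in rubin}
for k, v in implied_blackwell.items():
    print(f"implied Blackwell {k}: {v:.2f}")
```

The implied baselines (10 petaflops of NVFP4 compute, roughly 8 TB/s of HBM bandwidth, 1.8 TB/s of NVLink bandwidth per GPU) line up with the figures Nvidia has publicized for Blackwell, so the speedup claims are internally consistent.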
The Vera CPU, which is based on an Arm design, will deliver twice the performance of the Grace CPU it replaces, according to Huang (who didn’t offer specifics). It will feature 88 custom Olympus cores running 176 threads, two per core, via Nvidia’s “spatial multi-threading.” It will also offer a 1.8 TB per second NVLink C2C connection, 1.5 TB of memory (3x that of Grace), and 1.2 TB per second of LPDDR5X memory bandwidth.
Huang also shared video of the first rack of a Vera Rubin NVL72 pod going online. The pod features 18 compute trays and nine NVLink switch trays, and weighs nearly 2 tons, he said. All told, it packs 220 trillion transistors and took 15,000 engineer-years to design.
Huang in front of a rack of Vera Rubin NVL72 servers at CES 2026
The NVL72 pod was an example of the kind of “extreme co-design” that Nvidia has been forced to do since Moore’s Law has slowed down, Huang said.
“We have a rule inside our company. It’s a good rule. ‘No new generation should have more than 1 or 2 chips change,’” Huang said during his keynote. “But the problem is … Moore’s Law has largely slowed, and so the number of transistors we can get year after year after year can’t possibly keep up with the 10 times larger models.”
As more AI tokens are generated and the costs come down, that puts pressure on Nvidia and other chipmakers to boost performance. Rubin features 1.6x more transistors than Blackwell, which is the starting point for the performance boost. But 1.6 doesn’t get you to 10x.
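The arithmetic behind that gap can be made explicit. Using only the figures quoted in this article, here is how much of the claimed 5x inference gain transistor count alone can explain, and how much must come from co-design:

```python
# Per the article: Rubin has 1.6x Blackwell's transistors, yet claims a
# 5x NVFP4 inference gain. The remainder must come from architecture.
transistor_ratio = 1.6
claimed_inference_gain = 5.0

architectural_gain = claimed_inference_gain / transistor_ratio
print(f"gain beyond transistor scaling: {architectural_gain:.3f}x")  # 3.125x
```

In other words, more than 3x of the claimed speedup has to come from changes other than transistor budget, which is the case Huang is making for redesigning every chip at once.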
“It is impossible to keep up with those kinds of rates, for the industry to continue to advance,” he said, “unless we deployed aggressive, extreme co-design: basically innovating across all of the chips, across the entire stack, all at the same time. Which is the reason why we decided that this generation, we had no choice but to design every chip over again.”
Nvidia shared specs of its upcoming Vera CPU
Huang pointed to Nvidia’s Tensor Cores, the specialized processing units within its GPUs that accelerate the matrix multiplication and accumulation (MMA) operations for AI workloads, as one of the main reasons the company will be able to deliver a 5x increase in AI inference performance with Rubin over Blackwell.
“It’s an entire processing unit that understands how to dynamically, adaptively adjust its precision and structure to deal with different levels of the transformer, so that you can achieve higher throughput wherever it’s possible to lose precision, and to go back to the highest possible precision, wherever you need to,” he said. “This is groundbreaking. I would not be surprised if the industry would like us to make this format and the structure an industry standard in the future. This is completely revolutionary. This is how we were able to deliver such a gigantic step up in performance even though we only have 1.6 times the number of transistors.”
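Nvidia hasn’t published the internals of NVFP4 or the Tensor Core logic Huang describes, but the general idea behind such low-precision formats, trading precision for throughput by storing values in a few bits with a shared per-block scale, can be sketched generically. The snippet below uses signed 4-bit integers as a stand-in (the real format is a 4-bit floating-point type); all function names are illustrative, not Nvidia APIs:

```python
import numpy as np

# Generic block-scaled 4-bit quantization sketch: one shared scale per
# small block of values, with low-bit payloads. This approximates the
# idea behind formats like NVFP4; it is NOT Nvidia's implementation.

def quantize_block_int4(x, block=16):
    """Quantize to signed 4-bit ints (-8..7) with one scale per block."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero on all-zero blocks
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate FP32 values from quantized blocks."""
    return (q.astype(np.float32) * scale).reshape(-1)

x = np.random.default_rng(0).standard_normal(64).astype(np.float32)
q, s = quantize_block_int4(x)
x_hat = dequantize_block(q, s)
print(f"max abs round-trip error: {np.abs(x - x_hat).max():.3f}")
```

The payoff is that each value now occupies 4 bits instead of 32, so the same memory bandwidth and multiplier area move roughly 8x more values; the adaptive part Huang describes is deciding, layer by layer, where that precision loss is tolerable.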
Nvidia has not yet shared the full performance card for its Rubin GPUs. Some in the HPC community have been concerned that the Blackwell generation delivered less high-precision capability, such as 64-bit floating point (FP64) performance, than previous GPU generations. FP64 is critical for the traditional modeling and simulation workloads that have been the bread and butter of the supercomputing community for years.
Last month, Nvidia told HPCwire that it wasn’t abandoning 64-bit computing. We’ll likely have to wait until GTC 26 in March to see the FP64 performance specs, not to mention the power-consumption figures that are so critical these days.
This article first appeared on HPCwire.