You could build an entire data center around a new GPU with elaborate scale-up networking, exotic chiplet architectures, and advanced liquid cooling. Or if you’re AMD, you could release a powerful GPU that customers can plug directly into the PCIe slots of their existing servers, providing an immediate boost for running new AI workloads.
That’s just what AMD did last week with the release of its MI350P, the latest GPU in its Instinct line. Boasting 185 billion transistors, 144GB of HBM3e capacity, and 4 TB per second of peak memory bandwidth, the MI350P is designed to run small, medium, and large language models for AI inferencing and RAG (retrieval augmented generation) use cases.
The MI350P plugs into a standard PCIe Gen 5 bus, providing 128GB per second of connectivity with a host. It operates within a 600W thermal envelope, and supports BF16, FP8, MXFP6 and MXFP4 workloads, offering 2,299 teraflops and up to 4,600 peak teraflops at MXFP4 precision through 128 AMD CDNA 4th Gen compute units.
Up to eight MI350P GPUs can be configured per node, and customers can segment each MI350P into as many as four partitions, each with 36GB of HBM3e memory. The GPU is designed to handle AI models in the range of 200 billion to 250 billion parameters; it also provides video and JPEG decoding.
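To see how the capacity figures relate to model size, here is a rough back-of-the-envelope sketch in Python. It is an illustration rather than anything AMD has published: it assumes only the model weights occupy memory (no KV cache, activations, or runtime overhead), it ignores the small per-block scale factors that the MX formats carry, and the function names are made up for this example. The article does not say which precision the 200-billion-to-250-billion-parameter figure assumes.

```python
# Approximate bits stored per weight. Assumption: the shared block scales
# used by MXFP6/MXFP4 are ignored, so real footprints are slightly larger.
BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "MXFP6": 6, "MXFP4": 4}

HBM_GB = 144      # per-card HBM3e capacity quoted in the article
PARTITIONS = 4    # maximum number of partitions quoted in the article


def weight_footprint_gb(params_billions: float, fmt: str) -> float:
    """Gigabytes (10^9 bytes) needed to hold the weights alone."""
    return params_billions * BITS_PER_PARAM[fmt] / 8


def fits_on_card(params_billions: float, fmt: str,
                 capacity_gb: float = HBM_GB) -> bool:
    """True if the weights alone fit in the given memory capacity."""
    return weight_footprint_gb(params_billions, fmt) <= capacity_gb


# Memory available to each partition when the card is split four ways.
partition_gb = HBM_GB / PARTITIONS

if __name__ == "__main__":
    print(f"Per-partition capacity: {partition_gb:.0f} GB")
    for size in (70, 200, 250):
        for fmt in BITS_PER_PARAM:
            gb = weight_footprint_gb(size, fmt)
            verdict = "fits" if fits_on_card(size, fmt) else "does not fit"
            print(f"{size}B params @ {fmt}: {gb:.1f} GB, {verdict} in {HBM_GB} GB")
```

Under these assumptions, a 250-billion-parameter model needs about 125GB at 4 bits per weight, which fits on a single 144GB card, while the same model at FP8 needs about 250GB and would have to be spread across several cards.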
The new GPU uses standard air cooling, a point AMD emphasizes. “Adopting AI doesn’t mean rebuilding infrastructure from the ground up,” wrote Suresh Andani, who heads business development teams for compute and enterprise AI at AMD, in an AMD blog post. “With AMD Instinct MI350P PCIe cards, enterprises can run more models and serve more users within their existing data centers.”
AMD launched the MI350P with support from computer makers, including Dell Technologies. David Schmidt, Dell’s vice president of product management, said the new GPU will help customers move forward more quickly. “For enterprises serious about AI, on-premises infrastructure isn’t a compromise,” he said. “It’s a competitive advantage delivering the control, security and predictable outcomes that matter most.”
AMD is also touting its software stack for its MI350 line of GPUs (Source: AMD)
Gigabyte is also adopting the MI350P across its AI server portfolio. Gigabyte General Manager Daniel Hou praised the new GPU for its practicality. “With its PCIe-based design, AMD Instinct MI350P enables flexible deployment and seamless integration into systems, allowing enterprises to build high-performance AI environments with the flexibility and efficiency required to scale globally,” Hou said.
AMD is also working on higher-end air-cooled GPUs, as well as liquid-cooled ones. For instance, it offers the Instinct UB B8, an 8-GPU air-cooled configuration of its MI350X and MI355X line that is delivered as a Universal Baseboard. The UB B8 delivers 2.3TB of HBM3, offering 8TB per second of memory bandwidth. It will also plug into AMD’s Infinity Fabric to provide scale-up capabilities that AMD says will be on par with Nvidia Blackwell. The UB B8 will support models with up to 500 billion parameters and is designed for AI training and inference at scale.
AMD also offers a liquid-cooled version of the Instinct MI355X, which features a thermal envelope up to 1,400W. Supermicro and TensorWave are partnering with AMD to support these liquid-cooled chips. AMD also offers a liquid-cooled version of its Radeon gaming GPU.
There is definitely a market for ultra-high-end GPUs that can be strung together in exotic ways to train the biggest AI models and power massive AI factories. These absolutely require liquid cooling, and possibly even different electrical regimes, such as Nvidia’s shift to 800V DC. But there are plenty of customers that need HPC gear to run slightly smaller AI models on their existing stack and who don’t want to build an entirely new data center to do so. This is the segment that AMD is targeting with the MI350P GPUs.
This article originally appeared in HPCwire.
The post AMD Unveils PCIe GPU Card for AI Inference appeared first on AIwire.

