The partnership is designed to give IT decision-makers the flexibility to run high-performance, efficient AI inference at scale, independent of underlying hardware.
Red Hat, a global leader in open source solutions, has announced an expanded collaboration with Amazon Web Services (AWS) to deliver enterprise-grade generative AI (gen AI) capabilities on AWS using Red Hat AI and AWS AI silicon.
The growing demand for gen AI and scalable inference is prompting organizations to rethink their IT infrastructure. According to IDC, by 2027, 40% of organizations are expected to use custom silicon, including ARM processors or AI/ML-specific chips, to meet the increasing need for performance optimization, cost efficiency, and specialized computing. This highlights the importance of solutions that improve processing power, reduce costs, and accelerate innovation for high-performance AI applications.
The collaboration brings together Red Hat’s comprehensive AI platform capabilities with AWS cloud infrastructure and AI chipsets, including AWS Inferentia2 and AWS Trainium3. Red Hat AI Inference Server, powered by vLLM, will be optimized to run on AWS AI chips, providing a common inference layer that supports any gen AI model. This enables higher performance, lower latency, and cost-effective scaling of production AI deployments, with up to 30–40% better price performance than comparable GPU-based Amazon EC2 instances.
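Because vLLM serves an OpenAI-compatible HTTP API, applications can target Red Hat AI Inference Server as a single inference layer regardless of whether the backing instance uses GPUs or AWS AI chips. A minimal sketch of what a client request might look like; the endpoint URL and model name are placeholders, not values from the announcement:

```python
import json

# Hypothetical endpoint of a Red Hat AI Inference Server (vLLM) deployment.
# vLLM exposes an OpenAI-compatible API at /v1/chat/completions.
ENDPOINT = "http://inference.example.internal:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> bytes:
    """Build an OpenAI-compatible chat-completion payload for vLLM."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")

body = build_chat_request("example/model", "Summarize our Q3 incident report.")
# Send with any HTTP client, e.g.:
#   urllib.request.Request(ENDPOINT, data=body,
#                          headers={"Content-Type": "application/json"})
print(json.loads(body)["model"])
```

The point of the common layer is that only `ENDPOINT` and `model` change when a deployment moves between accelerator types; the client code stays the same.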
Red Hat has also integrated AI capabilities into Red Hat OpenShift, including the development of an AWS Neuron operator for Red Hat OpenShift, Red Hat OpenShift AI, and Red Hat OpenShift Service on AWS. These initiatives provide a seamless and supported pathway for customers to run AI workloads using AWS accelerators. Enhanced access to AWS AI chips and the release of the amazon.ai Certified Ansible Collection further simplify deployment and orchestration of AI services on AWS.
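On Kubernetes-based platforms such as OpenShift, the AWS Neuron device plugin advertises Neuron devices as an extended resource, so a workload can request an AWS AI accelerator declaratively. A hedged sketch, where the image name and resource count are illustrative and `aws.amazon.com/neuron` is the extended resource name exposed by the Neuron device plugin:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: neuron-inference-demo
spec:
  containers:
  - name: inference
    image: registry.example.com/inference-server:latest  # illustrative image
    resources:
      limits:
        aws.amazon.com/neuron: 1  # request one Neuron device from the device plugin
```

The scheduler then places the pod only on nodes that expose Neuron devices, which is the kind of supported pathway the Neuron operator is meant to streamline.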
Additionally, Red Hat and AWS are collaborating to optimize an upstream AWS AI chip plugin for vLLM, with Red Hat contributing as the top commercial supporter of vLLM. vLLM also underpins llm-d, an open source project for AI inference at scale, now commercially supported through Red Hat OpenShift AI 3.
This collaboration builds on Red Hat’s long-standing partnership with AWS, addressing the evolving needs of organizations integrating AI into hybrid cloud strategies. It provides an optimized and efficient pathway for enterprises to achieve high-performance gen AI outcomes across cloud and on-premises environments.

