In his keynote at AWS re:Invent this week, CEO Matt Garman outlined how AWS is thinking about the next stage of AI adoption. He pointed to a shift that has gained momentum across the industry this year: the next phase of AI is all about inference, where applications reason, respond, and act in real time. “Infrastructure is just a part of the story,” he said on stage. “We’re seeing nearly every single application in the world being reinvented by AI, and we’re moving to a future where inference is such an integral part of each application that everyone builds now.”
This was the backdrop for one of the keynote's product announcements: a new generation of AWS's Nova models built for the inference-heavy world Garman described. Nova 2 aims to give enterprises the reasoning depth and production-readiness this next phase of AI demands, and it anchors much of AWS's evolving strategy for its Bedrock platform.
Garman introduced the Nova announcements in the context of Bedrock’s growing role in the AI stack. As more organizations move generative AI projects from prototype to production, they need an inference platform that spans model choice, data integration, cost control, and guardrails. “Some of the largest-scale AI applications in the world all run on this platform,” he said, noting that more than 50 customers have each processed over one trillion tokens through Bedrock. To succeed, these companies “need a secure, scalable, feature-rich inference platform,” and Bedrock is AWS’s answer.
According to Garman, Bedrock has more than doubled its model catalog in the past year, adding open and proprietary models in a range of sizes. “We’ve never believed there would be one model to rule them all,” he said, noting that AWS customers rely on different models for different steps in their applications and agent workflows. During the keynote he expanded the catalog further, announcing support for open-weights models from Google (Gemma), Nvidia (Nemotron), and Mistral AI and underscoring AWS’s commitment to model diversity.
Building on the Nova models AWS introduced at last year’s re:Invent, Garman announced Nova 2, a new generation of the company’s proprietary models intended for the higher-volume inference demands of production use. He described four models for reasoning, speech, and multimodal tasks, each designed to balance capability with lower operational costs.
The first he described is Nova 2 Lite, a workhorse model for inference. Garman described it as fast and cost-effective, with performance in line with models such as Claude Haiku 4.5, GPT-5 Mini, and Gemini 2.5 Flash. Nova 2 Lite supports text, image, and video inputs and is geared toward instruction following, tool use, code generation, and extracting structured information from documents, all tasks that appear frequently in agentic workflows and automations. With this model, AWS is targeting organizations that need predictable throughput and lower cost for high-volume inference.
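For readers curious what high-volume inference on Bedrock looks like in practice, the sketch below shows the request shape Bedrock's Converse API expects for a single-turn text prompt. The model identifier is a hypothetical placeholder, not a published Nova 2 ID; sending the request would require the AWS SDK (e.g., boto3's bedrock-runtime client via `client.converse(**build_request(...))`).

```python
# Hypothetical placeholder -- check the Bedrock model catalog for the
# real Nova 2 Lite identifier in your region.
MODEL_ID = "amazon.nova-2-lite-v1:0"

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble a single-turn text request in Bedrock Converse API shape."""
    return {
        "modelId": MODEL_ID,
        "messages": [
            # Converse messages carry a list of content blocks per turn.
            {"role": "user", "content": [{"text": prompt}]},
        ],
        # Keep temperature low for predictable extraction-style tasks.
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }

req = build_request("Extract the invoice total from this document text.")
```

The same request body works across Bedrock-hosted models by swapping `modelId`, which is part of what makes mixing model tiers within one application practical.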
For more complex workloads, Nova 2 Pro is AWS’s highest-intelligence reasoning model, Garman said. It is aimed at scenarios where agents must plan, use tools, and navigate multi-step sequences. On internal and third-party reasoning benchmarks, Garman said Nova 2 Pro often matches or exceeds models such as GPT-5.1, Gemini 3 Pro, and Claude 4.5. As enterprises experiment with multi-agent systems, workflow planners, and long-context tasks, AWS wants customers to see Nova 2 Pro as the model tier optimized for reasoning depth and controlled behavior.
Next in the family is Nova 2 Sonic, AWS’s speech-to-speech model designed for real-time conversational applications. Garman highlighted improved latency, expanded language coverage, and more natural speech patterns. Sonic is meant for call centers, interactive assistants, and any application where response timing and voice quality matter.
Garman then introduced the most ambitious addition to the Nova family: Nova 2 Omni, a unified multimodal model that can process text, images, video, and audio and generate both text and images. He presented it as the first reasoning model to support this full range of inputs and outputs in one place. Omni is meant to replace the chain of specialized models that multimodal workflows often require: instead of pairing separate systems for transcription, object recognition, document analysis, and image generation, it is built to handle those steps in a single model. Garman used his own keynote as an example: Omni could watch the talk, interpret the slides, listen to his speech, and produce a comprehensive summary. The goal is to support complex workloads that span multiple formats without the cost and complexity of assembling a pipeline.
Across the Nova 2 family, AWS is outlining a model lineup built for the inference-heavy landscape Garman described earlier in his talk, one that requires varied reasoning depth and multimodal analysis capabilities. As he describes it, AI is moving toward an application challenge rather than a training challenge, and Nova 2 is intended to supply the intelligence these production systems will need.
The post AWS Anchors Its Inference Strategy With the Launch of Nova 2 appeared first on AIwire.
