Covering Scientific & Technical AI

If Reproducibility in AI Is Important, Try Model Flows

Reproducibility is absolutely critical in science, but it’s a troublesome characteristic when it comes to AI. Frontier models developed by Big AI may deliver superior accuracy and reasoning capabilities, but they do so largely as black boxes with little regard for reproducibility. If AI is going to turbo-charge scientific productivity, it must do so without compromising reproducibility. The question, then, becomes how to achieve it.

This was the topic of a presentation at the TPC26 conference last week by Noah Smith, a computer scientist at the University of Washington and senior director of NLP research at the Allen Institute for Artificial Intelligence. Smith discussed why it’s important for scientists to have AI tools that meet their needs when it comes to reproducibility, and how model flows can help to deliver them.

“Scientists need to be able to inspect and control their tools. A big part of science is your tools–the engineering, the systems that are going to help you answer questions,” Smith said. “At the Allen Institute for AI and with our collaborators at the University of Washington and other universities, we’ve taken the position that the way to get to this fine-grained control and inspectability is through what we call model flows.”

What exactly is a “model flow”?

“We use this term ‘model flow’ to refer to a kind of full openness,” he continued. “Everything that you need to reproduce the work from the very beginning: all of the data, the model weights…and intermediate checkpoints. We describe the entire recipe. I’ll give you all the code that you need to reproduce any stage so that you can go back and change anything. All of our evaluations are careful and open, and we richly document and analyze the capabilities of the models.”

Clearly, many frontier models fail to check even some of these boxes. Claude, Gemini, and GPT, from Anthropic, Google, and OpenAI respectively, are all extremely capable models that deliver stellar results on many general-purpose topics, but they are closed source and don’t offer the full model flows that are critical for reproducibility. Scientists receiving funding from government institutions, including the Department of Energy and National Science Foundation, can use these proprietary frontier models, provided strict privacy and security guarantees are met.

There are other challenges with using frontier models from Big AI. For starters, they’re optimized for consumer and enterprise usage, not necessarily for science (although some Big AI providers, like Google, are offering science packages). They also tend to be quite expensive to use at scale, which is why much of the discussion of AI for science and engineering, at least in the public sphere, tends to take place around fully open models.

The Allen Institute for AI (Ai2), which received $152 million in funding last August from the NSF and Nvidia, is developing the Olmo 3 family of fully open models, intended primarily for use by scientists and engineers. Olmo 3, available in 7B and 32B sizes, delivers the full model flows that scientists need, but at a fraction of the data budget of something like Qwen 3, according to Smith.

Among the Olmo 3 models are Molmo, a vision-language model designed to generate textual descriptions from visual input, and MolmoPoint, which adds support for pointing commands. Vision-language models are important for bridging the gap between AI models and the agents and robots that are going to act in the real world, Smith said. Molmo2, which was recently released, adds support for video.

There is also DR Tulu, a reinforcement learning (RL) model designed to power deep research agents. The DR Tulu stack gives scientists the ability to create agents that search and browse literature, evaluate relevance, integrate evidence, write answers with attribution, and evaluate precision and recall. It uses RL to create rubrics that evolve based on what the agent discovers. DR Tulu-8B performs comparably to GPT-5 Search, OpenAI DR, and Claude Sonnet, but at a cost that is 100 to 1,000 times lower.

Olmo Hybrid, meanwhile, melds the precise recall of transformers with the superior state tracking of linear recurrent neural networks (RNNs) to create a hybrid model that excels at both. Olmo Hybrid delivers superior performance in math, coding, and other categories compared to Olmo 3-7B, as well as offering better scaling, according to Smith.

While the AI models from Ai2 can deliver comparable performance to proprietary frontier models, they do so with full reproducibility as a result of their open model flows. They’re also more adaptable than frontier models, which Smith cited as another factor in their favor. If scientists value reproducibility, adaptability, and the ability to control their own AI models, then fully open models should be where they are putting their chips, he said.

“I think reproducing commercial AI is too small a goal for those of us working in the open space,” Smith said. “I think building infrastructure for science needs to enable scientific communities to do things that the market is just never going to prioritize: Inspect the internals of the system, adapt it to local scientific requirements, study every aspect of its development so we can make improvements, [and] control the costs and specialize for long-tail domains.”
