New MIT Framework Uses Search to Handle LLM Errors in AI Agents
When developers build AI agents that rely on large language models, they often face a tricky problem: The models can produce different outputs each time they are called, and some of those outputs are wrong. Recovering from those mistakes usually requires writing complex logic to retry steps or backtrack when something fails.
Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and startup Asari AI have introduced a software framework designed to simplify that process. The framework, called EnCompass, allows developers to add systematic search and backtracking to AI agent programs without rewriting large portions of code. The work was presented at the recent NeurIPS 2025 conference and is described in a paper, “ENCOMPASS: Enhancing Agent Programming with Search Over Program Execution Paths.”
“Our goal is to develop an inference-time strategy framework: a framework that makes it easy to experiment with different inference-time strategies independently of the design and implementation of the underlying agent workflow. Such a framework is intended not to replace, but to be used in conjunction with LLM prompting and tool use frameworks, such as LangChain,” the authors wrote.
EnCompass targets what the researchers call “program in control” agents. In these systems, a developer defines the overall workflow in code, such as the sequence of steps an agent follows to translate software, analyze data, or generate hypotheses. The large language model is used only at specific points to perform subtasks, rather than deciding the entire workflow itself. For this use case, the main challenge is handling inaccuracies in LLM outputs, as a single incorrect response can derail the whole process. Developers often address this by manually adding code that retries calls, compares multiple outputs, or returns to earlier steps. According to the authors, this extra logic can be as large and complex as the original agent code.
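The kind of hand-written recovery logic the paragraph describes can be sketched in a few lines. The example below is illustrative only (`flaky_llm` and `translate_step` are hypothetical names, and the stub stands in for a real LLM call): the developer wraps each model call in retry-and-validate code, and this scaffolding grows with every step of the workflow.

```python
def flaky_llm(outputs):
    """Stand-in for a nondeterministic LLM call: returns canned
    outputs in order, some of which are invalid (empty)."""
    it = iter(outputs)
    return lambda prompt: next(it)

def translate_step(llm, prompt, max_retries=3):
    """Manual error handling a developer might write today:
    retry the call until the output passes a validity check."""
    for _ in range(max_retries):
        out = llm(prompt)
        if out:  # developer-defined validity check
            return out
    raise RuntimeError("all retries failed")

# The first two "LLM" responses are invalid; the third succeeds.
llm = flaky_llm(["", "", "def f(): pass"])
result = translate_step(llm, "translate Foo.java to Python")
```

Multiply this pattern across every LLM call in an agent, add cross-step backtracking, and the recovery code can rival the workflow itself in size, which is the overhead EnCompass aims to eliminate.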
EnCompass separates the agent’s workflow from the strategy used to explore different possible LLM outputs. Developers annotate parts of their code where an LLM call may produce variable results. These locations are called branchpoints. At runtime, EnCompass treats the agent’s execution as a search problem, exploring different execution paths that result from different LLM outputs. The framework enables backtracking over failed execution paths and can explore multiple execution paths in parallel, depending on the chosen search strategy. Developers can choose from common search strategies like Beam Search or Monte Carlo Tree Search, or define their own strategies, without changing the underlying workflow code.
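The core idea, treating a workflow's branchpoints as a search tree and backtracking over failed paths, can be sketched with Python generators. This is not the actual EnCompass API; it is a minimal illustration in which each `yield` plays the role of a branchpoint offering several candidate LLM outputs, and a driver re-runs the workflow to explore alternatives when a path fails.

```python
def workflow():
    """Hypothetical agent workflow. Each `yield` marks a branchpoint:
    the search driver supplies one candidate "LLM output" and may later
    backtrack and resume with a different one."""
    plan = yield ["plan A", "plan B"]                     # branchpoint 1
    code = yield [plan + " -> good", plan + " -> bad"]    # branchpoint 2
    return code

def dfs(make_workflow, is_success, choices=()):
    """Depth-first search over execution paths with backtracking.
    Replays the committed prefix of choices, then tries each
    candidate at the next branchpoint in turn."""
    gen = make_workflow()
    try:
        candidates = gen.send(None)       # run to the first branchpoint
        for c in choices:                 # replay the committed prefix
            candidates = gen.send(c)
    except StopIteration as done:         # the path ran to completion
        return done.value if is_success(done.value) else None
    for cand in candidates:               # branch, backtracking on failure
        result = dfs(make_workflow, is_success, choices + (cand,))
        if result is not None:
            return result
    return None
```

Here the workflow code knows nothing about the search: swapping depth-first search for beam search or Monte Carlo tree search would only change the driver, which mirrors the separation the framework is built around.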
EnCompass works by compiling a Python function that defines an agent’s workflow into a search space. Since each branchpoint represents a point where execution can diverge, a search algorithm can then sample and evaluate execution paths to score each one based on developer-defined criteria and return the highest scoring result.
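The sample-and-score step can be illustrated with a deliberately simplified sketch (exhaustive enumeration rather than the guided search a real strategy would use; all names here are hypothetical): each branchpoint contributes a set of candidate outputs, every combination is an execution path, and a developer-defined scoring function picks the winner.

```python
def all_paths(branchpoints):
    """Enumerate every execution path through a list of branchpoints,
    where each branchpoint offers several candidate LLM outputs."""
    paths = [()]
    for candidates in branchpoints:
        paths = [p + (c,) for p in paths for c in candidates]
    return paths

def best_path(branchpoints, score):
    """Score each path with a developer-defined criterion and return
    the highest-scoring one."""
    return max(all_paths(branchpoints), key=score)

# Two branchpoints with candidate quality scores; sum as the criterion.
winner = best_path([[0.2, 0.9], [0.5, 0.1]], sum)
```

In practice a search algorithm would prune or prioritize paths instead of enumerating them all, but the contract is the same: paths in, scores out, best path returned.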
The researchers evaluated EnCompass on several agent tasks, including an agent that translates Java code repositories into Python. In that case study, adding search logic with EnCompass required about 80% fewer lines of code than implementing the same logic by hand, and the search it enabled improved translation accuracy by 15% to 40% compared with a version of the agent that did not use search.
The authors say EnCompass is not designed for agents that are fully controlled by an LLM, where the model decides on each step. In those systems, there is no fixed workflow for EnCompass to compile into a search space. Instead, EnCompass is meant for developers and researchers building structured AI agents for tasks like code translation, automated analysis, or scientific workflows. By making search and backtracking a built-in runtime feature, the framework could make these agents more reliable and easier to experiment with, particularly as LLM-based systems see wider use in software development.
In an MIT News article, co-author Armando Solar-Lezama, an MIT professor of EECS and CSAIL principal investigator, said, “As LLMs become a more integral part of everyday software, it becomes more important to understand how to efficiently build software that leverages their strengths and works around their limitations. EnCompass is an important step in that direction.”