New MIT Framework Uses Search to Handle LLM Errors in AI Agents
When developers build AI agents that rely on large language models, they often face a tricky problem: The models can produce different outputs each time they are called, and some of those outputs are wrong. Recovering from those mistakes usually requires writing complex logic to retry steps or backtrack when something fails.
Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and startup Asari AI have introduced a software framework designed to simplify that process. The framework, called EnCompass, allows developers to add systematic search and backtracking to AI agent programs without rewriting large portions of code. The work was presented at the recent NeurIPS 2025 conference and is described in a paper, “ENCOMPASS: Enhancing Agent Programming with Search Over Program Execution Paths.”
“Our goal is to develop an inference-time strategy framework: a framework that makes it easy to experiment with different inference-time strategies independently of the design and implementation of the underlying agent workflow. Such a framework is intended not to replace, but to be used in conjunction with LLM prompting and tool use frameworks, such as LangChain,” the authors wrote.
EnCompass targets what the researchers call “program in control” agents. In these systems, a developer defines the overall workflow in code, such as the sequence of steps an agent follows to translate software, analyze data, or generate hypotheses. The large language model is used only at specific points to perform subtasks, rather than deciding the entire workflow itself. For this use case, the main challenge is handling inaccuracies in LLM outputs, as a single incorrect response can derail the whole process. Developers often address this by manually adding code that retries calls, compares multiple outputs, or returns to earlier steps. According to the authors, this extra logic can be as large and complex as the original agent code.
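The kind of hand-written recovery logic the paragraph describes can be sketched in a few lines. The example below is illustrative only (`flaky_llm` and `translate_step` are hypothetical names, and the stub stands in for a real LLM call): the developer wraps each model call in retry-and-validate code, and this scaffolding grows with every step of the workflow.

```python
def flaky_llm(outputs):
    """Stand-in for a nondeterministic LLM call: returns canned
    outputs in order, some of which are invalid (empty)."""
    it = iter(outputs)
    return lambda prompt: next(it)

def translate_step(llm, prompt, max_retries=3):
    """Manual error handling a developer might write today:
    retry the call until the output passes a validity check."""
    for _ in range(max_retries):
        out = llm(prompt)
        if out:  # developer-defined validity check
            return out
    raise RuntimeError("all retries failed")

# The first two "LLM" responses are invalid; the third succeeds.
llm = flaky_llm(["", "", "def f(): pass"])
result = translate_step(llm, "translate Foo.java to Python")
```

Multiply this pattern across every LLM call in an agent, add cross-step backtracking, and the recovery code can rival the workflow itself in size, which is the overhead EnCompass aims to eliminate.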
EnCompass separates the agent’s workflow from the strategy used to explore different possible LLM outputs. Developers annotate parts of their code where an LLM call may produce variable results. These locations are called branchpoints. At runtime, EnCompass treats the agent’s execution as a search problem, exploring different execution paths that result from different LLM outputs. The framework enables backtracking over failed execution paths and can explore multiple execution paths in parallel, depending on the chosen search strategy. Developers can choose from common search strategies like Beam Search or Monte Carlo Tree Search, or define their own strategies, without changing the underlying workflow code.
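The core idea, treating a workflow's branchpoints as a search tree and backtracking over failed paths, can be sketched with Python generators. This is not the actual EnCompass API; it is a minimal illustration in which each `yield` plays the role of a branchpoint offering several candidate LLM outputs, and a driver re-runs the workflow to explore alternatives when a path fails.

```python
def workflow():
    """Hypothetical agent workflow. Each `yield` marks a branchpoint:
    the search driver supplies one candidate "LLM output" and may later
    backtrack and resume with a different one."""
    plan = yield ["plan A", "plan B"]                     # branchpoint 1
    code = yield [plan + " -> good", plan + " -> bad"]    # branchpoint 2
    return code

def dfs(make_workflow, is_success, choices=()):
    """Depth-first search over execution paths with backtracking.
    Replays the committed prefix of choices, then tries each
    candidate at the next branchpoint in turn."""
    gen = make_workflow()
    try:
        candidates = gen.send(None)       # run to the first branchpoint
        for c in choices:                 # replay the committed prefix
            candidates = gen.send(c)
    except StopIteration as done:         # the path ran to completion
        return done.value if is_success(done.value) else None
    for cand in candidates:               # branch, backtracking on failure
        result = dfs(make_workflow, is_success, choices + (cand,))
        if result is not None:
            return result
    return None
```

Here the workflow code knows nothing about the search: swapping depth-first search for beam search or Monte Carlo tree search would only change the driver, which mirrors the separation the framework is built around.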
EnCompass works by compiling a Python function that defines an agent’s workflow into a search space. Since each branchpoint represents a point where execution can diverge, a search algorithm can then sample and evaluate execution paths to score each one based on developer-defined criteria and return the highest scoring result.
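The sample-and-score step can be illustrated with a deliberately simplified sketch (exhaustive enumeration rather than the guided search a real strategy would use; all names here are hypothetical): each branchpoint contributes a set of candidate outputs, every combination is an execution path, and a developer-defined scoring function picks the winner.

```python
def all_paths(branchpoints):
    """Enumerate every execution path through a list of branchpoints,
    where each branchpoint offers several candidate LLM outputs."""
    paths = [()]
    for candidates in branchpoints:
        paths = [p + (c,) for p in paths for c in candidates]
    return paths

def best_path(branchpoints, score):
    """Score each path with a developer-defined criterion and return
    the highest-scoring one."""
    return max(all_paths(branchpoints), key=score)

# Two branchpoints with candidate quality scores; sum as the criterion.
winner = best_path([[0.2, 0.9], [0.5, 0.1]], sum)
```

In practice a search algorithm would prune or prioritize paths instead of enumerating them all, but the contract is the same: paths in, scores out, best path returned.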
The researchers evaluated EnCompass on several agent tasks, including an agent that translates Java code repositories into Python. In that case study, adding search logic with EnCompass required about 80% fewer lines of code than implementing the same logic by hand, and the search it enabled improved translation accuracy by 15% to 40% compared with a version of the agent that did not use search.
The authors say EnCompass is not designed for agents that are fully controlled by an LLM, where the model decides on each step. In those systems, there is no fixed workflow for EnCompass to compile into a search space. Instead, EnCompass is meant for developers and researchers building structured AI agents for tasks like code translation, automated analysis, or scientific workflows. By making search and backtracking a built-in runtime feature, the framework could make these agents more reliable and easier to experiment with, particularly as LLM-based systems see wider use in software development.
In an MIT News article, co-author Armando Solar-Lezama, an MIT professor of EECS and CSAIL principal investigator, said, “As LLMs become a more integral part of everyday software, it becomes more important to understand how to efficiently build software that leverages their strengths and works around their limitations. EnCompass is an important step in that direction.”