New Ai2 Robotics Models Aim to Bridge the Sim-to-Real Gap
Ai2 is making a bold claim in robotics: It’s possible for a robot to learn useful manipulation skills entirely in simulation, then carry those skills into the real world without ever being trained on real-world robot data.
That is the core idea behind MolmoSpaces and MolmoBot, two new open releases from the Allen Institute for AI (Ai2). MolmoSpaces is a large simulation environment for embodied AI research, built from more than 230,000 indoor scenes, over 130,000 object models, and over 42 million robotic grasp annotations. MolmoBot is the manipulation model trained on top of it. Together, the two releases are designed to address the sim-to-real gap, the long-standing robotics challenge of ensuring that behaviors learned in simulation still work when deployed on physical robots.
In evaluations, MolmoBot successfully performed several core manipulation tasks. It was able to perform pick-and-place operations, manipulate articulated objects such as drawers and cabinets, and open doors on two robot platforms, the Franka FR3 arm and the RB-Y1 mobile manipulator, without real-world fine-tuning. This kind of result is known as zero-shot sim-to-real transfer, and it challenges the common view in robotics that synthetic training can only go so far before human-guided demonstrations become necessary.
(Graphic Courtesy of Ai2)
This claim of zero-shot sim-to-real transfer is what makes this release notable. Much of today’s robotics work depends on large volumes of teleoperated data collected by people guiding robots through tasks in the physical world. Those datasets are expensive to produce, usually closed, and difficult to reproduce. To learn more about the problem, AIwire interviewed Ranjay Krishna, a researcher who leads Ai2’s PRIOR team, which works on multimodal and embodied AI. The group develops models that combine vision, language, and audio, and has released open vision-language systems alongside projects in areas like satellite imagery analysis. More recently, the team has expanded that work into robotics, where those models are used to guide physical actions.
Krishna said his group wanted to approach the sim-to-real problem differently by asking what simulation itself was missing, rather than assuming real-world data was the only solution. His answer was diversity.
“Our overall goal is to deliver breakthrough AI models that can tackle really hard challenges. Our approach has been to turn to simulation and figure out what it is about simulation that makes it less appealing for people, versus going out and collecting large, expensive data sets,” Krishna said. “Our hypothesis going into this set of projects a couple of years ago was that simulation data is just not diverse enough for us to generate enough useful data to train these models.”
Krishna explained that the issue is not that simulation is inherently too artificial, but that it often lacks the range of conditions robots encounter outside the lab. Ai2 addressed that by drastically expanding the variety of environments and situations available during training. Robots trained in MolmoSpaces encounter different layouts, lighting conditions, object placements, camera viewpoints, and physical interactions, all generated inside the simulator. The idea is that if a model sees enough variation during training, the real world begins to look like just another environment.
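The approach Krishna describes is commonly known in robotics as domain randomization: sampling a fresh combination of scene properties for every training episode so the policy never overfits to one fixed setup. A minimal sketch of the idea in Python follows; the parameter names and ranges here are purely illustrative and are not drawn from MolmoSpaces' actual configuration.

```python
import random

# Hypothetical scene properties a simulator might randomize each episode.
# Names and bounds are illustrative only, not Ai2's actual parameters.
SCENE_RANGES = {
    "light_intensity": (0.2, 1.5),    # dim to bright lighting
    "camera_yaw_deg": (-45.0, 45.0),  # shifted viewpoints
    "object_mass_kg": (0.05, 2.0),    # light to heavy objects
    "table_height_m": (0.6, 0.9),     # furniture variation
}

def sample_scene(rng: random.Random) -> dict:
    """Draw one randomized scene configuration for a training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SCENE_RANGES.items()}

# Each episode sees a different environment, so the policy must learn
# behavior that holds across the whole distribution of conditions.
rng = random.Random(0)
episodes = [sample_scene(rng) for _ in range(1000)]
```

With enough spread in these distributions, a camera nudged by hand at deployment time is, from the model's perspective, just one more sample it has already seen many variants of.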
MolmoSpaces provides an extensive simulation environment used to generate that diversity. The system includes more than 230,000 indoor scenes ranging from homes and offices to hospitals and museums, each populated with objects whose physical properties, like weight, material, and articulation, are modeled for robotic interaction. Because everything is simulated, researchers can generate large volumes of robot trajectories and test how well models generalize across many environments rather than evaluating them in a single fixed setup.
An Ai2 researcher trains a robot to place a banana in a specific bowl. He sporadically moves the camera, but the robot remains on task (Source: Ai2 video)
Krishna gave an especially vivid example. In many robot systems, even a slight camera shift can cause performance to collapse because the model has become overly dependent on a fixed visual viewpoint. Ai2 wanted to break that brittleness by introducing those kinds of disruptions directly into simulation. As a result, Krishna said, MolmoBot could continue performing tasks even when the camera was moved around substantially, including by hand.
As it typically does with its research releases, Ai2 is releasing MolmoSpaces and MolmoBot as open infrastructure. The organization is publishing not just the models, but the whole stack: data, generation pipelines, assets, benchmarks, and tools. MolmoSpaces is also designed to work across common simulators, including MuJoCo, ManiSkill, and Nvidia Isaac Lab and Sim, making it more accessible as shared research infrastructure.
“A big principle that we’ve adhered to here at the Allen Institute is that everything that we do is completely reproducible, completely open source,” Krishna said. “You can take all of our data, all of our environments, and build from scratch everything that we build in-house.”
Krishna acknowledged that the sim-to-real gap has become a central debate in robotics research: “It’s been such a hot topic. There are blog posts written by some of these larger corporations where they talk about how real data is what we need to make these robots work in the real world, and anything you do in simulation just isn’t going to transfer to the real world,” he said. “We’re empirically showing that it actually does work. The thing that was missing is a diversity of environments in our simulation engines, a high-quality set of assets, a high-quality set of grasps, and a large amount of data that we can generate.”
None of this means the sim-to-real gap is closed, Krishna was careful to note. Ai2 has demonstrated transfer for a small set of manipulation tasks in environments that are still relatively controlled, not the full complexity of real-world robotics. Tasks involving dynamic motion, such as catching or throwing objects, remain largely unexplored, as do more dexterous manipulations that require rotating or repositioning items with greater precision. Robots also still struggle with reasoning tasks such as searching through cluttered spaces for a specific object.
Krishna said the team is already working on a follow-up to MolmoBot aimed at tackling more complex, multi-step tasks. Instead of simple manipulation, the goal is to handle multi-step instructions, such as finding an item in a home and retrieving it or cleaning up a room by breaking the request into smaller actions. That requires models that can map their surroundings, remember where they have already looked, and plan a series of steps toward a goal. Krishna added that simulation alone will not solve the problem. While large simulated environments can accelerate training, robots will still need to learn continuously from real-world experience and demonstrations. “Simulation is definitely part of the answer,” he said, but developing robots that can keep learning after deployment remains a major challenge.
For Krishna, the biggest takeaway from this project is that simulation can play a much larger role in robotics than many researchers once believed: “The main message for us is that sim-to-real is possible and you can reproduce these results, and we’re hoping that it’s going to lead to better, more equitable models that anyone can build.”
Read more about the details of this release and view demo videos in Ai2's technical blog.