MolmoAct introduces action reasoning models for more explainable robot control

Community-Team · August 15, 2025, 7:53pm

Researchers from the University of Washington and the Allen Institute for AI have developed MolmoAct, a family of open-source robotic foundation models that integrate perception, planning, and control through structured reasoning. The models generate three types of tokens in sequence: depth perception tokens for 3D understanding, visual reasoning traces that show planned trajectories, and action tokens for robot control. MolmoAct-7B-D achieved 70.5 percent zero-shot accuracy on SimplerEnv Visual Matching tasks, surpassing closed-source models π0 and GR00T N1 while requiring much less pre-training time, and 86.6 percent average success on LIBERO benchmarks. Because the planned trajectory is exposed as an explicit intermediate output, the approach addresses some limitations of current vision-language-action models: robot decision-making becomes more explainable, and users can steer it by editing the visual trajectory. The team released all model weights, training code, and the MolmoAct Dataset, which contains over 10,000 robot trajectories. (arXiv)
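
To make the three-stage ordering concrete, here is a minimal Python sketch of the control flow described above: depth first, then a planned trajectory, then actions conditioned on both. It is not MolmoAct's code or API. Every name in it (`run_pipeline`, `ReasoningOutput`, the three `predict_*` callables, and the `edited_trajectory` parameter) is hypothetical, and the stub predictors stand in for the real model only so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a waypoint is an (x, y) pixel in the camera image.
Waypoint = Tuple[int, int]


@dataclass
class ReasoningOutput:
    depth_tokens: List[int]      # stage 1: depth perception tokens
    trajectory: List[Waypoint]   # stage 2: visual reasoning trace
    actions: List[List[float]]   # stage 3: action outputs


def run_pipeline(
    observation: str,
    instruction: str,
    predict_depth: Callable[[str, str], List[int]],
    predict_trajectory: Callable[[str, str, List[int]], List[Waypoint]],
    predict_actions: Callable[
        [str, str, List[int], List[Waypoint]], List[List[float]]
    ],
    edited_trajectory: Optional[List[Waypoint]] = None,
) -> ReasoningOutput:
    """Run the three stages in order; each stage sees the earlier outputs."""
    depth = predict_depth(observation, instruction)
    trajectory = predict_trajectory(observation, instruction, depth)
    # Steering (assumed mechanism): a user-edited trajectory replaces the
    # planned one before any action is produced.
    if edited_trajectory is not None:
        trajectory = edited_trajectory
    actions = predict_actions(observation, instruction, depth, trajectory)
    return ReasoningOutput(depth, trajectory, actions)


# Stub predictors, for illustration only. They are not a model.
def stub_depth(observation: str, instruction: str) -> List[int]:
    return [3, 1, 4]


def stub_trajectory(
    observation: str, instruction: str, depth: List[int]
) -> List[Waypoint]:
    return [(0, 0), (10, 10)]


def stub_actions(
    observation: str,
    instruction: str,
    depth: List[int],
    trajectory: List[Waypoint],
) -> List[List[float]]:
    # One 2D displacement per consecutive pair of waypoints.
    return [
        [float(x1 - x0), float(y1 - y0)]
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])
    ]


planned = run_pipeline(
    "image", "pick up the cup", stub_depth, stub_trajectory, stub_actions
)
steered = run_pipeline(
    "image", "pick up the cup", stub_depth, stub_trajectory, stub_actions,
    edited_trajectory=[(0, 0), (5, 0), (5, 8)],
)
print(planned.actions)
print(steered.actions)
```

In this sketch the actions depend on the trajectory, so replacing the trajectory changes the actions without touching the action stage itself. That is one plausible reading of how visual trajectory editing could make the robot steerable.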
