October 31, 2024
We present a benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration (PARTNR) designed to study human-robot coordination in household activities. PARTNR tasks exhibit characteristics of everyday tasks, such as spatial, temporal, and heterogeneous agent capability constraints. We employ a semi-automated task generation pipeline using Large Language Models (LLMs), incorporating simulation in the loop for grounding and verification. PARTNR stands as the largest benchmark of its kind, comprising 100,000 natural language tasks, spanning 60 houses and 5,819 unique objects. We analyze state-of-the-art LLMs on PARTNR tasks across the axes of planning, perception, and skill execution. The analysis reveals significant limitations in SoTA models, such as poor coordination and failures in task tracking and recovery from errors. When LLMs are paired with real humans, they require 1.5x as many steps as two humans collaborating and 1.1x as many steps as a single human, underscoring the potential for improvement in these models. We further show that fine-tuning smaller LLMs with planning data can achieve performance on par with models 9 times larger, while being 8.6x faster at inference. Overall, PARTNR highlights significant challenges facing collaborative embodied agents and aims to drive research in this direction.
Written by
Gunjan Chhablani
Roozbeh Mottaghi
Alexander William Clegg
Daniel Tran
Eric Undersander
Ishita Prasad
Jacob Krantz
Jimmy Yang
Joanne Truong
John Turner
Matthew Chang
Michal Hlavac
Mikael Dallaire Cote
Priyam Parashar
Ram Ramrakhya
Siddharth Patki
Vladimir Karashchuk
Xavi Puig
Publisher
arXiv
Research Topics
Robotics