ROBOTICS

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

October 31, 2024

Abstract

We present a benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration (PARTNR), designed to study human-robot coordination in household activities. PARTNR tasks exhibit characteristics of everyday tasks, such as spatial, temporal, and heterogeneous agent capability constraints. We employ a semi-automated task generation pipeline using Large Language Models (LLMs), incorporating simulation in the loop for grounding and verification. PARTNR is the largest benchmark of its kind, comprising 100,000 natural language tasks spanning 60 houses and 5,819 unique objects. We analyze state-of-the-art LLMs on PARTNR tasks across the axes of planning, perception, and skill execution. The analysis reveals significant limitations in state-of-the-art models, such as poor coordination, failures in task tracking, and difficulty recovering from errors. When LLMs are paired with real humans, they require 1.5x as many steps as two humans collaborating and 1.1x more steps than a single human, underscoring the room for improvement in these models. We further show that fine-tuning smaller LLMs with planning data can achieve performance on par with models 9 times larger, while being 8.6x faster at inference. Overall, PARTNR highlights significant challenges facing collaborative embodied agents and aims to drive research in this direction.
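To make the generation pipeline concrete, the sketch below illustrates the simulation-in-the-loop pattern the abstract describes: an LLM proposes candidate tasks, and a candidate is kept only if it grounds in the simulated scene and verifies as achievable. This is a minimal illustration, not the authors' implementation; `propose_task`, `grounds_in_scene`, and `verifiable_in_sim` are hypothetical stand-ins for the LLM prompt and the simulator checks.

```python
# A minimal, illustrative sketch of LLM task generation with
# simulation-in-the-loop filtering, in the spirit of the PARTNR
# pipeline. All helpers here are hypothetical stand-ins, NOT the
# authors' actual API.
import random

def propose_task(house_objects: list[str]) -> str:
    """Stand-in for an LLM prompted with a description of the house."""
    obj = random.choice(house_objects)
    room = random.choice(["kitchen", "living room", "bedroom"])
    return f"Move the {obj} to the {room}."

def grounds_in_scene(task: str, house_objects: list[str]) -> bool:
    """Stand-in for grounding: every entity the task mentions must exist in the scene."""
    return any(obj in task for obj in house_objects)

def verifiable_in_sim(task: str) -> bool:
    """Stand-in for verification that the task is checkable/achievable in simulation."""
    return True  # a real pipeline would roll out the task's evaluation in the simulator

def generate_tasks(house_objects: list[str], n_tasks: int = 5) -> list[str]:
    """Keep proposing tasks, accepting only those that ground and verify."""
    accepted: list[str] = []
    while len(accepted) < n_tasks:
        task = propose_task(house_objects)
        if grounds_in_scene(task, house_objects) and verifiable_in_sim(task):
            accepted.append(task)
    return accepted

if __name__ == "__main__":
    for task in generate_tasks(["mug", "plant", "book", "laptop"], n_tasks=3):
        print(task)
```

Filtering proposals after the fact, rather than constraining generation up front, keeps the LLM prompt simple: invalid candidates are cheap to discard once the simulator can check them.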


AUTHORS

Gunjan Chhablani, Roozbeh Mottaghi, Akshara Rai, Alexander William Clegg, Daniel Tran, Eric Undersander, Ishita Prasad, Jacob Krantz, Jimmy Yang, Joanne Truong, John Turner, Matthew Chang, Michal Hlavac, Mikael Dallaire Cote, Priyam Parashar, Ram Ramrakhya, Ruta Desai, Siddharth Patki, Vladimir Karashchuk, Xavi Puig

Publisher

arXiv

Research Topics

Robotics

Related Publications

June 11, 2025

ROBOTICS

COMPUTER VISION

CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models

Aaron Foss, Ammar Rizvi, Chloe Evans, Justine T. Kao, Koustuv Sinha, Sasha Mitts


June 11, 2025

ROBOTICS

RESEARCH

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Mojtaba Komeili, Sarath Chandar, Abha Gejji, Ada Martin, Adrien Bardes, Ammar Rizvi, Artem Zholus, Claire Roberts, Daniel Dugas, David Fan, Francisco Massa, Francois Robert Hogan, Franziska Meier, Kapil Krishnakumar, Koustuv Sinha, Marc Szafraniec, Matthew Muckley, Mido Assran, Michael Rabbat, Nicolas Ballas, Patrick Labatut, Piotr Bojanowski, Quentin Garrido, Russell Howes, Sergio Arnaud, Vasil Khalidov, Xiaodong Ma, Yann LeCun, Yong Li


April 17, 2025

ROBOTICS

RESEARCH

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Ruslan Partsey, Ayush Jain, Ang Cao, Ishita Prasad, Aravind Rajeswaran, Abha Gejji, Ada Martin, Arjun Majumdar, Daniel Dugas, Franziska Meier, Krishna Murthy Jatavallabhula, Mido Assran, Mikael Henaff, Mike Rabbat, Mrinal Kalakrishnan, Nicolas Ballas, Oleksandr Maksymets, Paul McVay, Phillip Thomas, Alexander Sax, Sergio Arnaud, Vincent-Pierre Berges


October 31, 2024

HUMAN & MACHINE INTELLIGENCE

ROBOTICS

Digitizing Touch with an Artificial Multimodal Fingertip

Nolan Black, Romeo Mercado, Norb Tydingco, Gregg Kammerer, Ricardo Chavira, Eric Sanchez, Yitian Ding, Roberto Calandra, Mike Lambeta, Alexander Sohn, Ali Sengül, Byron Taylor, Dave Stroud, Haozhi Qi, Jake Khatha, Jitendra Malik, Kevin Sawyer, Kurt Jenkins, Kyle Most, Neal Stein, Thomas Craven-Bartle, Tingfan Wu, Victoria Rose Most

