April 17, 2025
We present Locate 3D, a model for localizing objects in 3D scenes from referring expressions like “the small coffee table between the sofa and the lamp.” Locate 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, Locate 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices. Key to our approach is 3D-JEPA, a novel self-supervised learning (SSL) algorithm applicable to sensor point clouds. It takes as input a 3D pointcloud featurized using 2D foundation models (CLIP, DINO). Subsequently, masked prediction in latent space is employed as a pretext task to aid the self-supervised learning of contextualized pointcloud features. Once trained, the 3D-JEPA encoder is finetuned alongside a language-conditioned decoder to jointly predict 3D masks and bounding boxes. Additionally, we introduce Locate 3D Dataset, a new dataset for 3D referential grounding, spanning multiple capture setups with over 130K annotations. This enables a systematic study of generalization capabilities as well as a stronger model.
Written by
Ruslan Partsey
Ayush Jain
Ang Cao
Ishita Prasad
Aravind Rajeswaran
Abha Gejji
Ada Martin
Arjun Majumdar
Daniel Dugas
Franziska Meier
Krishna Murthy Jatavallabhula
Mikael Henaff
Mrinal Kalakrishnan
Nicolas Ballas
Oleksandr Maksymets
Paul McVay
Phillip Thomas
Alexander Sax
Sergio Arnaud
Vincent-Pierre Berges
Publisher
arXiv
June 05, 2026
Anshumali Shrivastava, Jason Chen, Qi Ma, Zeyu Yang
June 05, 2026
May 26, 2026
Valentin Wyart, Huy V. Vo, Jean Remi King, Josephine Raugel, Jérémy Rapin, Marc Szafraniec, Max Seitzer, Patrick Labatut, Piotr Bojanowski
May 26, 2026
May 20, 2026
Alvin W. M. Tan, Nicolas Hamilakis, Manel Khentout, Sho Tsuji, Balázs Kégl, Michael C. Frank, Angel Villar Corrales, Charles-Eric Saint-James, Dongyan Lin, Emmanuel Dupoux, Jiayi Shen, Juan Pino, Mahi Luthra, Martin Gleize, Phillip Rust, Rashel Moritz, Sheila Krogh-Jespersen, Surya Parimi, Tom Fizycki, Vanessa Stark, Yosuke Higuchi, Youssef Benchekroun
May 20, 2026
May 18, 2026
Alexandre Rezende, Rohit Patel, Steven McClain
May 18, 2026

Our approach
Latest news
Foundational models