ROBOTICS

RESEARCH

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

April 17, 2025

Abstract

We present Locate 3D, a model for localizing objects in 3D scenes from referring expressions like “the small coffee table between the sofa and the lamp.” Locate 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, Locate 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices. Key to our approach is 3D-JEPA, a novel self-supervised learning (SSL) algorithm applicable to sensor point clouds. It takes as input a 3D pointcloud featurized using 2D foundation models (CLIP, DINO). Subsequently, masked prediction in latent space is employed as a pretext task to aid the self-supervised learning of contextualized pointcloud features. Once trained, the 3D-JEPA encoder is finetuned alongside a language-conditioned decoder to jointly predict 3D masks and bounding boxes. Additionally, we introduce Locate 3D Dataset, a new dataset for 3D referential grounding, spanning multiple capture setups with over 130K annotations. This enables a systematic study of generalization capabilities as well as a stronger model.

Download the Paper

AUTHORS

Written by

Paul McVay

Sergio Arnaud

Ada Martin

Arjun Majumdar

Krishna Murthy Jatavallabhula

Phillip Thomas

Ruslan Partsey

Daniel Dugas

Abha Gejji

Alexander Sax

Vincent-Pierre Berges

Mikael Henaff

Ayush Jain

Ang Cao

Ishita Prasad

Mrinal Kalakrishnan

Mike Rabbat

Nicolas Ballas

Mido Assran

Oleksandr Maksymets

Aravind Rajeswaran

Franziska Meier

Publisher

arXiv

Research Topics

Robotics

Computer Vision

Related Publications

May 14, 2025

RESEARCH

CORE MACHINE LEARNING

UMA: A Family of Universal Models for Atoms

Brandon M. Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R. Kitchin, Daniel S. Levine, Kyle Michel, Anuroop Sriram, Taco Cohen, Abhishek Das, Ammar Rizvi, Sushree Jagriti Sahoo, Zachary W. Ulissi, C. Lawrence Zitnick

May 14, 2025

May 13, 2025

HUMAN & MACHINE INTELLIGENCE

RESEARCH

Dynadiff: Single-stage Decoding of Images from Continuously Evolving fMRI

Marlène Careil, Yohann Benchetrit, Jean-Rémi King

May 13, 2025

April 25, 2025

RESEARCH

NLP

ReasonIR: Training Retrievers for Reasoning Tasks

Rulin Shao, Qiao Rui, Varsha Kishore, Niklas Muennighoff, Victoria Lin, Daniela Rus, Bryan Kian Hsiang Low, Sewon Min, Scott Yih, Pang Wei Koh, Luke Zettlemoyer

April 25, 2025

April 17, 2025

HUMAN & MACHINE INTELLIGENCE

CONVERSATIONAL AI

Collaborative Reasoner: Self-improving Social Agents with Synthetic Conversations

Ansong Ni, Ruta Desai, Yang Li, Xinjie Lei, Dong Wang, Ramya Raghavendra, Gargi Ghosh, Daniel Li (FAIR), Asli Celikyilmaz

April 17, 2025

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.