COMPUTER VISION

SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

March 20, 2024

Abstract

We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers & LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method infers the set of structured language commands directly from encoded visual data using a scene language encoder-decoder architecture. To train SceneScript, we generate and release a large-scale synthetic dataset called Aria Synthetic Environments consisting of 100k high-quality indoor scenes, with photorealistic and ground-truth annotated renders of egocentric scene walkthroughs. Our method gives state-of-the art results in architectural layout estimation, and competitive results in 3D object detection. Lastly, we explore an advantage for SceneScript, which is the ability to readily adapt to new commands via simple additions to the structured language, which we illustrate for tasks such as coarse 3D object part reconstruction.

Download the Paper

AUTHORS

Written by

Armen Avetisyan

Chris Xie

Henry Howard-Jenkins

Tsun-Yi Yang

Samir Aroudj

Suvam Patra

Fuyang Zhang

Duncan Frost

Luke Holland

Campbell Orme

Jakob Julian Engel

Edward Miller

Richard Newcombe

Vasileios Balntas

Publisher

arXiv

Research Topics

Computer Vision

Related Publications

July 23, 2024

COMPUTER VISION

Imagine yourself: Tuning-Free Personalized Image Generation

Zecheng He, Bo Sun, Felix Xu, Haoyu Ma, Ankit Ramchandani, Vincent Cheung, Siddharth Shah, Anmol Kalia, Ning Zhang, Peizhao Zhang, Roshan Sumbaly, Peter Vajda, Animesh Sinha

July 23, 2024

July 23, 2024

HUMAN & MACHINE INTELLIGENCE

CONVERSATIONAL AI

The Llama 3 Herd of Models

Llama team

July 23, 2024

July 02, 2024

GRAPHICS

COMPUTER VISION

Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials

Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny

July 02, 2024

July 02, 2024

GRAPHICS

COMPUTER VISION

Meta 3D Gen

Raphael Bensadoun, Tom Monnier, Yanir Kleiman, Filippos Kokkinos, Yawar Siddiqui, Mahendra Kariya, Omri Harosh, Roman Shapovalov, Emilien Garreau, Animesh Karnewar, Ang Cao, Idan Azuri, Iurii Makarov, Eric-Tuan Le, Antoine Toisoul, David Novotny, Oran Gafni, Natalia Neverova, Andrea Vedaldi

July 02, 2024

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.