July 14, 2023
We present CM3Leon (pronounced “Chameleon”), a retrieval-augmented, tokenbased, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pretraining stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general purpose model that can do both text-to-image and image-to text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-theart performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation.
Written by
Lili Yu
Bowen Shi
Ram Pasunuru
Benjamin Miller
Tianlu Wang
Arun Babu
Binh Tang
Brian Karrer
Shelly Sheynin
Russ Howes
Vasu Sharma
Jacob Xu
Uriel Singer
Gargi Ghosh
Asli Celikyilmaz
Armen Aghajanyan
Publisher
Meta Research website
June 05, 2026
Zeyu Yang, Qi Ma, Jason Chen, Anshumali Shrivastava
June 05, 2026
May 26, 2026
Josephine Raugel, Max Seitzer, Marc Szafraniec, Huy V. Vo, Jérémy Rapin, Patrick Labatut, Piotr Bojanowski, Valentin Wyart, Jean Remi King
May 26, 2026
May 20, 2026
Dongyan Lin, Phillip Rust, Angel Villar Corrales, Alvin W. M. Tan, Mahi Luthra, Charles-Eric Saint-James, Rashel Moritz, Sheila Krogh-Jespersen, Vanessa Stark, Surya Parimi, Jiayi Shen, Youssef Benchekroun, Yosuke Higuchi, Martin Gleize, Tom Fizycki, Nicolas Hamilakis, Manel Khentout, Sho Tsuji, Balázs Kégl, Juan Pino, Michael C. Frank, Emmanuel Dupoux
May 20, 2026
May 18, 2026
Rohit Patel, Alexandre Rezende, Steven McClain
May 18, 2026

Our approach
Latest news
Foundational models