July 14, 2023
We present CM3Leon (pronounced “Chameleon”), a retrieval-augmented, tokenbased, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but additionally shows the extreme benefits of scaling up and tuning on more diverse instruction-style data. It is the first multi-modal model trained with a recipe adapted from text-only language models, including a large-scale retrieval-augmented pretraining stage and a second multi-task supervised fine-tuning (SFT) stage. It is also a general purpose model that can do both text-to-image and image-to text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs. Extensive experiments demonstrate that this recipe is highly effective for multi-modal models. CM3Leon achieves state-of-theart performance in text-to-image generation with 5x less training compute than comparable methods (zero-shot MS-COCO FID of 4.88). After SFT, CM3Leon can also demonstrate unprecedented levels of controllability in tasks ranging from language-guided image editing to image-controlled generation and segmentation.
Written by
Lili Yu
Bowen Shi
Ram Pasunuru
Benjamin Miller
Tianlu Wang
Arun Babu
Binh Tang
Brian Karrer
Shelly Sheynin
Candace Ross
Russ Howes
Vasu Sharma
Jacob Xu
Uriel Singer
Daniel Li (FAIR)
Gargi Ghosh
Asli Celikyilmaz
Armen Aghajanyan
Publisher
Meta Research website
November 11, 2024
Sherry Xue, Romy Luo, Changan Chen, Kristen Grauman
November 11, 2024
October 31, 2024
Mike Lambeta, Tingfan Wu, Ali Sengül, Victoria Rose Most, Nolan Black, Kevin Sawyer, Romeo Mercado, Haozhi Qi, Alexander Sohn, Byron Taylor, Norb Tydingco, Gregg Kammerer, Dave Stroud, Jake Khatha, Kurt Jenkins, Kyle Most, Neal Stein, Ricardo Chavira, Thomas Craven-Bartle, Eric Sanchez, Yitian Ding, Jitendra Malik, Roberto Calandra
October 31, 2024
October 16, 2024
Movie Gen Team
October 16, 2024
October 04, 2024
Bandhav Veluri, Benjamin Peloquin, Bokai Yu, Hongyu Gong, Shyam Gollakota
October 04, 2024
Foundational models
Latest news
Foundational models