April 25, 2023
Self-supervised learning (SSL), dubbed “the dark matter of intelligence,” is a key ingredient in recent AI breakthroughs.
It has pushed the bounds of deep learning in multiple domains by enabling learning from vast amounts of unlabeled data, rather than relying on carefully annotated datasets. Today it underpins cutting-edge models across modalities: natural language (e.g., translation and large language models), audio (e.g., data2vec), and flexible new computer vision models (e.g., the SEER model, trained on one billion images, and DINOv2).
But training SSL is like cooking a gourmet meal — it’s an intricate art with a high barrier to entry. While many ingredients may be familiar, a successful SSL recipe involves a dizzying set of choices, from selecting the right pretext tasks to training with carefully curated and seasoned hyper-parameters.
We have released a new “Cookbook of Self-Supervised Learning,” a practical guide for AI researchers and practitioners on how to navigate SSL recipes, understand their various knobs and levers, and gain the know-how needed to experiment with SSL’s untapped flavors. This is part of our efforts to lower the barrier and help democratize access to SSL research. You’ll also find tips and tricks from more than a dozen authors across multiple universities, including New York University, University of Maryland, UC Davis, and University of Montreal, as well as leading Meta AI researchers, such as Yann LeCun.
Unlike supervised learning, in which the objective is to match inputs to labels, SSL can learn without labels by defining a learning objective based on the underlying structure of the data, also known as a pretext task. In natural language, for example, a common SSL objective is to mask a word in the text and predict it from the surrounding words. This objective encourages the model to capture relationships among words in the text without the need for labels. The same SSL model representations can then be used across a range of downstream tasks, such as translating text across languages, summarizing, or even generating text, among many others. In computer vision, analogous objectives train a model to predict masked patches of an image (MAE: masked autoencoders) or representations (BYOL: bootstrap your own latent). Other SSL objectives encourage two views of the same image, formed by, say, changing colors or cropping, to be mapped to similar representations.
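To make the idea of a pretext task concrete, here is a minimal sketch in plain Python. It is not taken from the cookbook: the `[MASK]` placeholder, the function names, and the choice of cosine similarity are assumptions made for illustration. The first function builds a training pair from unlabeled text by hiding one word; the second scores how well two view embeddings agree.

```python
import math
import random

MASK = "[MASK]"  # hypothetical placeholder token, chosen for illustration


def make_masked_example(tokens, rng):
    """Build an (input, position, target) triple by hiding one token.

    The target is taken from the data itself, so no human label is needed.
    """
    position = rng.randrange(len(tokens))
    masked = list(tokens)  # copy so the original sequence is left untouched
    target = masked[position]
    masked[position] = MASK
    return masked, position, target


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def two_view_loss(z1, z2):
    """Illustrative loss: 0 when the two view embeddings point the same way."""
    return 1.0 - cosine_similarity(z1, z2)


rng = random.Random(0)
sentence = "the cat sat on the mat".split()
masked, position, target = make_masked_example(sentence, rng)
# The model would be trained to recover `target` from `masked`.
assert sentence[position] == target and masked[position] == MASK
```

In a real system the embeddings passed to `two_view_loss` would come from a neural network applied to two augmented views of the same image, and the method would need an additional mechanism to keep the network from mapping every input to the same vector.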
While SSL is straightforward in principle, a confluence of factors contributes to its high barrier to entry. First, the computational cost of processing vast volumes of unlabeled data is very high for both training and evaluation. Second, there aren’t many detailed papers showcasing the intricate implementation choices needed to realize SSL’s potential. Third, because SSL establishes a notably distinct paradigm, it lacks a unified vocabulary and theoretical view. Without a common ground to characterize the different components, it’s challenging for researchers to understand, compare, and develop SSL methods.
Plus, from an implementation perspective, SSL is a fast-paced emerging field in which each method comes with its own precisely tuned training recipe. Standard codebases are hard to find, and they often use cutting-edge, difficult-to-understand optimizations.
Our new paper lays the foundation of SSL and its recipes in a style that’s easy for any researcher to use.
Just as a cook first learns the basic techniques, like chopping and sautéing, researchers can use this cookbook to learn the fundamental techniques and vocabulary of SSL. Specifically, we describe the families of methods along with theoretical threads to connect their objectives in a unified perspective. You’ll find key concepts, such as loss terms or training objectives, in easy-to-follow concept boxes.
Researchers can look at common training recipes, including hyper-parameter choices, how to assemble components like architectures and optimizers, and how to evaluate SSL methods. You’ll find in one place the key practical considerations to implement SSL methods successfully.
There are still numerous open research questions in SSL, including generalization guarantees, fairness properties, and robustness to adversarial attacks or even naturally occurring variations. The research community needs to better understand how seemingly different yet overlapping methods can produce state-of-the-art results, and more generally advance theoretical understanding of SSL and best practices for real-world deployment.
New researchers are needed to help tackle these open questions and continue to push the field forward. We hope that our SSL cookbook will help make this possible.
We’d like to acknowledge the contributions of: Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun, and Micah Goldblum.