Takeaways
Meta’s Fundamental AI Research (FAIR) team is focused on achieving advanced machine intelligence (AMI) and using it to power products and innovation for the benefit of everyone. For more than a decade, we’ve been sharing cutting-edge research and collaborating with the global AI community. Our mission to achieve AMI is an extension of this commitment to innovating for the greater good. As Mark Zuckerberg noted in a recent open letter, open source AI “has more potential than any other modern technology to increase human productivity, creativity, and quality of life,” all while accelerating economic growth and advancing groundbreaking medical and scientific research.
Today, we’re excited to share some of our most recent research and models that support our goal of achieving AMI, while also supporting open science and reproducibility. These artifacts focus on the building blocks of AMI, including perception, speech and language, reasoning, embodiment, and alignment. Today’s release includes a variety of work from FAIR spanning these areas, including SAM 2.1, an update to our popular Segment Anything Model 2, refinement of sentence representations for LLMs, and more. Each work includes open source artifacts the community can use to further their own research and build on areas of interest.
By publicly sharing our early research work, we hope to inspire iterations and ultimately help advance AI in a responsible way. Maintaining an open science approach and sharing our work with the community helps us stay true to our goal of building AI systems that work well for everyone and bring the world closer together. We can’t wait to see what people build with these latest releases and continue the important conversations we’re having with the open source community.
Introducing Meta Segment Anything Model 2.1
In the 11 weeks since we shared SAM 2 with the open source community, the model has been downloaded more than 700,000 times, and the web demo has been used to segment hundreds of thousands of objects in images and videos. We’ve been blown away by the response and impact SAM 2 is already making across disciplines, including research with medical images, meteorology, and more. We’ve also been listening to feedback about how we can make SAM 2 even better.
Today, we’re sharing Meta Segment Anything Model 2.1 (SAM 2.1), an updated checkpoint with stronger performance.
We introduced additional data augmentation techniques that simulate the presence of visually similar objects and of small objects, cases where SAM 2 previously struggled. We also improved SAM 2’s occlusion handling by training the model on longer sequences of frames and tweaking the positional encoding of spatial and object pointer memory. Further details and results are provided in our updated paper.
We’re also sharing the SAM 2 Developer Suite, a package of open source code to make it easier than ever to build with SAM 2. This release includes training code for fine-tuning SAM 2 with your own data, and for the first time, we’re sharing the front-end and back-end code for our web demo.
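As a rough sketch of how the new checkpoint slots into the existing SAM 2 predictor interface (the checkpoint and config filenames below are assumptions based on the SAM 2 repository layout and may differ from the released assets):

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Assumed filenames; check the SAM 2 repository for the exact SAM 2.1 assets.
checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"

predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))
image = np.zeros((720, 1280, 3), dtype=np.uint8)  # placeholder for a real HxWx3 image

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[640, 360]]),  # one foreground click
        point_labels=np.array([1]),
    )
```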
Meta Spirit LM: An open source language model for seamless speech and text integration
Large language models are frequently used to build speech pipelines that route through text, wherein speech is transcribed by automatic speech recognition (ASR), an LLM generates a text response, and that text is ultimately converted back to speech using text-to-speech (TTS). However, this process compromises the expressive aspects of the speech being understood and generated. In an effort to address this limitation, we built Meta Spirit LM, our first open source multimodal language model that freely mixes text and speech.
Meta Spirit LM is trained with a word-level interleaving method on speech and text datasets to enable cross-modality generation. We developed two versions of Spirit LM to display both the generative semantic abilities of text models and the expressive abilities of speech models. Spirit LM Base uses phonetic tokens to model speech, while Spirit LM Expressive uses pitch and style tokens to capture information about tone, such as excitement, anger, or surprise, and then generates speech that reflects that tone.
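To give a sense of what word-level interleaving means, here is a toy, purely illustrative sketch (the markers and unit names are hypothetical, not Spirit LM’s actual vocabulary): training sequences switch between spans of text tokens and spans of discrete speech units at word boundaries, so a single model learns to continue in either modality.

```python
import random

def interleave(words, speech_units, p_speech=0.5, seed=0):
    """Toy word-level interleaving: each word is rendered either as text or as
    its (hypothetical) span of discrete speech units, with modality markers."""
    rng = random.Random(seed)
    tokens = []
    for word, units in zip(words, speech_units):
        if rng.random() < p_speech:
            tokens += ["[SPEECH]"] + units   # e.g. phonetic units for the word
        else:
            tokens += ["[TEXT]", word]
    return tokens

words = ["the", "cat", "sat"]
speech_units = [["hu_12", "hu_97"], ["hu_4", "hu_55", "hu_8"], ["hu_63"]]
print(interleave(words, speech_units))  # a mixed text / speech-unit token stream
```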
Spirit LM lets people generate more natural-sounding speech, and it has the ability to learn new tasks across modalities such as automatic speech recognition, text-to-speech, and speech classification. We hope our work will inspire the larger research community to continue developing speech and text integration.
Layer Skip: Enhancing large language model performance with accelerated generation times
Large language models have been widely adopted across various industries and use cases. However, their high computational and memory requirements consume a lot of energy and can carry high financial costs. To address these challenges, we introduce Layer Skip, an end-to-end solution that accelerates LLM generation times on new data without relying on specialized hardware or software.
Layer Skip accelerates LLMs by executing only a subset of a model’s layers and using the remaining layers for verification and correction.
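Conceptually, this works like self-speculative decoding: an early exit after the first few layers cheaply drafts several tokens, and a single pass through the full model verifies them and corrects the first mistake. Below is a minimal sketch of that loop; the `model(ids, num_layers=...)` interface and the greedy acceptance rule are assumptions for illustration, not the released Layer Skip implementation.

```python
import torch

def self_speculative_step(model, tokens, exit_layer=8, draft_len=4):
    """Draft tokens with an early exit, then verify them with the full model.
    Assumes batch size 1, greedy decoding, and a hypothetical `model` callable
    that returns next-token logits using only its first `num_layers` layers."""
    # 1) Draft: run only the first `exit_layer` layers autoregressively.
    draft = tokens
    for _ in range(draft_len):
        logits = model(draft, num_layers=exit_layer)
        draft = torch.cat([draft, logits[:, -1:].argmax(-1)], dim=-1)

    # 2) Verify: one full-model pass scores every drafted position at once.
    full_logits = model(draft, num_layers=None)               # all layers
    verified = full_logits[:, tokens.shape[1] - 1:-1].argmax(-1)
    drafted = draft[:, tokens.shape[1]:]

    # 3) Accept the longest agreeing prefix, then take the full model's token
    #    at the first disagreement (if any) as the correction.
    agree = (drafted == verified).int().cumprod(dim=-1)
    n_accept = int(agree.sum())
    accepted = drafted[:, :n_accept]
    correction = verified[:, n_accept:n_accept + 1]
    return torch.cat([tokens, accepted, correction], dim=-1)
```

Because the draft and the verifier share one set of weights, the early-exit accuracy of the checkpoints determines how many drafted tokens are accepted, which is where the speedup comes from.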
We’re releasing the inference code and fine-tuned checkpoints for Layer Skip, including Llama 3, Llama 2, and Code Llama. These models have been optimized with the Layer Skip training recipe, significantly improving the accuracy of early layer exits. Additionally, we're sharing Layer Skip's inference implementation, which can boost model performance by up to 1.7x.
What sets these Layer Skip checkpoints apart is their robustness to exiting at earlier layers and to skipping intermediate layers, as well as the uniformity of activations across layers. These unique features pave the way for innovative research in optimization and interpretability. We’re excited to see how the research community leverages these tools to push the boundaries of what's possible with AI.
Salsa: Validating security for post-quantum cryptography standards
Research in cryptography, the science of securing information, must stay ahead of attacks in order to protect people’s data. Today, we’re sharing new code that will enable researchers to benchmark AI-based attacks and compare them to new and existing attacks going forward.
The industry standard adopted by the National Institute of Standards and Technology (NIST), lattice-based cryptography, is based on a hard problem called learning with errors (LWE). LWE assumes that it is hard to recover a secret vector given only noisy inner products of that vector with random vectors. Previously, we demonstrated the first machine learning attacks on this method. Our current state-of-the-art method, Salsa, is capable of attacking sparse secrets in the NIST standard CRYSTALS-Kyber. Currently, Salsa can break sparse secrets, but further progress may lead to attacks on general secrets. We’re continuing to explore AI methods and expand our research to find other weaknesses in cryptography that could potentially be exploited by machine learning.
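For readers unfamiliar with LWE, here is a toy illustration of what “noisy inner products with random vectors” means; the dimension, sample count, and noise are deliberately tiny and purely illustrative (only the modulus q = 3329 matches Kyber):

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, num_samples = 64, 3329, 128          # toy dimension; q as in Kyber

# Sparse secret: mostly zeros with a handful of small nonzero entries.
secret = np.zeros(n, dtype=np.int64)
idx = rng.choice(n, size=5, replace=False)
secret[idx] = rng.choice([-1, 1], size=5)

A = rng.integers(0, q, size=(num_samples, n))   # public random vectors
e = rng.integers(-2, 3, size=num_samples)       # small noise
b = (A @ secret + e) % q                        # noisy inner products

# An attacker sees only (A, b) and must recover `secret`. Salsa trains
# machine learning models on samples like these to do so for sparse secrets.
```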
By sharing this work, we hope the community will build on our research to help assure the future security of deployed cryptographic systems. We will continue to engage with the research community to accelerate work on validating security for post-quantum cryptography (PQC) standards, which are the foundational building blocks for the future of secure systems.
Meta Lingua: Accelerating research with efficient model training
Meta Lingua is a lightweight and self-contained codebase designed to train language models at scale. It provides a research-friendly environment that makes it easier to translate concepts into practical experiments, and it prioritizes simplicity and reusability to accelerate research. The efficient and customizable platform also allows researchers to quickly test their ideas with minimal setup and technical hassle.
To achieve this, we made several design choices to ensure the code is both modular and self-contained while remaining efficient. We leverage multiple features in PyTorch that allow us to maintain flexibility and performance while making the code easier to install and maintain. By sharing the code today, we hope Lingua will enable researchers to focus on the important work they’re doing while letting the platform take care of efficient model training and reproducible research.
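As a purely illustrative sketch of the kind of built-in PyTorch features such a codebase can lean on to stay both simple and fast (this is not Lingua’s actual code, just a generic example):

```python
import torch
from torch import nn

# A toy model compiled with torch.compile: one example of getting performance
# from stock PyTorch without adding heavyweight external dependencies.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
model = torch.compile(model)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

x = torch.randn(16, 512)          # toy batch
loss = model(x).pow(2).mean()     # stand-in objective
loss.backward()
optimizer.step()
```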
Meta Open Materials 2024: Facilitating inorganic materials discovery with open source data and models
Discovering new materials to drive technological advancements can take decades. AI-assisted materials discovery could revolutionize this field and greatly accelerate the discovery pipeline. Today, we’re releasing the Meta Open Materials 2024 dataset and models, which are at the top of the Matbench-Discovery leaderboard and could enable further breakthroughs in AI-accelerated materials discovery through open and reproducible research.
Today’s best materials discovery models are closed models built upon foundational research from the open source AI community. Meta Open Materials 2024 provides open source models and data based on 100 million training examples, one of the largest open datasets, offering a competitive open source option for the materials discovery and AI research community.
Meta Open Materials 2024 is now openly available and will empower the AI and materials science research communities to accelerate discovery of inorganic materials and help bridge the gap between open and proprietary models in materials discovery.
Mexma: Token-level objectives for improved sentence representations
We’re sharing a research paper and codebase for Mexma, our novel pre-trained cross-lingual sentence encoder. Mexma outperforms previous methods by combining token-level and sentence-level objectives during training. We show that previous approaches to training cross-lingual sentence encoders updated the encoder only through the sentence representation; we improve on this by also using token-level objectives, which give the encoder a richer training signal.
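A rough sketch of what combining the two objectives can look like (the loss shapes and encoder interface below are hypothetical, not Mexma’s exact formulation): pooled sentence embeddings are aligned across languages, while masked tokens in one language are predicted with the help of the other language’s sentence representation, so gradients reach the encoder through every token rather than only through the pooled vector.

```python
import torch
import torch.nn.functional as F

def combined_loss(enc, src_ids, tgt_ids, masked_tgt_ids, mask_positions, alpha=1.0):
    """Hypothetical sketch. `enc(ids)` returns (token_states, pooled_sentence_vec)
    and exposes an `lm_head` projecting hidden states to vocabulary logits."""
    _, src_sent = enc(src_ids)                    # source-language sentence vector
    tgt_tok, tgt_sent = enc(masked_tgt_ids)       # masked target-language pass

    # Sentence-level objective: pull cross-lingual sentence embeddings together.
    sent_loss = 1.0 - F.cosine_similarity(src_sent, tgt_sent, dim=-1).mean()

    # Token-level objective: recover masked target tokens, conditioning each
    # prediction on the other language's sentence representation.
    logits = enc.lm_head(tgt_tok + src_sent.unsqueeze(1))   # broadcast over tokens
    tok_loss = F.cross_entropy(logits[mask_positions], tgt_ids[mask_positions])

    return sent_loss + alpha * tok_loss
```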
We hope the research community will benefit from using Mexma as a sentence encoder. Mexma covers 80 languages, has sentence representations aligned across all languages, and works well on other downstream tasks, such as sentence classification.
Self-Taught Evaluator: Strong generative reward model with synthetic data
Finally, we’ve released Self-Taught Evaluator, a new method for generating synthetic preference data to train reward models without relying on human annotations. This approach generates contrasting model outputs and trains an LLM-as-a-Judge to produce reasoning traces for evaluation and final judgments, with an iterative self-improvement scheme.
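At a high level, the iterative scheme can be sketched as follows; the callables here are hypothetical stand-ins for the steps described above, not the released code’s API.

```python
def self_taught_evaluator(judge, prompts, generate_pair, judge_pair, finetune,
                          num_iterations=3):
    """Hypothetical sketch of the self-improvement loop (no human labels).
    - generate_pair(prompt) -> (better_response, worse_response), e.g. a normal
      answer and one produced from a deliberately degraded instruction
    - judge_pair(judge, prompt, a, b) -> (reasoning_trace, verdict)
    - finetune(judge, examples) -> updated judge model
    """
    for _ in range(num_iterations):
        examples = []
        for prompt in prompts:
            better, worse = generate_pair(prompt)              # contrasting outputs
            trace, verdict = judge_pair(judge, prompt, better, worse)
            if verdict == "first":                             # judge picked correctly
                examples.append((prompt, better, worse, trace))
        judge = finetune(judge, examples)                      # train on its own traces
    return judge
```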
We released the model trained with direct preference optimization, which is a strong generative reward model on RewardBench despite using no human annotations to create its training data. It outperforms larger models and models trained with human-annotated labels, such as GPT-4, Llama-3.1-405B-Instruct, and Gemini-Pro. The model is also available as an evaluator on the AlpacaEval leaderboard, where it is one of the top-ranked evaluators in terms of human agreement rate while being around 7x to 10x faster than the default GPT-4 evaluator. Since its release, the AI community has embraced our synthetic data approach and used it to train top-performing reward models. We’re excited to see further exploration and advancements in this field using synthetic data.