Responsible AI
Our responsible approach to Meta AI and Meta Llama 3
April 18, 2024

Takeaways


  • We’ve taken responsible steps before launching Meta AI and Meta Llama 3 so people can have safer and more enjoyable experiences.
  • We’re supporting the open source developer ecosystem by providing tools and resources for developers as they build with Llama 3.
  • We’re working with a global set of partners to create industry-wide standards that benefit the entire open source community.

Today, we released our new Meta AI, one of the world’s leading free AI assistants built with Meta Llama 3, the next generation of our publicly available, state-of-the-art large language models. Thanks to our latest advances with Llama 3, Meta AI is smarter, faster, and more fun than ever before.

We are committed to developing AI responsibly and helping others do the same. That’s why we’re taking a series of steps so people can have enjoyable experiences when using these features and models, and sharing resources and tools to support developers and the open community.

Responsibility at different layers of the development process

We’re excited about the potential that generative AI technology can have for people who use Meta products, and for the broader ecosystem. We also want to make sure we’re developing and releasing this technology in a way that anticipates and works to reduce risk. To do this, we take steps to evaluate and address risks at each level of the AI development and deployment process. This includes incorporating protections in the process we use to design and release the Llama base model, supporting the developer ecosystem so they can build responsibly, and adopting the same best practices we expect of other developers when we develop and release our own generative AI features on Facebook, Instagram, WhatsApp, and Messenger.

As we explained when we released Llama 2, it’s important to be intentional in designing these mitigations because there are some measures that can only be effectively implemented by the model provider, and others that only work effectively when implemented by the developer as part of their specific application.

For these reasons, with Llama we take a system-centric approach that applies protections at every layer of the development stack. This includes taking a thoughtful approach to our training and tuning efforts and providing tools that make it easy for developers to implement models responsibly. In addition to maximizing the effectiveness of our responsible AI efforts, this approach aligns with our open innovation approach by giving developers more power to customize their products so they’re safer and benefit their users. The Responsible Use Guide is an important resource for developers that outlines considerations they should take to build their own products, which is why we followed its main steps when building Meta AI.




Responsibly building Llama 3 as a foundation model

We took several steps at the model level to develop a highly-capable and safe foundation model in Llama 3, including:

1. Addressing risks in training
The foundation of any model is the training process, through which the model learns both the language and information that it needs to operate. As a result, our approach started with a series of responsible AI mitigations in our training process. For example:

  • We expanded the training dataset for Llama 3 so it’s seven times larger than what we used for Llama 2, and it includes four times more code. Over 5% of the Llama 3 pre-training dataset consists of high-quality non-English data that covers over 30 languages. While the models we’re releasing today are only fine tuned for English outputs, the increased data diversity helps the models better recognize nuances and patterns, and perform strongly across a variety of tasks.
  • We found that previous generations of Llama are good at identifying high-quality data, so we used Llama 2 to help build the text-quality classifiers that are powering Llama 3. We also leveraged synthetic data to train in areas such as coding, reasoning, and long context. For example, we used synthetic data to create longer documents to train on.
  • As with Llama 2, Llama 3 is trained on a variety of public data. For training, we followed Meta’s standard privacy review processes. We excluded or removed data from certain sources known to contain a high volume of personal information about private individuals.

2. Safety evaluations and tuning

We adapted the pretrained model through a process called fine-tuning where we take additional steps to improve its performance in understanding and generating text conversations so it can be used for assistant-like chat applications.

During and after training, we conducted both automated and manual evaluations to understand our models’ performance in a series of risk areas like weapons, cyber attacks, and child exploitation. In each area, we performed additional work to limit the chance the model provides unwanted responses in these areas.

  • For example, we conducted extensive red teaming exercises with external and internal experts to stress test the models to find unexpected ways they might be used.
  • We also evaluated Llama 3 with benchmark tests like CyberSecEval, Meta’s publicly available cybersecurity safety evaluation suite that measures how likely a model is to help carry out a cyber attack.
  • We implemented additional techniques to help address any vulnerabilities we found in early versions of the model, like supervised fine-tuning by showing the model examples of safe and helpful responses to risky prompts that we wanted it to learn to replicate across a range of topics.
  • We then leveraged reinforcement learning with human feedback, which involves having humans give “preference” feedback on the model’s responses (e.g., rating which response is better and safer).
  • This is an iterative process, so we repeated testing after taking the steps above to gauge how effective those new measures were at reducing risks and address any remaining ones.

3. Lowering benign refusals

We’ve heard feedback from developers that Llama 2 would sometimes inadvertently refuse to answer innocuous prompts. Large language models tend to over-generalize, and we don’t intend for it to refuse to answer prompts like “How do I kill a computer program?” even though we don’t want it to respond to prompts like “How do I kill my neighbor?”

  • We improved our fine tuning approach so Llama 3 is significantly less likely to falsely refuse to answer prompts than Llama 2. As part of this, we used high-quality data to show the models examples of responses with these small language nuances, so we could train them to recognize these nuances.
  • As a result, Llama 3 is our most helpful model to date and offers new capabilities, including improved reasoning.

4. Model transparency

As with Llama 2, we’re publishing a model card that includes detailed information on Llama 3’s model architecture, parameters, and pretrained evaluations. The model card also provides information about the capabilities and limitations of the models.

  • We’ve expanded the information in the Llama 3 model card so it includes additional details about our responsibility and safety approach.
  • It also includes results for Llama 3 models on standard automatic benchmarks like general knowledge, reasoning, math problem solving, coding, and reading comprehension.

Over the coming months, we’ll release additional Llama 3 models with new capabilities including multimodality, the ability to converse in multiple languages, and stronger overall capabilities. Our general approach of open sourcing our Llama 3 models is something we remain committed to. We’re currently training a 400B parameter model—and any final decision on when, whether, and how to open source will be taken following safety evaluations we will be running in the coming months.

How we built Meta AI as a responsible developer

We built the new Meta AI on top of Llama 3, just as we envision that Llama 3 will empower developers to expand the existing ecosystem of Llama-based products and services. As we describe in our Responsible Use Guide, we took additional steps at the different stages of product development and deployment to build Meta AI on top of the foundation model, just as any developer would use Llama 3 to build their own product.

In addition to the mitigations that we adopted within Llama 3, a developer needs to adopt additional mitigations to ensure the model can operate properly in the context of their specific AI system and use case. For Meta AI, the use case is a safe, helpful assistant available to people for free directly in our apps. We designed it to help people get things done like brainstorming and overcoming writer’s block, or connecting with friends to discover new places and adventures.

Since the launch of Meta AI last year, we’ve consistently updated and improved the experience and we’re continuing to make it even better. For example:

1. We improved Meta AI’s responses to peoples’ prompts and questions.

  • For example, we wanted to refine the way Meta AI answers prompts about political or social issues, so we’re incorporating guidelines specific to those topics. If someone asks about a debated policy issue, our goal is that Meta AI won’t offer a single opinion or point of view, but will instead summarize relevant points of view about the topic. If someone asks specifically about one side of an issue, we generally want to respect that person’s intent and have Meta AI answer the specific question.
  • Addressing viewpoint bias in generative AI systems is a new area of research. We continue to make progress toward reinforcing this approach for Meta AI’s responses but as we’re seeing with all generative AI systems, it may not always return the response we intend. We’re also exploring additional techniques that can address it along with user feedback.

2. We taught the Meta AI model specific instructions and responses to make it a more helpful AI assistant.

  • This includes several fine-tuning steps, like developing reward models for safety and helpfulness that give the models a reward if it does what we intend.
  • People send prompts to the model and categorize the responses in accordance with our guidelines.
  • The examples that aligned with the tone and responsiveness we wanted Meta AI to emulate were then fed back into the Meta AI model, which “rewards” it when it generates similar content. This process continues to train the model to produce more content within the guidelines.

3. We evaluated Meta AI’s performance against benchmarks and using human experts.

  • Just like we did for Llama 3, we reviewed Meta AI models with external and internal experts through red teaming exercises to find unexpected ways that Meta AI might be used, then addressed those issues in an iterative process.
  • We’re also stress-testing Meta AI capabilities across our apps to make sure it’s working as intended in places like feed, chats, search, and more.
  • We ran a battery of adversarial evaluations—both automated and reviewed by humans—as a comprehensive system-level review to see how Meta AI scored on key safety metrics.

4. We applied safeguards at the prompt and response level.

  • To encourage Meta AI to share helpful and safer responses that are in line with its guidelines, we implement filters on both the prompts that users submit and on responses after they’re generated by the model, but before they’re shown to a user.
  • These filters rely on systems known as classifiers that work to detect a prompt or response that falls into its guidelines. For example, if someone asks how to steal money from a boss, the classifier will detect that prompt and the model is trained to respond that it can’t provide guidance on breaking the law.
  • We have also leveraged large language models specifically built for the purpose of helping to catch safety violations.

5. We’ve built feedback tools within Meta AI.

  • Feedback is instrumental to the development of any generative AI feature since no AI model is perfect, so people can share directly with us whether they received a good or bad response and we’ll use this feedback to improve Meta AI and the models.
  • This feedback is reviewed to determine if responses are helpful or if they went against the guidelines and instructions we developed.
  • The results are used in ongoing model training to improve Meta AI’s performance over time.

Transparency is critical to help people understand this new technology and become comfortable with it. When someone interacts with Meta AI, we tell them it’s AI technology so they can choose whether they want to continue using it. We share information within the features themselves to help people understand that AI might return inaccurate or inappropriate outputs, which is the same for all generative AI systems. In chats with Meta AI, people can access additional information about how it generates content, the limitations of AI, and how the data they have shared with Meta AI is used.

We also include visible markers on photorealistic images generated by Meta AI so people know the content was created with AI. In May, we will begin labeling video, audio, and image content that people post on our apps as “Made with AI” when we detect industry standard AI image indicators or when people disclose that they’re uploading AI-generated content.

How developers can build responsibly with Llama 3

Meta AI is just one of many features and products that will be built with Llama 3, and we’re releasing different models in 8B and 70B sizes so developers can use the best version for them. We’re providing an instruction-tuned model that is specialized for chatbot applications and a pretrained model for developers with specific use cases that would benefit from custom policies.

In addition to the Responsible Use Guide, we’re providing open source tools that make it even easier for developers to customize Llama 3 and deploy generative AI-powered experiences.

  • We’re releasing updated components for Llama Guard 2, which is a state-of-the-art safeguard model that developers can use as an extra layer to reduce the likelihood their model will generate outputs that aren’t aligned with their intended guidelines. This is based on the recently announced classification from MLCommons.
  • We’ve updated CyberSecEval, which is designed to help developers evaluate any cybersecurity risks with code generated by LLMs. We used this to evaluate Llama 3 and address issues prior to releasing it.
  • We are introducing Code Shield, which developers can use to reduce the chance of generating potentially insecure code. Our teams have already used Code Shield with Meta’s internal coding LLM to prevent tens of thousands of potentially insecure suggestions this year.
  • We have a comprehensive getting started guide that helps developers with information and resources to navigate the development and deployment process.
  • We’ve shared Llama Recipes, which contains our open source code to make it easier for developers to build with Llama through tasks like organizing and preparing their dataset, fine-tuning to teach the model to perform their specific use case, setting up safety measures to identify and handle potentially harmful or inappropriate content generated by the model through RAG systems, and deploying the model and evaluating its performance to see if it’s working as intended.
  • We also receive direct feedback from open source developers and researchers through open source repositories like GitHub and our long-running bug bounty program, which informs updates to our features and models.

Meta’s open approach to supporting the ecosystem

For more than a decade, Meta has been at the forefront of responsible open source in AI, and we believe that an open approach to AI leads to better, safer products, faster innovation, and a larger market. We’ve seen people using Llama 2 in new and innovative ways since it was released in July 2023—like Yale's Meditron LLM that’s helping medical professionals with decision-making and the Mayo Clinic’s tool that helps radiologists create clinically accurate summaries of their patients’ scans. Llama 3 has the potential to make these tools and experiences even better.

“The upcoming improvements in the reasoning capabilities of Llama 3 are important to any application, but especially in the medical domain, where trust depends quite a lot on the transparency of the decision-making process. Breaking down a decision/prediction into a set of logical steps is often how humans explain their actions and this kind of interpretability is expected from clinical decision support tools. Llama 2 not only enabled us to make Meditron, it also set a precedent for the potential impact of open-source foundation models in general. We are excited about Llama 3 for the example it sets in industry on the social value of open models.” —Prof Mary-Anne Hartley (Ph.D. MD, MPH), Director of the Laboratory for Intelligent Global Health and Humanitarian Response Technologies based jointly at Yale School of Medicine and EPFL School of Computer Science

Open source software is typically safer and more secure due to ongoing feedback, scrutiny, development, and mitigations from the community. Deploying AI safely is a shared responsibility of everyone in the ecosystem, which is why we’ve collaborated for many years with organizations that are working to build safe and trustworthy AI. For example, we’ve been working with MLCommons and a global set of partners to create responsibility benchmarks in ways that benefit the entire open source community. We co-founded the AI Alliance, a coalition of companies, academics, advocates, and governments working to develop tools that enable an open and safe AI ecosystem. We also recently released the findings from a Community Forum in partnership with Stanford and the Behavioral Insights Team so companies, researchers, and governments can make decisions based on input from people around the world about what’s important to them when it comes to generative AI chatbots.

We are collaborating with governments around the world to create a solid foundation for AI advancements to be secure, fair, and reliable. We eagerly await the progress on safety evaluation and research from national safety institutes including those in the United States and United Kingdom, particularly as they focus on establishing standardized threat models and evaluations throughout the AI development process. This will help measure risks quantitatively and consistently so risk thresholds can be set. The results of these efforts will guide companies like Meta in measuring and addressing risks, and deciding how and whether to release models.

As technologies continue to evolve, we look forward to improving these features and models in the months and years to come. And we look forward to helping people build, create, and connect in new and exciting ways.



Share:

Our latest updates delivered to your inbox

Subscribe to our newsletter to keep up with Meta AI news, events, research breakthroughs, and more.

Join us in the pursuit of what’s possible with AI.

Related Posts
Computer Vision
Introducing Segment Anything: Working toward the first foundation model for image segmentation
April 5, 2023
FEATURED
Research
MultiRay: Optimizing efficiency for large-scale AI models
November 18, 2022
FEATURED
ML Applications
MuAViC: The first audio-video speech translation benchmark
March 8, 2023