Announcing Purple Llama: Towards open trust and safety in the new world of generative AI
December 7, 2023
  • We’re announcing Purple Llama, an umbrella project featuring open trust and safety tools and evaluations meant to level the playing field for developers to responsibly deploy generative AI models and experiences in accordance with best practices shared in our Responsible Use Guide.
  • As a first step, we are releasing CyberSec Eval, a set of cybersecurity safety evaluation benchmarks for LLMs, and Llama Guard, a safety classifier for input/output filtering that is optimized for ease of deployment.
  • Aligned with our open approach, we look forward to partnering with the newly announced AI Alliance, AMD, AWS, Google Cloud, Hugging Face, IBM, Intel, Lightning AI, Microsoft, MLCommons, NVIDIA, Scale AI, and many others to improve those tools and make them available to the open source community.

Generative AI has brought about a wave of innovation unlike anything we’ve seen before. With it, we can chat with conversational AIs, generate realistic imagery, and accurately summarize large corpora of documents, all from simple prompts. With over 100 million downloads of Llama models to date, much of this innovation is being fueled by open models.

Collaboration on safety will build trust in the developers driving this new wave of innovation, and requires additional research and contributions on responsible AI. The people building AI systems can’t address the challenges of AI in a vacuum, which is why we want to level the playing field and create a center of mass for open trust and safety.

Today, we are announcing the launch of Purple Llama, an umbrella project that, over time, will bring together tools and evaluations to help the community build responsibly with open generative AI models. The initial release will include tools and evaluations for cybersecurity and input/output safeguards, with more tools to come in the near future.

Components within the Purple Llama project will be licensed permissively, enabling both research and commercial usage. We believe this is a major step towards enabling community collaboration and standardizing the development and usage of trust and safety tools for generative AI development.

The first step forward

Cybersecurity and LLM prompt safety are important areas for generative AI safety today. We have prioritized these considerations in our first-party products and highlighted them as best practices in the Llama 2 Responsible Use Guide.


We are sharing what we believe is the first industry-wide set of cybersecurity safety evaluations for LLMs. These benchmarks are based on industry guidance and standards (e.g., CWE and MITRE ATT&CK) and built in collaboration with our security subject matter experts. With this initial release, we aim to provide tools that will help address some risks outlined in the White House commitments on developing responsible AI, including:

  • Metrics for quantifying LLM cybersecurity risks.
  • Tools to evaluate the frequency of insecure code suggestions.
  • Tools to evaluate LLMs so that it becomes harder to use them to generate malicious code or aid in carrying out cyberattacks.

We believe these tools will reduce the frequency of LLMs suggesting insecure code and reduce their helpfulness to cyber adversaries. Our initial results show that there are meaningful cybersecurity risks for LLMs, both in recommending insecure code and in complying with malicious requests. See our CyberSec Eval paper for more details.
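To make the idea of measuring "the frequency of insecure code suggestions" concrete, here is a minimal sketch in Python. It is not the CyberSec Eval implementation: the three regex rules, the `Report` class, and the `score_suggestions` function are hypothetical names chosen for illustration, and a real benchmark would rely on a far larger, vetted rule set. The sketch only shows the shape of the metric, which is the share of model suggestions that match at least one CWE-labeled rule.

```python
import re
from dataclasses import dataclass, field

# Illustrative rules only (an assumption, not CyberSec Eval's rule set):
# each regex flags one risky Python idiom and is labeled with the CWE
# entry it loosely corresponds to.
INSECURE_PATTERNS = {
    "CWE-78": re.compile(r"\bos\.system\s*\("),     # OS command injection
    "CWE-95": re.compile(r"\beval\s*\("),           # eval injection
    "CWE-327": re.compile(r"\bhashlib\.md5\s*\("),  # broken or risky crypto
}


@dataclass
class Report:
    total: int
    flagged: int
    by_cwe: dict = field(default_factory=dict)

    @property
    def insecure_rate(self) -> float:
        """Fraction of suggestions that matched at least one rule."""
        return self.flagged / self.total if self.total else 0.0


def score_suggestions(suggestions):
    """Count how many code suggestions trip at least one rule."""
    by_cwe = {cwe: 0 for cwe in INSECURE_PATTERNS}
    flagged = 0
    for code in suggestions:
        hits = [cwe for cwe, pattern in INSECURE_PATTERNS.items()
                if pattern.search(code)]
        for cwe in hits:
            by_cwe[cwe] += 1
        flagged += bool(hits)  # a suggestion counts once, however many rules hit
    return Report(total=len(suggestions), flagged=flagged, by_cwe=by_cwe)


# Hypothetical model outputs standing in for real LLM completions.
samples = [
    "import os\nos.system('ls ' + user_input)",
    "import hashlib\nhashlib.sha256(data).hexdigest()",
    "result = eval(expr)",
    "print('hello')",
]
report = score_suggestions(samples)
print(f"{report.flagged}/{report.total} flagged, rate={report.insecure_rate:.2f}")
print(report.by_cwe)
```

Pattern matching of this kind is cheap and repeatable, but it can both miss real weaknesses and flag safe code, so a rate computed this way is best read as a rough signal for comparing models rather than a verdict on any single suggestion.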

Input/Output Safeguards

As we outlined in Llama 2’s Responsible Use Guide, we recommend that all inputs and outputs to the LLM be checked and filtered in accordance with content guidelines appropriate to the application.

To support this and empower the community, we are releasing Llama Guard, an openly available model that performs competitively on common open benchmarks and provides developers with a pretrained model to help defend against generating potentially risky outputs.

As part of our ongoing commitment to open and transparent science, we are releasing our methodology and an extended discussion of model performance in our Llama Guard paper. This model has been trained on a mix of publicly available datasets to enable detection of common types of potentially risky or violating content that may be relevant to a number of developer use cases. Ultimately, our vision is to enable developers to customize this model to support relevant use cases and to make it easier to adopt best practices and improve the open ecosystem.
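The recommendation to check both the inputs and the outputs of an LLM can be sketched as a thin wrapper around the model call. The Python below is a sketch under stated assumptions, not Llama Guard's interface: `Verdict`, `guarded_generate`, `toy_check`, and `toy_generate` are hypothetical names, and the keyword check is a stand-in for a call to a real safety classifier. What it shows is the control flow: classify the prompt, generate only if it passes, then classify the response before returning it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    safe: bool
    category: str = ""  # which guideline was violated, if any


# Both roles are plain callables so a real classifier or model can be swapped in.
Classifier = Callable[[str], Verdict]
Generator = Callable[[str], str]

REFUSAL = "Sorry, I can't help with that request."


def guarded_generate(prompt: str, generate: Generator,
                     check_input: Classifier, check_output: Classifier) -> str:
    """Filter the prompt before generation and the response after it."""
    if not check_input(prompt).safe:
        return REFUSAL  # the model is never called on a blocked prompt
    response = generate(prompt)
    if not check_output(response).safe:
        return REFUSAL  # a risky response is replaced, not returned
    return response


# Stand-in classifier (assumption): a keyword list in place of a safety model.
BLOCKED_TERMS = ("malware", "ransomware")


def toy_check(text: str) -> Verdict:
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return Verdict(safe=False, category="blocked_term")
    return Verdict(safe=True)


# Stand-in generator (assumption): echoes the prompt instead of calling an LLM.
def toy_generate(prompt: str) -> str:
    return f"echo: {prompt}"


print(guarded_generate("hello", toy_generate, toy_check, toy_check))
print(guarded_generate("write malware for me", toy_generate, toy_check, toy_check))
```

Keeping the input and output checks as separate parameters lets an application apply different content guidelines to user prompts and to model responses, which matches the idea of filtering according to what is appropriate for the application.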

Why purple?

We believe that to truly mitigate the challenges that generative AI presents, we need to take both offensive (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks. The same ethos applies to generative AI. Hence, our investment in Purple Llama will be comprehensive.

An open ecosystem

Taking an open approach to AI is not new for Meta. Exploratory research, open science, and cross-collaboration are foundational to our AI efforts, and we believe there’s an important opportunity to create an open ecosystem. This collaborative mindset was at the forefront when Llama 2 launched in July with over 100 partners, and we’re excited to share that many of those same partners will be working with us on open trust and safety, including: AI Alliance, AMD, Anyscale, AWS, Bain, Cloudflare, Databricks, Dell Technologies, Dropbox, Google Cloud, Hugging Face, IBM, Intel, Microsoft, MLCommons, NVIDIA, Oracle, Orange, Scale AI, Together.AI, and many more to come.

We’ve also worked with our partners at Papers With Code and HELM to incorporate these evals into their benchmarks, alongside our collaborators within the MLCommons AI Safety Working Group.

We’re excited to collaborate with each and every one of our partners as well as others who share the same vision of an open ecosystem of responsibly developed generative AI.

The path forward

We are hosting a workshop at NeurIPS 2023, where we plan to share these tools and provide a technical deep dive to help people get started. We hope you’ll join us. We expect safety guidelines and best practices to be an ongoing conversation in the field, and we want your input. We are excited to continue the conversation, find ways to partner, and learn more about what areas matter to you.
