FEATURED
Open Source

Sharing new open source protection tools and advancements in AI privacy and security

April 29, 2025
6 minute read

Takeaways

  • Today, we’re releasing new Llama protection tools for the open source AI community.
  • We’re providing new AI-enabled solutions to help the defender community proactively detect and secure critical infrastructure, systems, and services against active attacks.
  • We’re also previewing our new technology that allows AI-related requests to be processed privately.

The latest Llama protection tools for the open source community

We’re committed to providing developers with the best possible tools and resources to build secure AI applications. Developers can access our latest Llama Protection tools to be used when building with Llama by visiting Meta’s Llama Protections page, Hugging Face, or GitHub.

  • Llama Guard 4: The new Llama Guard 4 is an update to our customizable Llama Guard tool, acting as a unified safeguard across modalities, supporting protections for text and image understanding. Llama Guard 4 is also available on our new Llama API, which we’re launching as a limited preview.
  • LlamaFirewall: We’re introducing LlamaFirewall, a security guardrail tool to help build secure AI systems. LlamaFirewall can orchestrate across guard models and work with our suite of protection tools to detect and prevent AI system risks such as prompt injection, insecure code, and risky LLM plug-in interactions. For more details on the tool, please refer to the LlamaFirewall research paper.
  • Llama Prompt Guard 2: Prompt Guard 2 86M, an update to our Llama Prompt Guard classifier model, improves on its performance in jailbreak and prompt injection detection. We’re also introducing Prompt Guard 2 22M, a smaller, faster version that can reduce latency and compute costs with minimal performance trade-offs by up to 75% compared to our 86M model.

Helping the defender community leverage AI in security operations

At Meta, we use AI to strengthen our security systems and defend against potential cyber attacks. We’ve heard from the community that they want access to AI-enabled tools that will help them do the same. That’s why we’re sharing updates to help organizations evaluate the efficacy of AI systems in security operations and announcing the Llama Defenders Program for select partners. We believe this is an important effort to improve the robustness of software systems as more capable AI models become available.

  • CyberSec Eval 4: Our updated open source cybersecurity benchmark suite, CyberSecEval 4 includes new tools—CyberSOC Eval and AutoPatchBench—to assess AI systems’ defense capabilities.
    • CyberSOC Eval: Developed with CrowdStrike, this framework measures AI systems’ efficacy in security operation centers. Today, we’re announcing this benchmark, and will be releasing it soon.
    • AutoPatchBench: A new benchmark that evaluates the ability of Llama and other AI systems to automatically patch security vulnerabilities in native code before they can be exploited. Learn more on the Engineering at Meta Blog.
  • Llama Defenders Program: We’re launching the Llama Defenders Program to help partner organizations and developers access a variety of open, early-access, and closed AI solutions to address different security needs.
    • Automated Sensitive Doc Classification Tool: A tool we use internally at Meta, this automatically applies security classification labels to an organization’s internal documents to help prevent unauthorized access and distribution, or to filter out sensitive documents from an AI system’s RAG implementation. Learn more on GitHub.
    • Llama Generated Audio Detector & Llama Audio Watermark Detector: Designed to detect AI-generated content, these tools will help organizations detect AI-generated threats, such as scams, fraud, and phishing attempts. At launch, we’re working with ZenDesk, Bell Canada, and AT&T to integrate these into their systems. If interested in learning more, other organizations can request information by visiting the Llama Defenders Program website.

Building new technology to enable private processing for AI requests

We’re sharing the first look into Private Processing, our new technology that will help WhatsApp users leverage AI capabilities for things like summarizing unread messages or refining them, while keeping messages private so that Meta or WhatsApp cannot access them. More information on our security approach to building this technology, including the threat model that guides how we identify and defend against potential attack vectors, can be found on our Engineering blog. We’re working with the security community to audit and improve our architecture and will continue to build and strengthen Private Processing in the open, in collaboration with researchers, before we launch it in product.

Looking ahead

We hope that the set of AI updates shared here will make it even easier for developers to build with Llama, help organizations enhance their security operations, and enable stronger privacy guarantees for certain AI use cases. We look forward to continuing this work and sharing more in the future.