ML APPLICATIONS

DEVELOPER TOOLS

LlamaFirewall: An open source guardrail system for building secure AI agents

April 29, 2025

Abstract

Large language models (LLMs) have evolved from simple chatbots into autonomous agents capable of performing complex tasks such as editing production code, orchestrating workflows, and taking higher-stakes actions based on untrusted inputs like webpages and emails. These capabilities introduce new security risks that existing security measures, such as model fine-tuning or chatbot-focused guardrails, do not fully address. Given the higher stakes and the absence of deterministic solutions to mitigate these risks, there is a critical need for a real-time guardrail monitor to serve as a final layer of defense, and support system level, use case specific safety policy definition and enforcement. We introduce LlamaFirewall, an open-source security focused guardrail framework designed to serve as a final layer of defense against security risks associated with AI Agents. Our framework mitigates risks such as prompt injection, agent misalignment, and insecure code risks through three powerful guardrails: PromptGuard 2, a universal jailbreak detector that demonstrates clear state of the art performance; Agent Alignment Checks, a chain-of-thought auditor that inspects agent reasoning for prompt injection and goal misalignment, which, while still experimental, shows stronger efficacy at preventing indirect injections in general scenarios than previously proposed approaches; and CodeShield, an online static analysis engine that is both fast and extensible, aimed at preventing the generation of insecure or dangerous code by coding agents. Additionally, we include easy-to-use customizable scanners that make it possible for any developer who can write a regular expression or an LLM prompt to quickly update an agent’s security guardrails. LlamaFirewall is utilized in production at Meta. By releasing LlamaFirewall as open source software, we invite the community to leverage its capabilities and collaborate in addressing the new security risks introduced by Agents.

Download the Paper

AUTHORS

Written by

Sahana Chennabasappa

Cyrus Nikolaidis

Daniel Song

Stephanie Ding

Shengye Wan

Rashnil Chaturvedi

James Crnkovich

Beto de Paola

Lauren Deason

Nicholas Doucette

Dominik Gabi

Alekhya Gampa

Kat He

David Molnar

Abraham Montilla

Jean-Christophe Testud

Spencer Whitman

Joshua Saxe

Publisher

arXiv

Research Topics

Systems Research

Related Publications

September 15, 2025

RESEARCH

DEVELOPER TOOLS

CyberSOCEval: Benchmarking LLMs Capabilities for Malware Analysis and Threat Intelligence Reasoning

Lauren Deason, Adam Bali, Ciprian Bejean, Diana Bolocan, James Crnkovich, Ioana Croitoru, Krishna Durai, Chase Midler, Calin Miron, David Molnar, Brad Moon, Bruno Ostarcevic, Alberto Peltea, Matt Rosenberg, Catalin Sandu, Arthur Saputkin, Sagar Shah, Daniel Stan, Ernest Szocs, Shengye Wan, Spencer Whitman, Sven Krasser, Joshua Saxe

September 15, 2025

August 12, 2025

RESEARCH

NLP

Efficient Speculative Decoding for Llama at Scale: Challenges and Solutions

GenAI and Infra Teams

August 12, 2025

August 05, 2025

RESEARCH

CORE MACHINE LEARNING

FastCSP: Accelerated Molecular Crystal Structure Prediction with Universal Model for Atoms

Vahe Gharakhanyan, Yi Yang, Luis Barroso-Luque, Muhammed Shuaibi, Daniel S. Levine, Kyle Michel, Viachaslau Bernat, Misko Dzamba, Xiang Fu, Meng Gao, Xingyu Liu, Keian Noori, Lafe J. Purvis, Tingling Rao, Brandon M. Wood, Ammar Rizvi, Matt Uyttendaele, Andrew J. Ouderkirk, Chiara Daraio, C. Lawrence Zitnick, Arman Boromand, Noa Marom, Zachary W. Ulissi, Anuroop Sriram

August 05, 2025

August 04, 2025

RESEARCH

ML APPLICATIONS

The Open DAC 2025 Dataset for Sorbent Discovery in Direct Air Capture

Anuroop Sriram, Logan M. Brabson, Xiaohan Yu, Sihoon Choi, Kareem Abdelmaqsoud, Elias Moubarak, Pim de Haan, Sindy Löwe, Johann Brehmer, John R. Kitchin, Max Welling, C. Lawrence Zitnick, Zachary Ulissi, Andrew J. Medford, David S. Sholl

August 04, 2025

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.