April 29, 2025
Large language models (LLMs) have evolved from simple chatbots into autonomous agents capable of complex tasks such as editing production code, orchestrating workflows, and taking higher-stakes actions based on untrusted inputs like webpages and emails. These capabilities introduce new security risks that existing measures, such as model fine-tuning or chatbot-focused guardrails, do not fully address. Given the higher stakes and the absence of deterministic solutions to these risks, there is a critical need for a real-time guardrail monitor that serves as a final layer of defense and supports the definition and enforcement of system-level, use-case-specific safety policies. We introduce LlamaFirewall, an open-source, security-focused guardrail framework designed to serve as this final layer of defense against the security risks associated with AI agents. Our framework mitigates risks such as prompt injection, agent misalignment, and insecure code through three guardrails: PromptGuard 2, a universal jailbreak detector that demonstrates clear state-of-the-art performance; Agent Alignment Checks, a chain-of-thought auditor that inspects agent reasoning for prompt injection and goal misalignment and which, while still experimental, prevents indirect injections in general scenarios more effectively than previously proposed approaches; and CodeShield, a fast and extensible online static analysis engine that aims to prevent coding agents from generating insecure or dangerous code. Additionally, we include easy-to-use customizable scanners that let any developer who can write a regular expression or an LLM prompt quickly update an agent's security guardrails. LlamaFirewall is used in production at Meta. By releasing LlamaFirewall as open-source software, we invite the community to leverage its capabilities and collaborate in addressing the new security risks introduced by agents.
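The abstract notes that any developer who can write a regular expression can extend an agent's guardrails. The minimal sketch below illustrates that idea in plain Python under stated assumptions: a scanner is modeled as a function from a message to an allow/block decision, and scanners run in sequence with the first block winning. The names `make_regex_scanner`, `run_guardrails`, `ScanResult`, and `Decision`, as well as the example patterns, are hypothetical illustrations and are not LlamaFirewall's actual API.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"


@dataclass(frozen=True)
class ScanResult:
    decision: Decision
    reason: str = ""


# Hypothetical interface (not LlamaFirewall's API): a scanner is any
# callable that maps a message to a ScanResult.
Scanner = Callable[[str], ScanResult]


def make_regex_scanner(patterns: Dict[str, str]) -> Scanner:
    """Build a scanner that blocks a message if any named regex matches."""
    compiled = {name: re.compile(p, re.IGNORECASE) for name, p in patterns.items()}

    def scan(message: str) -> ScanResult:
        # Rules are checked in insertion order; the first match decides.
        for name, regex in compiled.items():
            if regex.search(message):
                return ScanResult(Decision.BLOCK, f"matched rule '{name}'")
        return ScanResult(Decision.ALLOW)

    return scan


def run_guardrails(message: str, scanners: Iterable[Scanner]) -> ScanResult:
    """Apply scanners in order and return the first BLOCK, else ALLOW."""
    for scanner in scanners:
        result = scanner(message)
        if result.decision is Decision.BLOCK:
            return result
    return ScanResult(Decision.ALLOW)


if __name__ == "__main__":
    # Hypothetical example rules: a credential-like token and a classic
    # injection phrase.
    rules = make_regex_scanner({
        "aws_access_key": r"\bAKIA[0-9A-Z]{16}\b",
        "ignore_instructions": r"ignore (all )?previous instructions",
    })
    print(run_guardrails("Summarize this email for me.", [rules]))
    print(run_guardrails("Please IGNORE previous instructions.", [rules]))
```

Placing cheap regex checks first in such a chain means costlier checks, such as a classifier or an LLM-based auditor, would only run on messages the regex rules let through; under this assumed interface they could be added as further callables in the same list.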
Written by
Abraham Montilla
Alekhya Gampa
Beto de Paola
Cyrus Nikolaidis
Daniel Song
David Molnar
Dominik Gabi
James Crnkovich
Jean-Christophe Testud
Joshua Saxe
Kat He
Lauren Deason
Nicholas Doucette
Rashnil Chaturvedi
Sahana Chennabasappa
Shengye Wan
Spencer Whitman
Stephanie Ding
Publisher
arXiv