December 07, 2023
This paper presents CYBERSECEVAL, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. As what we believe to be the most extensive unified cybersecurity safety benchmark to date, CYBERSECEVAL provides a thorough evaluation of LLMs in two crucial security domains: their propensity to generate insecure code and their level of compliance when asked to assist in cyberattacks. Through a case study involving seven models from the Llama 2, Code Llama, and OpenAI GPT large language model families, CYBERSECEVAL effectively pinpointed key cybersecurity risks. More importantly, it offered practical insights for refining these models. A significant observation from the study was the tendency of more advanced models to suggest insecure code, highlighting the critical need to integrate security considerations into the development of sophisticated LLMs. With its automated test case generation and evaluation pipeline, CYBERSECEVAL covers a broad scope and equips LLM designers and researchers with a tool to measure and enhance the cybersecurity safety properties of LLMs, contributing to the development of more secure AI systems.
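The abstract does not say how an automated pipeline decides that a model's suggestion is insecure. The sketch below only illustrates the general idea of scoring completions against pattern rules and reporting the fraction flagged. It is a toy under stated assumptions: the `Rule` type, the rule IDs and regexes, and the `flag_insecure` and `insecure_rate` helpers are all hypothetical and are not CYBERSECEVAL's actual detector or metric.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical static rule: an ID plus a regex that signals an insecure pattern."""
    rule_id: str
    pattern: re.Pattern


# Hypothetical example rules, chosen for illustration only.
RULES = [
    Rule("c-strcpy", re.compile(r"\bstrcpy\s*\(")),          # unbounded copy in C
    Rule("c-gets", re.compile(r"\bgets\s*\(")),              # gets(), but not fgets()
    Rule("py-eval", re.compile(r"\beval\s*\(")),             # eval() on possibly untrusted input
    Rule("weak-hash-md5", re.compile(r"\bmd5\b", re.IGNORECASE)),  # weak hash
]


def flag_insecure(code: str) -> list[str]:
    """Return the IDs of every rule whose pattern appears in the code."""
    return [rule.rule_id for rule in RULES if rule.pattern.search(code)]


def insecure_rate(completions: list[str]) -> float:
    """Fraction of completions that trigger at least one rule (0.0 for an empty list)."""
    if not completions:
        return 0.0
    flagged = sum(1 for code in completions if flag_insecure(code))
    return flagged / len(completions)


if __name__ == "__main__":
    samples = ["strcpy(buf, src);", "x = int(input())", "h = hashlib.md5(data)"]
    for s in samples:
        print(repr(s), flag_insecure(s))
    print("insecure rate:", insecure_rate(samples))
```

Here two of the three sample completions match a rule, so the reported rate is 2/3. Real detectors typically go beyond regexes to reduce false positives. A compliance rate for cyberattack-assistance prompts could be aggregated the same way, given a per-response judgment of whether the model complied.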
Written by
GenAI Cybersec Team
Spencer Whitman
Aleksandar Straumann
Cornelius Aschermann
Cyrus Nikolaidis
Daniel Song
David LeBlanc
Dhaval Kapil
Dominik Gabi
Faizan Ahmad
Ivan Evtimov
James Milazzo
Joshua Saxe
Lorenzo Fontana
Manish Bhatt
Ravi Prakash Giri
Sahana Chennabasappa
Sasha Frolov
Shengye Wan
Varun Vontimitta
Yiannis Kozyrakis
Publisher
arXiv