July 23, 2024
We are releasing a new suite of security benchmarks for LLMs, CYBERSECEVAL 3, to continue the conversation on empirically measuring LLM cybersecurity risks and capabilities. CYBERSECEVAL 3 assesses 8 different risks across two broad categories: risk to third parties, and risk to application developers and end users. Compared to previous work, we add new areas focused on offensive security capabilities: automated social engineering, scaling manual offensive cyber operations, and autonomous offensive cyber operations. In this paper we discuss applying these benchmarks to the Llama 3 models and a suite of contemporaneous state-of-the-art LLMs, enabling us to contextualize risks both with and without mitigations in place.
Written by
Shengye Wan
Cyrus Nikolaidis
Daniel Song
David Molnar
James Crnkovich
Jayson Grace
Joshua Saxe
Manish Bhatt
Sahana Chennabasappa
Spencer Whitman
Stephanie Ding
Vlad Ionescu
Yue Li
Publisher
arXiv
Research Topics
November 11, 2025
Irene Wang, Mostafa Elhouishi, Divya Mahajan, Bilge Acun, Carole-Jean Wu, Daniel Jiang, Ekin Sumbul, Newsha Ardalani, Samuel Hsia
November 11, 2025
February 28, 2025
Apostolos Kokolis, Adithya Kumar, Carole-Jean Wu, Faye Ma, John Hoffman, Kalyan Saladi, Michael Kuchnik, Parth Malani, Shubho Sengupta, Zachary DeVito
February 28, 2025
December 12, 2024
Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Luca Foschini, Pieter Gijsbers, Joan Giner-Miguelez, Sujata Goswami, Nitisha Jain, Michalis Karamousadakis, Satyapriya Krishna, Sylvain Lesage, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Hamidah Oderinwale, Pierre Ruyssen, Tim Santos, Rajat Shinde, Elena Simperl, Arjun Suresh, Goeffry Thomas, Slava Tykhonov, Joaquin Vanschoren, Susheel Varma, Jos van der Velde, Steffen Vogler, Luyao Zhang, Michael Kuchnik, Carole-Jean Wu
December 12, 2024
November 20, 2024
Jay Shah, Ganesh Bikshandi, Vijay Thakkar, Pradeep Ramani, Tri Dao, Ying Zhang
November 20, 2024

Our approach
Latest news
Foundational models