RESPONSIBLE AI

FERRET: Framework for Expansion Reliant Red Teaming

February 13, 2026

Abstract

We introduce FERRET (Framework for Expansion Reliant Red Teaming), a multi-faceted automated red teaming framework that generates multi-modal adversarial conversations intended to break a target model. FERRET relies on three expansions that make these conversations more effective and efficient: (1) horizontal expansion, in which the red team model self-improves to generate more effective conversation starters; (2) vertical expansion, in which the conversation starters discovered during horizontal expansion are expanded into full multi-modal conversations; and (3) meta expansion, in which the red team model discovers more effective multi-modal attack strategies during the course of a conversation. We compare FERRET with existing automated red teaming approaches. Our experiments show that FERRET generates effective multi-modal adversarial conversations and outperforms existing state-of-the-art approaches.
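To make the relationship between the three expansions concrete, the following is a minimal Python sketch of how they could be composed. It is not the authors' implementation: the abstract gives no API, so every name here (`horizontal_expansion`, `vertical_expansion`, `meta_expansion`, `propose`, `score`, `discover`, `judge`, and the toy stand-ins in `run_demo`) is a hypothetical placeholder, and the selection and stopping rules are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:
    """A multi-turn exchange; each turn holds an attacker message and a target reply."""
    turns: List[dict] = field(default_factory=list)
    broke_target: bool = False


def horizontal_expansion(seeds, propose, score, rounds=2, keep=2):
    """Assumed form: refine conversation starters and keep the best-scoring ones."""
    pool = list(seeds)
    for _ in range(rounds):
        # Deduplicate while preserving order so the result is deterministic.
        candidates = list(dict.fromkeys(pool + [propose(s) for s in pool]))
        pool = sorted(candidates, key=score, reverse=True)[:keep]
    return pool


def meta_expansion(strategies, discover, convo):
    """Assumed form: optionally add a newly discovered strategy, then use the latest."""
    new_strategy = discover(convo)
    if new_strategy is not None and new_strategy not in strategies:
        strategies.append(new_strategy)
    return strategies[-1]


def vertical_expansion(starter, attacker_turn, target_reply, judge,
                       strategies, discover, max_turns=3):
    """Assumed form: grow one starter into a multi-modal conversation."""
    convo = Conversation()
    message = {"text": starter, "image": None}
    for _ in range(max_turns):
        reply = target_reply(convo.turns, message)
        convo.turns.append({"attacker": message, "target": reply})
        if judge(reply):
            convo.broke_target = True
            break
        # Meta expansion happens during the conversation, between turns.
        strategy = meta_expansion(strategies, discover, convo)
        message = attacker_turn(convo.turns, strategy)
    return convo


def run_demo():
    """Toy stand-ins for the red team model, target model, and judge."""
    propose = lambda s: s + " (rephrased)"
    score = len  # placeholder for an effectiveness score
    target_reply = lambda history, msg: "comply" if msg["image"] else "refuse"
    judge = lambda reply: reply == "comply"
    discover = lambda convo: "attach_image" if len(convo.turns) >= 2 else None

    def attacker_turn(history, strategy):
        image = "placeholder.png" if strategy == "attach_image" else None
        return {"text": f"follow-up using {strategy}", "image": image}

    starters = horizontal_expansion(["seed a", "seed b"], propose, score)
    return [
        vertical_expansion(s, attacker_turn, target_reply, judge,
                           ["text_only"], discover)
        for s in starters
    ]
```

In this sketch the horizontal phase runs once up front and the meta phase is interleaved with the vertical phase; whether FERRET schedules the expansions this way is not stated in the abstract.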

AUTHORS

Written by

Ninareh Mehrabi

Vítor Albiero

Maya Pavlova

Joanna Bitton

Publisher

arXiv

Research Topics

Conversational AI

Related Publications

June 29, 2026

RESPONSIBLE AI

Accurate Decoding of Natural Sentences from Non-Invasive Brain Recordings

Mingfang (Lucy) Zhang *, Jarod Levy *, Cédric Rommel, Jérémy Rapin, Corentin Bel, Julie Bonnaire, Daniel Nieto, Pierre Bourdillon, Svetlana Pinet, Stéphane d'Ascoli, Thomas Moreau, Jean Remi King

December 26, 2025

REINFORCEMENT LEARNING

NLP

Safety Alignment of LMs via Non-cooperative Games

Anselm Paulus, Ilia Kulikov, Brandon Amos, Remi Munos, Ivan Evtimov, Kamalika Chaudhuri, Arman Zharmagambetov

September 24, 2025

RESEARCH

NLP

Code World Model Preparedness Report

Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd, Nathaniel Li, Ziwen Han, Jean-Christophe Testud, Saisuke Okabayashi, Maeve Ryan, Jinpeng Miao, Hamza Kwisaba, Felix Binder, Spencer Whitman, Jim Gust, Esteban Arcaute, Dhaval Kapil, Jacob Kahn, Ayaz Minhas, Tristan Goodman, Lauren Deason, Alexander Vaughan, Shengjia Zhao, Summer Yue