RESPONSIBLE AI

Detecting Benchmark Contamination Through Watermarking

February 24, 2025

Abstract

Benchmark contamination poses a significant challenge to the reliability of Large Language Model (LLM) evaluations, as it is difficult to ascertain whether a model has been trained on a test set. We introduce a solution to this problem by watermarking benchmarks before their release. The embedding involves reformulating the original questions with a watermarked LLM, in a way that does not alter the benchmark utility. During evaluation, we can detect "radioactivity", i.e., traces that the text watermarks leave in the model during training, using a theoretically grounded statistical test. We test our method by pre-training 1B models from scratch on 10B tokens with controlled benchmark contamination, and validate its effectiveness in detecting contamination on ARC-Easy, ARC-Challenge, and MMLU. Results show similar benchmark utility post-watermarking and successful contamination detection when models are contaminated enough to enhance performance, e.g., p-val = 10⁻³ for a +5% gain on ARC-Easy.
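For intuition, below is a minimal Python sketch of what such a radioactivity test could look like, assuming a green-list style token watermark: the suspect model's completions on the watermarked questions are scored for how often consecutive tokens fall in a keyed "green" partition, and a one-sided p-value is computed under the no-contamination null hypothesis. All names, parameters, and the hashing scheme (GAMMA, KEY, is_green, radioactivity_pvalue) are illustrative assumptions, not the paper's implementation.

```python
import hashlib
import math
from typing import Sequence

# Hypothetical parameters: GAMMA is the fraction of the vocabulary placed in
# the "green list" at each step; KEY seeds the pseudorandom partition.
GAMMA = 0.25
KEY = b"benchmark-watermark-demo"

def is_green(prev_token: int, token: int) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the
    previous token and a secret key (a stand-in for a real scheme)."""
    seed = hashlib.sha256(KEY + prev_token.to_bytes(4, "big")).digest()[:8]
    h = hashlib.sha256(seed + token.to_bytes(4, "big")).digest()
    # Deterministic coin flip with bias GAMMA for this (prev_token, token) pair.
    return (int.from_bytes(h[:8], "big") / 2**64) < GAMMA

def radioactivity_pvalue(token_ids: Sequence[int]) -> float:
    """One-sided p-value that the model over-produces green tokens.
    Under the null (no contamination) each scored token is green with
    probability GAMMA; a normal approximation to the binomial is used."""
    pairs = list(zip(token_ids[:-1], token_ids[1:]))
    n = len(pairs)
    greens = sum(is_green(prev, tok) for prev, tok in pairs)
    z = (greens - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)
    return 0.5 * math.erfc(z / math.sqrt(2))

if __name__ == "__main__":
    # Toy usage: in practice these would be token ids sampled from the
    # suspect model when prompted with the watermarked benchmark questions.
    fake_output = [17, 4023, 911, 7, 256, 19, 3388, 42, 5, 88]
    print(f"p-value: {radioactivity_pvalue(fake_output):.3f}")
```

In practice the scored tokens would come from the suspect model's answers to the reformulated benchmark questions, aggregated over the whole test set, and the resulting p-value would be compared against a small significance threshold such as 10⁻³.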


AUTHORS

Written by

Alain Durmus

Chuan Guo

Pierre Fernandez

Saeed Mahloujifar

Tom Sander

Publisher

arXiv

Related Publications

February 13, 2026

RESPONSIBLE AI

FERRET: Framework for Expansion Reliant Red Teaming

Joanna Bitton, Maya Pavlova, Ninareh Mehrabi, Vítor Albiero


December 26, 2025

REINFORCEMENT LEARNING

NLP

Safety Alignment of LMs via Non-cooperative Games

Brandon Amos, Anselm Paulus, Arman Zharmagambetov, Ilia Kulikov, Ivan Evtimov, Kamalika Chaudhuri, Remi Munos


September 24, 2025

RESEARCH

NLP

Code World Model Preparedness Report

Aidan Boyd, Alexander Vaughan, Ayaz Minhas, Cristina Menghini, Daniel Song, Dhaval Kapil, Esteban Arcaute, Faizan Ahmad, Felix Binder, Hamza Kwisaba, Jacob Kahn, Jean-Christophe Testud, Jim Gust, Jinpeng Miao, Lauren Deason, Maeve Ryan, Nathaniel Li, Peter Ney, Saisuke Okabayashi, Shengjia Zhao, Spencer Whitman, Summer Yue, Tristan Goodman, Ziwen Han


June 13, 2025

FAIRNESS

INTEGRITY

Measuring multi-calibration

Nastaran Okati, Daniel Haimovich, Fridolin Linder, Ido Guy, Lorenzo Perini, Mark Tygert, Niek Tax

