August 06, 2023
Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data – examples of what the model should not do. In this work, we propose a novel procedure to train with such data called the CRINGE loss (ContRastive Iterative Negative GEneration). We show the effectiveness of this approach across three different experiments on the tasks of safe generation, contradiction avoidance, and open-domain dialogue. Our models outperform multiple strong baselines and are conceptually simple, easy to train and implement.
Publisher: ACL