AD-Drop: Attribution Driven Dropout for Robust Language Model Finetuning

October 31, 2022


Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units, existing research has not examined its effect on the self-attention mechanism. In this paper, we investigate this problem through self-attention attribution and find that dropping attention positions with low attribution scores can accelerate training and increase the risk of overfitting. Motivated by this observation, we propose Attribution-Driven Dropout (AD-Drop), which randomly discards some high-attribution positions to encourage the model to make predictions by relying more on low-attribution positions to reduce overfitting. We also develop a cross-tuning strategy to alternate fine-tuning and AD-Drop to avoid dropping high-attribution positions excessively. Extensive experiments on various benchmarks show that AD-Drop yields consistent improvements over baselines. Analysis further confirms that AD-Drop serves as a strategic regularizer to prevent overfitting during fine-tuning.
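The idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes attribution scores for one row of attention positions are already available as a plain list, and the names `ad_drop_mask`, `masked_softmax`, `use_ad_drop`, `candidate_frac`, and `drop_rate` are hypothetical. How attribution is computed, how the candidate set is chosen, and which epochs use AD-Drop are all assumptions made for illustration.

```python
import math
import random


def ad_drop_mask(attribution, candidate_frac=0.3, drop_rate=0.5, rng=None):
    """Return a 0/1 keep-mask over attention positions (illustrative only).

    Assumption: the top `candidate_frac` of positions by attribution score
    form the candidate set, and each candidate is dropped independently
    with probability `drop_rate`.
    """
    rng = rng or random.Random(0)
    n = len(attribution)
    # Never make every position a candidate, so at least one is always kept.
    k = min(int(n * candidate_frac), n - 1)
    by_score = sorted(range(n), key=lambda i: attribution[i], reverse=True)
    mask = [1] * n
    for i in by_score[:k]:
        if rng.random() < drop_rate:
            mask[i] = 0  # discard this high-attribution position
    return mask


def masked_softmax(logits, mask):
    """Softmax over attention logits with dropped positions set to -inf."""
    kept = [x if m else float("-inf") for x, m in zip(logits, mask)]
    top = max(kept)
    exps = [math.exp(x - top) for x in kept]
    total = sum(exps)
    return [e / total for e in exps]


def use_ad_drop(epoch):
    """Cross-tuning sketch: alternate plain fine-tuning and AD-Drop.

    Assumption: even-numbered epochs (0, 2, ...) use plain fine-tuning
    and odd-numbered epochs use AD-Drop.
    """
    return epoch % 2 == 1


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.2]   # hypothetical attribution scores
    logits = [1.0, 2.0, 3.0, 4.0]   # hypothetical attention logits
    mask = ad_drop_mask(scores, candidate_frac=0.5, drop_rate=1.0)
    print(mask)
    print(masked_softmax(logits, mask))
```

Masking the logits before the softmax, rather than zeroing the attention weights afterwards, keeps the remaining weights normalized, so the attention mass is redistributed over the lower-attribution positions that survive.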



Written by

Qifan Wang

Shaoliang Nie

Jinghao Deng

Tao Yang

Xiaojun Quan



Research Topics

Natural Language Processing (NLP)

Core Machine Learning
