July 6, 2020
Recent advances in pre-trained multilingual language models have led to state-of-the-art results on the task of quality estimation (QE) for machine translation. A carefully engineered ensemble of such models dominated the QE shared task at WMT 2019. Our in-depth analysis, however, shows that the success of using pre-trained language models for QE is overestimated due to three issues we observed in current QE datasets: (i) the distributions of quality scores are imbalanced and skewed towards good quality scores; (ii) QE models can perform well on these datasets without even ingesting source or translated sentences; (iii) the datasets contain statistical artifacts that correlate well with human-annotated QE labels. Our findings suggest that although QE models might capture the fluency of translated sentences and the complexity of source sentences, they cannot model the adequacy of translations effectively.
Publisher
Association for Computational Linguistics (ACL)