RESEARCH

NLP

Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

July 29, 2019

Abstract

Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4~22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach.

Download the Paper

AUTHORS

Written by

Jiatao Gu

Kyunghyun Cho

Victor O.K. Li

Yong Wang

Publisher

ACL

Related Publications

July 23, 2024

HUMAN & MACHINE INTELLIGENCE

CONVERSATIONAL AI

The Llama 3 Herd of Models

Llama team

July 23, 2024

June 25, 2024

NLP

Neurons in Large Language Models: Dead, N-gram, Positional

Elena Voita, Javier Ferrando Monsonis, Christoforos Nalmpantis

June 25, 2024

June 25, 2024

SPEECH & AUDIO

NLP

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

Min-Jae Hwang, Ilia Kulikov, Benjamin Peloquin, Hongyu Gong, Peng-Jen Chen, Ann Lee

June 25, 2024

June 14, 2024

NLP

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Scott Yih, Xilun Chen

June 14, 2024

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.