
Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages

June 10, 2023

Abstract

As multilingual machine translation (MMT) models continue to grow in size and in the number of languages they support, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. How to reuse existing models while also making architectural changes that provide capacity for both old and new languages has likewise not been closely studied. In this work, we introduce three techniques that speed up learning of the new languages and alleviate catastrophic forgetting despite vocabulary and architecture mismatches. Our results show that by (1) carefully initializing the network, (2) applying learning rate scaling, and (3) performing data up-sampling, it is possible to exceed the performance of a same-sized baseline model with 30% of the computation and to recover the performance of a larger model trained from scratch with an over 50% reduction in computation. Furthermore, our analysis reveals that the introduced techniques help learn the new directions more effectively and alleviate catastrophic forgetting at the same time. We hope our work will guide research into more efficient approaches to adding languages to MMT models and ultimately maximize the reuse of existing models.
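To make the vocabulary-mismatch problem concrete, the following is a minimal sketch of one way embeddings might be carried over when the vocabulary changes. It is not the paper's initialization scheme, which the abstract does not specify. The function name `init_expanded_embeddings`, the `noise_std` parameter, and the choice to start unseen tokens near the mean of the old embeddings are all assumptions made for illustration.

```python
import random


def init_expanded_embeddings(old_vocab, old_emb, new_vocab, seed=0, noise_std=0.01):
    """Build an embedding table for new_vocab, reusing rows from old_emb.

    old_vocab / new_vocab: lists of token strings (list index = row id).
    old_emb: list of rows (lists of floats), one per token in old_vocab.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(old_emb[0])
    old_index = {tok: i for i, tok in enumerate(old_vocab)}

    # Mean of the existing embeddings, used as the centre for unseen tokens
    # (an assumed heuristic, not the authors' stated method).
    mean = [sum(row[d] for row in old_emb) / len(old_emb) for d in range(dim)]

    new_emb = []
    for tok in new_vocab:
        if tok in old_index:
            # Token survives the vocabulary update: copy its trained row,
            # even if its row id has changed.
            new_emb.append(list(old_emb[old_index[tok]]))
        else:
            # Unseen token: start near the mean with small Gaussian noise.
            new_emb.append([m + rng.gauss(0.0, noise_std) for m in mean])
    return new_emb


# Toy example: "hello" and "world" move to new row ids, "bonjour" is new.
old_vocab = ["<pad>", "hello", "world"]
old_emb = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
new_vocab = ["<pad>", "world", "bonjour", "hello"]

new_emb = init_expanded_embeddings(old_vocab, old_emb, new_vocab)
```

In this toy case the rows for `<pad>`, `world`, and `hello` are copied unchanged to their new positions, while `bonjour` receives a fresh row close to the mean of the old table. Learning rate scaling and data up-sampling, the other two techniques named above, would act on the optimizer and the data sampler respectively and are not shown here.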


AUTHORS

Written by

Simeng Sun

Maha Elbayad

Anna Sun

James Cross

Publisher

EACL

