NLP

Training with Low-precision Embedding Tables

December 3, 2018

Abstract

Since the success of GloVe and Word2Vec in natural language processing, continuous representations have been widely deployed in many other application domains. These applications range from encoding textual information to modeling users and items in recommender systems, and they use embedding vectors to represent a large number of objects. As the cardinality of the object sets increases, the embedding components quickly become the bottleneck in the training memory footprint. In this work, we focus on building a system to train continuous embeddings in a low-precision floating-point representation. Specifically, our system performs SGD-style model updates in single-precision arithmetic, casts the updated parameters using stochastic rounding, and stores the parameters in half-precision floating point. Theoretically, we prove that, for strongly convex objectives, our SGD-based training algorithm retains the same convergence rate up to constant factors. We also present a system-friendly implementation of a faster random number generator that increases runtime performance by 30%. We deploy our training system on deep neural networks with low-precision embedding tables for recommender systems, using both the public Criteo dataset and an internal dataset at Facebook. We empirically demonstrate that our half-precision floating-point training system achieves generalization performance matching that of a single-precision training system, with up to 2x memory savings and 1.2x faster training.
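The update scheme described in the abstract (compute in single precision, stochastically round, store in half precision) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `stochastic_round_to_fp16` and `sgd_update_rows` are hypothetical, the random number generator is NumPy's default rather than the faster generator the abstract mentions, and the sketch assumes finite values inside the float16 range and unique row indices per update.

```python
import numpy as np


def stochastic_round_to_fp16(x, rng):
    """Round float32 values to float16 stochastically (illustrative sketch).

    Each value is rounded to one of its two neighboring float16 values, with
    the probability of rounding up proportional to its distance from the
    lower neighbor, so the rounded value equals x in expectation.
    Assumes finite inputs within the float16 range.
    """
    x = np.asarray(x, dtype=np.float32)
    nearest = x.astype(np.float16)            # ordinary round-to-nearest
    nearest32 = nearest.astype(np.float32)
    inf16 = np.float16(np.inf)

    # Find the float16 neighbors that bracket x from below and above.
    lower = np.where(nearest32 > x, np.nextafter(nearest, -inf16), nearest)
    upper = np.where(nearest32 < x, np.nextafter(nearest, inf16), nearest)
    lower32 = lower.astype(np.float32)
    upper32 = upper.astype(np.float32)

    # Probability of rounding up; zero when x is already representable.
    gap = upper32 - lower32
    p_up = np.divide(x - lower32, gap, out=np.zeros_like(x), where=gap > 0)

    round_up = rng.random(x.shape) < p_up
    return np.where(round_up, upper, lower)


def sgd_update_rows(table_fp16, row_ids, grads_fp32, lr, rng):
    """Apply one SGD step to selected rows of a float16 embedding table.

    The arithmetic is done in float32; only the stored result is float16.
    Assumes row_ids contains no duplicates.
    """
    rows32 = table_fp16[row_ids].astype(np.float32)
    rows32 -= np.float32(lr) * grads_fp32
    table_fp16[row_ids] = stochastic_round_to_fp16(rows32, rng)


rng = np.random.default_rng(0)
table = np.ones((4, 2), dtype=np.float16)
grads = np.full((1, 2), 1e-4, dtype=np.float32)
sgd_update_rows(table, np.array([1]), grads, lr=1.0, rng=rng)
```

The motivation for stochastic rather than nearest rounding is that the spacing between float16 values just below 1.0 is 2^-11, so an update of 1e-4 applied to a stored value of 1.0 would always be discarded by round-to-nearest. With stochastic rounding the update survives with some probability, and the stored value tracks the single-precision result in expectation.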

Related Publications

February 06, 2025

NLP

Brain-to-Text Decoding: A Non-invasive Approach via Typing

Jarod Levy, Mingfang (Lucy) Zhang, Svetlana Pinet, Jérémy Rapin, Hubert Jacob Banville, Stéphane d'Ascoli, Jean Remi King

February 06, 2025

NLP

From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production

Mingfang (Lucy) Zhang, Jarod Levy, Stéphane d'Ascoli, Jérémy Rapin, F.-Xavier Alario, Pierre Bourdillon, Svetlana Pinet, Jean Remi King

November 16, 2022

NLP

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

Kushal Tirumala, Aram H. Markosyan, Armen Aghajanyan, Luke Zettlemoyer

October 31, 2022

NLP

Autoregressive Search Engines: Generating Substrings as Document Identifiers

Fabio Petroni, Giuseppe Ottaviano, Michele Bevilacqua, Patrick Lewis, Scott Yih, Sebastian Riedel

November 01, 2018

NLP

Computer Vision

Non-Adversarial Unsupervised Word Translation

Yedid Hoshen, Lior Wolf

December 02, 2018

NLP

Computer Vision

One-Shot Unsupervised Cross Domain Translation

Sagie Benaim, Lior Wolf

June 30, 2019

NLP

Variational Training for Large-Scale Noisy-OR Bayesian Networks

Geng Ji, Dehua Cheng, Huazhong Ning, Changhe Yuan, Hanning Zhou, Liang Xiong, Erik B. Sudderth

June 26, 2020

NLP

Computer Vision

ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training

Qinqing Zheng, Bor-Yiing Su, Jiyan Yang, Alisson Azzolini, Qiang Wu, Ou Jin, Shri Karandikar, Hagay Lupesko, Liang Xiong, Eric Zhou
