
ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training

June 26, 2020

Abstract

Ads recommendation systems are often trained on tremendous amounts of data, and distributed training is the workhorse for shortening training time. Meanwhile, a commonly used technique to prevent overfitting in ads recommendation is one-pass training, in which the total amount of data is fixed. When we express data parallelism across n workers, each worker processes only 1/n of the data: the larger the number of workers, the less data each worker observes. While training throughput can be increased simply by adding more workers, it becomes increasingly challenging to preserve model quality. To address this problem, we propose the ShadowSync framework, in which the model parameters are synchronized across workers, yet synchronization is isolated from training and runs in the background. In contrast to common strategies such as synchronous SGD, asynchronous SGD, and model averaging over independently trained sub-models, where synchronization happens in the foreground, ShadowSync synchronization is neither part of the backward pass nor performed every k iterations. ShadowSync is simple but effective. The framework is generic enough to host various types of synchronization algorithms, and we propose three approaches under this theme. The superiority of ShadowSync is confirmed by experiments on training deep neural networks for click-through-rate prediction. Our methods all succeed in making training throughput scale linearly with the number of trainers. Compared to their foreground counterparts, our methods exhibit neutral to better model quality and better scalability when the number of parameter servers is held fixed. In our training system, which expresses both replication and Hogwild parallelism, ShadowSync also achieves the highest example-level parallelism compared to prior art.
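The core idea of the abstract can be sketched in a few lines: the training loop (foreground) keeps taking optimization steps without ever blocking on communication, while a separate thread (background) periodically blends the local parameters with a shared copy. The class, field names, and the simple two-way averaging below are illustrative assumptions standing in for the paper's actual cross-worker synchronization, not its implementation.

```python
import threading
import time
import numpy as np

class ShadowSyncWorker:
    """Toy single-process illustration of background parameter synchronization.

    `shared` stands in for parameters held by other workers / parameter
    servers; real ShadowSync would communicate over the network instead.
    """

    def __init__(self, dim, sync_interval=0.01):
        self.params = np.zeros(dim)      # local model parameters
        self.shared = np.zeros(dim)      # stand-in for remote parameters
        self.sync_interval = sync_interval
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._syncs = 0

    def _sync_loop(self):
        # Background synchronization: runs independently of training,
        # so gradient computation never waits on communication.
        while not self._stop.is_set():
            with self._lock:
                avg = 0.5 * (self.params + self.shared)
                self.params[:] = avg
                self.shared[:] = avg
                self._syncs += 1
            time.sleep(self.sync_interval)

    def train(self, steps):
        syncer = threading.Thread(target=self._sync_loop, daemon=True)
        syncer.start()
        for _ in range(steps):
            # Foreground training: a toy SGD step with a faked gradient.
            grad = np.ones_like(self.params)
            with self._lock:
                self.params -= 0.01 * grad
            time.sleep(0.001)
        self._stop.set()
        syncer.join()
        return self._syncs

worker = ShadowSyncWorker(dim=4)
n_syncs = worker.train(steps=50)
```

Note how this differs from the foreground strategies named above: synchronous SGD would place the averaging inside the training step, and "every k iterations" schemes would trigger it from the loop counter; here the sync cadence is decoupled from the step count entirely.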


AUTHORS

Qinqing Zheng

Bor-Yiing Su

Jiyan Yang

Alisson Azzolini

Qiang Wu

Ou Jin

Shri Karandikar

Hagay Lupesko

Liang Xiong

Eric Zhou

Publisher

ArXiv 2020

