April 26, 2022


Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning. Chief among them are the variability in system characteristics across devices and the need to coordinate millions of clients with a central server. Most FL systems described in the literature are synchronous: they perform a synchronized aggregation of model updates from individual clients. Scaling synchronous FL is challenging because increasing the number of clients training in parallel yields diminishing returns in training speed, analogous to large-batch training. Moreover, stragglers hinder synchronous FL training. In this work, we outline a production asynchronous FL system design. Our work tackles the aforementioned issues, sketches some of the system design challenges and their solutions, and touches upon principles that emerged from building a production FL system for millions of clients. Empirically, we demonstrate that asynchronous FL converges faster than synchronous FL when training across nearly one hundred million devices. In particular, in high-concurrency settings, asynchronous FL is 5× faster and incurs nearly 8× less communication overhead than synchronous FL.
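The contrast between synchronized and asynchronous aggregation can be illustrated with a toy sketch. This is not the system described in the paper: the class name `AsyncAggregator`, the `buffer_size` parameter, and the staleness down-weighting of 1/sqrt(1 + staleness) are all assumptions chosen for illustration. The synchronous step waits for every selected client before averaging, whereas the asynchronous server accepts updates as they arrive and advances the model once a small buffer fills, so a slow client does not block the others.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


def sync_round(weights: Vector, deltas: List[Vector], lr: float = 1.0) -> Vector:
    """Synchronous step: requires all selected clients' deltas, then averages them."""
    n = len(deltas)
    avg = [sum(d[i] for d in deltas) / n for i in range(len(weights))]
    return [w + lr * a for w, a in zip(weights, avg)]


@dataclass
class AsyncAggregator:
    """Toy asynchronous server (hypothetical design, for illustration only)."""
    weights: Vector
    buffer_size: int = 2  # assumed: number of updates collected per server step
    lr: float = 1.0
    version: int = 0      # incremented on every server step
    _buffer: List[Vector] = field(default_factory=list, init=False, repr=False)

    def receive(self, delta: Vector, client_version: int) -> bool:
        """Accept one client update; return True if the server model advanced."""
        # Staleness = server steps taken since the client fetched the model.
        staleness = self.version - client_version
        # Assumed heuristic: down-weight stale updates by 1/sqrt(1 + staleness).
        scale = 1.0 / (1.0 + staleness) ** 0.5
        self._buffer.append([scale * x for x in delta])
        if len(self._buffer) < self.buffer_size:
            return False
        # Buffer is full: apply the averaged update without waiting for stragglers.
        self.weights = sync_round(self.weights, self._buffer, self.lr)
        self._buffer.clear()
        self.version += 1
        return True
```

In this sketch, a client that started from an older model version still contributes, but with reduced weight, while the server never idles waiting for the slowest participant.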


Written by

Dzmitry Huba

John Nguyen

Kshitiz Malik

Ruiyu Zhu

Mike Rabbat

Ashkan Yousefpour

Carole-Jean Wu

Hongyuan Zhan

Pavel Ustinov

Harish Srinivas

Kaikai Wang

Anthony Shoumikhin

Jesik Min

Mani Malek



Research Topics

Systems Research
