PyTorch Distributed: Experiences on Accelerating Data Parallel Training

August 31, 2020


This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely-adopted scientific computing package used in deep learning research and applications. Recent advances in deep learning argue for the value of large datasets and large models, which necessitates the ability to scale out model training to more computational resources. Data parallelism has emerged as a popular solution for distributed training thanks to its straightforward principle and broad applicability. In general, the technique of distributed data parallelism replicates the model on every computational resource to generate gradients independently and then communicates those gradients at each iteration to keep model replicas consistent. Despite the conceptual simplicity of the technique, the subtle dependencies between computation and communication make it non-trivial to optimize distributed training efficiency. As of v1.5, PyTorch natively provides several techniques to accelerate distributed data parallel training, including bucketing gradients, overlapping computation with communication, and skipping gradient synchronization. Evaluations show that, when configured appropriately, the PyTorch distributed data parallel module attains near-linear scalability using 256 GPUs.
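The core idea behind gradient bucketing and computation/communication overlap can be sketched in plain Python. This is a hypothetical illustration, not PyTorch's actual internals: all names (`make_buckets`, `backward_with_bucketing`, `allreduce_average`, `BUCKET_CAP`) are invented for the sketch, and the allreduce is simulated by averaging in-process rather than communicating across workers. The key behavior it shows is that gradients are grouped into fixed-capacity buckets in roughly reverse layer order, and each bucket is communicated as soon as all of its gradients are ready, so early buckets can be reduced while later gradients are still being computed.

```python
BUCKET_CAP = 2  # gradients per bucket; real DDP uses a byte budget instead


def make_buckets(param_names, cap=BUCKET_CAP):
    """Partition parameters into fixed-size buckets, in reverse order,
    since gradients become ready roughly from the last layer backward."""
    names = list(reversed(param_names))
    return [names[i:i + cap] for i in range(0, len(names), cap)]


def allreduce_average(replica_grads):
    """Stand-in for an allreduce: average one gradient across replicas."""
    return sum(replica_grads) / len(replica_grads)


def backward_with_bucketing(param_names, replicas):
    """Simulate a backward pass over several model replicas.

    `replicas` maps each replica to {param_name: local_gradient}. Each
    time every gradient in a bucket is ready, that bucket is reduced
    immediately instead of waiting for the full backward pass."""
    buckets = make_buckets(param_names)
    ready = set()
    reduced = {}
    fired = []  # order in which buckets were communicated
    for name in reversed(param_names):  # gradients arrive last-layer-first
        ready.add(name)
        for bi, bucket in enumerate(buckets):
            if bi not in fired and all(p in ready for p in bucket):
                fired.append(bi)
                for p in bucket:
                    reduced[p] = allreduce_average([r[p] for r in replicas])
    return reduced, fired
```

For example, with parameters `["w1", "w2", "b"]` and two replicas, the bucket containing `b` and `w2` is reduced before the gradient for `w1` even exists, which is the overlap the paper exploits. Skipping gradient synchronization (exposed in PyTorch as the `no_sync()` context manager on the DDP module) would correspond here to simply not firing any bucket for some iterations and accumulating local gradients instead.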



Written by

Shen Li

Brian Vaughan

Jeff Smith (FRL)

Omkar Salpekar

Pritam Damania

Rohan Varma

Soumith Chintala

Teng Li

Yanli Zhao

Adam Paszke

Pieter Noordhuis


VLDB-Industrial Track

Research Topics

Systems Research

