Computer Vision

A new framework for large-scale training of state-of-the-art visual classification models

December 06, 2019

What it is:

A new end-to-end, PyTorch-based framework for large-scale training of state-of-the-art image and video classification models. It offers several notable advantages:

  • Ease of use. The library features a modular, flexible design that allows anyone to train machine learning models on top of PyTorch using very simple abstractions. The system also has out-of-the-box integration with Amazon Web Services (AWS), facilitating research at scale and making it simple to move between research and production.

  • High performance. For example, researchers can use the framework to train ResNet-50 on ImageNet in as little as 15 minutes.

The framework is now available on GitHub. We’ll also be hosting the “Multi-modal research to production” workshop at the Conference on Neural Information Processing Systems (NeurIPS) in Vancouver on Sunday, December 8.

What it does:

Previous computer vision (CV) libraries have focused on providing components that users assemble into their own research frameworks. This approach gives researchers flexibility, but in production settings it leads to duplicated effort: users must migrate research between frameworks and relearn the minutiae of efficient distributed training and data loading. Our PyTorch-based CV framework offers a better solution.

  • Our abstractions make it easy for a project to go from small-scale research prototype to large-scale, best-in-class production jobs with hundreds of GPUs and billions of images.

  • Through integration with PyTorch Hub (torch.hub), AI researchers and engineers can download and fine-tune the best publicly available ImageNet models with just a few lines of code.

  • We have also added integration with PyTorch Elastic (experimental version), which makes distributed training robust to transient failures. Optionally, it can also let running distributed training jobs adjust to the resources available in the cluster.
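
To make the fault-tolerance idea in the last bullet concrete, here is a minimal plain-Python sketch of checkpoint-and-resume training. It does not use the PyTorch Elastic API; the names `TransientFailure`, `train_one_epoch`, and `run_with_restarts` are hypothetical and exist only for illustration. The assumption is that a job saves a checkpoint after every epoch and, after a recoverable failure, restarts from the last checkpoint rather than from scratch.

```python
class TransientFailure(Exception):
    """Stands in for a recoverable error, such as a preempted worker."""


def train_one_epoch(state, fail_on):
    """Hypothetical epoch: advance the counters, or fail once on a chosen epoch."""
    epoch = state["epoch"]
    if epoch in fail_on:
        fail_on.discard(epoch)  # fail only the first time this epoch is attempted
        raise TransientFailure(f"worker lost during epoch {epoch}")
    return {"epoch": epoch + 1, "steps": state["steps"] + 100}


def run_with_restarts(num_epochs, max_restarts, fail_on=()):
    """Train for num_epochs, resuming from the last checkpoint after a failure."""
    checkpoint = {"epoch": 0, "steps": 0}  # last successfully saved state
    pending_failures = set(fail_on)
    restarts = 0
    while checkpoint["epoch"] < num_epochs:
        try:
            # Work on a copy so a failed epoch never corrupts the checkpoint.
            checkpoint = train_one_epoch(dict(checkpoint), pending_failures)
        except TransientFailure:
            restarts += 1
            if restarts > max_restarts:
                raise  # too many failures: give up and surface the error
            # Otherwise loop again, starting from the unchanged checkpoint.
    return checkpoint, restarts


final_state, restart_count = run_with_restarts(
    num_epochs=5, max_restarts=3, fail_on={2, 4}
)
print(final_state, restart_count)
```

In this sketch the job completes all five epochs despite two injected failures, because each retry begins from the most recent checkpoint. A real elastic setup also has to coordinate many workers, which this single-process example leaves out.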

At Facebook, we’ve been using this framework in research to more easily train the largest models on the largest datasets using state-of-the-art recipes.

Why it matters:

Achieving state-of-the-art results in image and video classification tasks increasingly depends on large-scale training, clusters of many GPUs, and ever-bigger training datasets. By open-sourcing the framework and incorporating out-of-the-box integration with AWS, we hope to make scalable research more available to the broader community for AI applications and production systems. Because users can adopt the framework whole or piecemeal, it will interoperate well with the extensive set of tools and libraries in the PyTorch ecosystem. We are currently exploring ways to add features and further scale the amount of data that can be used in training.

This framework will help accelerate the pace of research by making it easy to replicate and iterate on state-of-the-art work in large-scale image and video classification, and by enabling anyone to develop their own models at scale using AWS. It also facilitates reproducible research, with features like easy-to-use configuration files and a standard project structure.
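
As a rough illustration of how a configuration file can drive a reproducible run, the sketch below builds a model description from a JSON config through a small registry. This is not the framework's actual API or config schema: the keys (`model`, `name`, `depth`, `num_classes`, `num_epochs`, `batch_size`) and the names `register_model`, `build_model`, and `ResNetSpec` are assumptions chosen for the example.

```python
import json

# Hypothetical registry mapping config names to model classes.
MODEL_REGISTRY = {}


def register_model(name):
    """Class decorator that makes a model buildable by name from a config."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("resnet")
class ResNetSpec:
    """Placeholder for a real model; it only records its hyperparameters."""

    def __init__(self, depth, num_classes):
        self.depth = depth
        self.num_classes = num_classes


def build_model(model_config):
    """Look up the class named in the config and pass it the remaining keys."""
    params = dict(model_config)  # copy so the caller's config is not modified
    name = params.pop("name")
    if name not in MODEL_REGISTRY:
        raise KeyError(f"unknown model '{name}'; known: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**params)


# An illustrative config; in practice this text would live in a file.
CONFIG_TEXT = """
{
  "model": {"name": "resnet", "depth": 50, "num_classes": 1000},
  "num_epochs": 90,
  "batch_size": 256
}
"""

config = json.loads(CONFIG_TEXT)
model = build_model(config["model"])
print(type(model).__name__, model.depth, model.num_classes)
```

Keeping every hyperparameter in one file means an experiment can be rerun or shared by passing along the config alone, and swapping components means editing a name rather than the training code.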

Aaron Adcock

Research Scientist

Vinicius Reis

Research Scientist

Mannat Singh

Software Engineer

Zhicheng Yan

Research Scientist

Kai Zhang

Software Engineer

Simran Motwani

Software Engineer

Jon Guerin

Software Engineer

Naman Goyal

Software Engineer

Laura Gustafson

Software Engineer