June 2, 2021
Facebook’s AI models perform trillions of inference operations every day for the billions of people who use our technologies. Meeting this growing workload demand means we have to continually evolve our AI frameworks. That’s why today we’re announcing that we’re migrating all our AI systems to PyTorch.
Adopting PyTorch as Facebook’s default AI framework helps ensure that all the experiences across our technologies will run optimally at Facebook scale and for everyone, regardless of what device, operating system, or quality of internet connection they may have. Whether it’s advancing the state of the art in computer vision or deploying personalized Instagram recommendations, we’re able to innovate at the fastest clip with greater flexibility, thanks to PyTorch.
Today, over a year into the migration process, there are more than 1,700 PyTorch-based inference models in full production at Facebook, and 93 percent of our new training models — those responsible for identifying and analyzing content on Facebook — are on PyTorch.
This migration also means Facebook can work alongside the vibrant PyTorch community more closely than ever. PyTorch not only makes our research and engineering work more effective, collaborative, and efficient, but also allows us to share our work as open source PyTorch libraries and learn from the advances made by the thousands of PyTorch developers around the world.
Historically, AI’s research-to-production pipeline has been tedious and complicated. Multiple steps and tools, fragmented processes, and lack of any clear standardization across the AI industry made it nearly impossible to manage the end-to-end workflow. AI researchers and engineers were forced to choose between AI frameworks that were optimized for either research or production.
Facebook was no different and faced the same dilemmas as everyone else in this regard. So, in 2016, a group of AI researchers at Facebook set out in collaboration with the AI research community to tackle these challenges head-on. To get a better understanding of what was already available and what was needed, they experimented with machine learning (ML) frameworks such as Theano and Torch, as well as advanced concepts from Lua Torch, Chainer, and HIPS Autograd. After months of development, PyTorch was born.
The core engineering team maintaining PyTorch created one of the fastest-growing AI frameworks available by never losing sight of one key factor: usability. The early versions of PyTorch were a hit with Facebook’s researchers as well as with the open source community, and PyTorch quickly became the go-to deep learning library of choice for AI researchers.
Researchers loved PyTorch’s simple interface, dynamic computational graphs, back-end support for CPUs and GPUs, and first-class Python integration. PyTorch provided a constraint-free environment where they could truly express and iterate on their ideas.
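To see why those dynamic graphs mattered to researchers, consider a function with data-dependent control flow. PyTorch rebuilds the graph on every forward pass, so ordinary Python branching simply works. A minimal sketch (the function is our own illustration, not Facebook code):

```python
import torch

def f(x):
    # Ordinary Python control flow; the autograd graph records
    # whichever branch actually executes.
    if x.sum() > 0:
        return (x * x).sum()  # quadratic branch
    return -x.sum()           # linear branch

x = torch.tensor([3.0], requires_grad=True)
y = f(x)       # takes the quadratic branch
y.backward()   # differentiates the path that was actually taken
print(x.grad)  # d(x^2)/dx at x = 3 is 2x = 6
```

Because the graph is just a trace of executed Python, debugging a model is as simple as stepping through the function with a standard debugger.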
But it was the release of PyTorch 1.0 in 2018 that began the work of unifying PyTorch’s research and production capabilities into a single framework.
This new iteration merged Python-based PyTorch with production-ready Caffe2 and fused together immediate and graph execution modes, providing both flexibility for research and performance optimization for production. PyTorch engineers at Facebook introduced a family of tools, libraries, pretrained models, and data sets for each stage of development, enabling the developer community to quickly create and deploy new AI innovations at scale.
The platform continues to evolve to this day, with the most recent release boasting more than 3,000 commits since the prior version. And more and more key contributions are coming from enterprise collaborators and the wider open source community.
Through all this, Facebook has continued to serve as a core maintainer and community shepherd of PyTorch. Now Facebook can work alongside the PyTorch community by bringing the framework, which has already made an impact across our products and services, to its fullest potential within the company.
The goal of our PyTorch migration is to create a smoother end-to-end developer experience for our engineers and developers. We want to accelerate our research-to-production pipeline by using a single platform that allows us the flexibility to experiment coupled with the ability to launch AI models at production scale.
By moving away from Caffe2 and standardizing in PyTorch, we’re decreasing the infrastructure and engineering burden associated with maintaining two entire systems, as well as unifying under one common umbrella — both internally and within the open source community.
This is an ongoing journey that spans product teams across Facebook, and there is no one-size-fits-all migration path. There is a huge variety of models — ranking, computer vision, natural language processing, and translation, to name a few — all varying in size and complexity. And as we migrate our AI workloads, we also need to maintain steady model performance and limit disruption to any downstream product traffic or research progress.
On a daily average, there are over 4,000 models at Facebook running on PyTorch.
Facebook’s developers go through multiple stages before their model is considered fully migrated to PyTorch, including critical offline and online testing, training (and often retraining), inference, and then publishing. There are also multiple tests that are conducted to check for performance and correctness variance between Caffe2 and PyTorch, which can take engineers up to a few weeks to fully perform.
To address all these different migration scenarios, our engineers developed an internal workflow and custom tools to help the various teams decide the best way to migrate, or whether their system should be migrated at all, rather than replaced.
Latency, for example, was a concern for many teams who wanted to know whether moving their models over to PyTorch would diminish their performance. We created internal benchmarking tools to compare the performance of original models with that of their PyTorch counterparts ahead of time to make these evaluations easier.
With measures like these in place, each team was able to decide on the right timeline and the best course of action for its own migration.
With PyTorch as the underlying platform powering all Facebook’s AI workloads, our engineers can deploy new AI models in minutes rather than in weeks, build vastly more powerful and efficient systems that power new experiences, and much more.
With PyTorch as our common AI framework at Facebook, these benefits show up across every stage of AI development.
Consider an ML model for natural language processing (NLP), for example. NLP relies on models that have been pretrained using enormous data sets containing hundreds of languages. Once the models have been trained, they can be fine-tuned for NLP tasks such as classification (e.g., identifying hate speech), question answering (in AI voice assistants), or name recognition (as in Facebook’s “how to pronounce your name” feature). But doing all this can require a great deal of time and resources.
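The fine-tuning pattern described above (reuse a pretrained encoder, train only a small task head) can be sketched in a few lines of PyTorch. The encoder below is a tiny stand-in for a real pretrained language model, and the hate-speech-vs.-benign head is purely illustrative:

```python
import torch
import torch.nn as nn

# Tiny stand-in for a pretrained text encoder; in practice this would be
# a large transformer loaded from a checkpoint.
encoder = nn.Sequential(
    nn.Embedding(10_000, 128),   # token ids -> embeddings
    nn.Flatten(),                # (batch, 16, 128) -> (batch, 2048)
    nn.Linear(16 * 128, 256),    # pooled sentence representation
)
for p in encoder.parameters():
    p.requires_grad = False      # freeze the pretrained weights

head = nn.Linear(256, 2)         # new task head, e.g. hate speech vs. benign
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

tokens = torch.randint(0, 10_000, (4, 16))  # dummy batch of token ids
labels = torch.tensor([0, 1, 0, 1])

logits = head(encoder(tokens))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                  # gradients flow only into the new head
opt.step()
```

Because only the small head is trained, fine-tuning costs a fraction of what pretraining did, which is what makes reusing one large model across many tasks practical.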
PyTorch has enhanced our AI models in many ways, from decreasing their energy/power consumption to making them more efficient and powerful. By shrinking the gap between ML research and applied ML, PyTorch has allowed our engineers to take a “model builder” approach to AI development. Rather than adapting ready-made, off-the-shelf solutions, our engineers now develop their own ML solutions specifically tailored to address given challenges — moving seamlessly from researching and testing a solution to implementing it.
Now models for applications like NLP not only can be tailor-made, but also are more accurate, less time-consuming to develop, and less resource-intensive to deploy.
With PyTorch as the underlying framework, our engineers and developers no longer have to go through the tedious process of reimplementing models each time they are updated to verify that their performance has stayed consistent. In addition, the models themselves benefit from improved accuracy and speed and reduced latency, meaning they’re much better prepared to move beyond a research environment into production.
The work of running AI models often falls on powerful servers that can handle the heavy compute workload. But having AI models that can run directly on devices, particularly mobile devices, can improve user experience, mitigate connectivity issues, and reinforce users’ privacy. It is an important step in accelerating AI’s use and adoption. Being able to run vision models on-device, for example, benefits augmented reality (AR) experiences like AI-powered shopping, which relies on having the lowest latency possible to provide the best experience.
PyTorch Mobile, which is specifically targeted at ML and mobile developers, has been vital to this effort. By reducing runtime binary sizes, delivering faster execution with accelerated hardware, and speeding up experimentation with new on-device tooling, PyTorch Mobile has provided continuous improvements to the developer experience to enable PyTorch models to run directly on-device.
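The typical export path for PyTorch Mobile, sketched below with a toy model, is to capture the model as TorchScript, run the mobile optimizer over it, and save a bundle for the on-device runtime (the model itself is a placeholder, not one of Facebook’s):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Toy vision-style model standing in for a real on-device model.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.ReLU(),
).eval()

scripted = torch.jit.script(model)           # capture as TorchScript
optimized = optimize_for_mobile(scripted)    # fuse and fold ops for mobile
optimized._save_for_lite_interpreter("model.ptl")  # bundle for PyTorch Mobile
```

The resulting `model.ptl` file can then be loaded by the PyTorch Mobile runtimes on Android and iOS, with no Python interpreter on the device.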
PyTorch Mobile now runs on devices like the Oculus Quest and Portal, as well as on desktops, and on the Android and iOS mobile apps for Facebook, Instagram, and Messenger. On-device AI will also play an important role with new, emerging hardware technologies such as wearable AR, where it will offer benefits such as lower latency and power consumption and enhanced user privacy.
These aforementioned benefits, and others, are already being demonstrated across Facebook. PyTorch has enhanced features of our technologies and services in use cases ranging from personalization features in Instagram to emerging applications in AR and virtual reality (VR). At the same time, it has streamlined our engineers’ workflows and decreased the time it takes them to improve their systems.
“PyTorch has enabled Facebook researchers and engineers to easily convert a research idea to production, driving breakthrough impact. And because of the growing ecosystem of tools and community of contributors, we’re now able to benefit from essentially speaking the same AI language across the industry.”
“Our ability to quickly and accurately identify problematic content on our platforms helps keep our users safe. The PyTorch ecosystem makes our AI models more accurate and enables our team to ship state-of-the-art tech to production faster to identify harmful content.”
“The field of AI continues to change quickly, and where we build AI — the platforms and systems — is an increasingly critical component to enable cutting-edge research explorations and bring them to production with a seamless developer experience. PyTorch gives us the flexibility and scalability we need to move fast and innovate at Facebook.”
“Thanks to PyTorch, automatic sharding of models allows us to train really large models on host machines. This would have been impossible without the work done on authoring and training infrastructure. And new paradigms of training like pipelining and hierarchical training allow our models to consume more data without any model quality loss.”
“Using PyTorch has allowed us to leverage engineering development velocity and increase machine learning model performance on a wide range of devices.”
“The ease of deployment provided by PyTorch allows the Text-to-Speech team to quickly add new voices, languages, accents, and dialects to the system. And thanks to PyTorch’s lightweight and performant mobile runtime, these models can also be easily deployed on smaller devices such as mobile phones.”
“Moving our models to PyTorch greatly streamlined our experience. PyTorch’s dynamic computational graph allows us to focus on model design and training because it makes model development and debugging easier and more efficient.”
The Instagram personalization team is tasked with continually improving and refining the recommendation engines that power Instagram’s experiences. If you enjoy the content you see on Instagram through Feeds, Stories, or Reels, and find it relevant to you, then you have ML to thank. With PyTorch in place, the team can deliver even faster on making Instagram one of the best platforms for exploring your interests and sharing content with your family and friends.
Today, constantly improving recommendations for the sheer number of people who use Instagram means training models that can be as large as 10 terabytes. Before PyTorch, training and tuning one of these large-scale models could take months. With PyTorch, it now takes just weeks, or even days.
Implementing new training techniques in PyTorch allows hundreds of engineers across various teams to quickly adopt and experiment with them. It also streamlines the creation of cross-team standards that make it easier to deploy and improve these models over time.
The team has also made improvements to its authoring and training infrastructure that allow models to be automatically sharded (broken up into smaller chunks) so that larger models can be trained on host machines. They’ve also adopted training paradigms such as pipelining and hierarchical training (breaking down a learning task into a series of subproblems or tasks) that allow models to consume more data without losing quality.
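To make the sharding idea concrete, here is a toy placement policy in plain Python (an illustration of row-wise sharding in general, not Facebook’s actual training infrastructure):

```python
# Split a large embedding table row-wise across hosts, choosing each
# row's home deterministically so lookups can be routed the same way.
NUM_HOSTS = 4

def shard_for(row_id: int) -> int:
    return row_id % NUM_HOSTS    # toy placement policy

shards = {host: [] for host in range(NUM_HOSTS)}
for row in range(10):            # a 10-row table, for illustration
    shards[shard_for(row)].append(row)

print(shards)
# → {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
```

Since every row lives on exactly one host, no single machine has to hold the full multi-terabyte table, and training can scale with the number of hosts.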
AR and VR are becoming increasingly important parts of the Facebook experience. Imagine a content creator, or even someone just having fun with their friends, creating videos with stylized, computer-generated imagery such as graphics and backgrounds. Now, imagine them doing all of this on their mobile device, without having to use professional graphics software or filmmaking equipment. That’s the promise of AR that PyTorch is making possible by significantly speeding up the training process and shrinking the size of these models — even as they grow in complexity.
For example, researchers and engineers who work on AR experiences and partner teams create person segmentation models that follow people’s movements on video (including recognizing their hands and hair) using only a phone’s camera. When software understands where a person is within a physical space, it understands where to place AR graphics around them and how those graphics should interact with them.
When these models were first developed, their size and complexity meant it could take up to three days to deploy a model for a particular effect, not including any time it would take to debug the model to fix any errors. Then there was the issue of functionality across devices. Sometimes models wouldn’t run as fast on certain devices or operating systems as others, making for an inconsistent user experience.
Now, those same models are developed using PyTorch and can be deployed within minutes, if not seconds, across multiple devices and operating systems. The models are trained using Detectron2Go (D2Go), a new, state-of-the-art PyTorch extension. D2Go is the first tool of its kind and allows developers to take their ML models from training all the way to deployment on mobile devices.
Since the team completed the migration of its models to PyTorch in April of this year, inference time is 14 percent faster and model loading is 24 percent faster, which allows the team to deploy more complex and accurate models on mobile devices with the same latency.
A central focus of Facebook’s AI efforts is deploying cutting-edge machine learning technology to protect people from harmful content like hate speech and misinformation. Our goal is to identify these forms of policy-violating content quickly and accurately, for every form of content, and for every language and community around the world. It’s an incredibly difficult task done at a massive scale against adversaries who are constantly working to evade our systems.
These challenges are complex, nuanced, and rapidly evolving. We continue to explore how AI can become a more effective tool for detecting harmful content, and in order to do that, Facebook AI engineers are leveraging PyTorch to help them more quickly develop new, more powerful models and improve current ones.
Using PyTorch, Facebook’s engineers have developed Facebook AI Multimodal (FAIM), an internal library and SDK that allows developers to rapidly create, optimize, and deploy customized multimodal models tailored to specific harmful issues (e.g., misinformation and hate speech), meaning they can identify content across images, text, comments, and other elements holistically. Rather than relying on a fleet of different models, all focused on their own type of content or modality, FAIM models are able to analyze all types of content (images, videos, etc.).
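As a rough illustration of what “multimodal” means architecturally (FAIM’s real models are not public, so every name and dimension below is invented), a late-fusion classifier projects each modality into a shared space and classifies the joint embedding:

```python
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    """Toy late-fusion model over image and text features."""
    def __init__(self, img_dim=512, txt_dim=256, hidden=128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.head = nn.Linear(2 * hidden, 2)   # harmful vs. benign

    def forward(self, img_feats, txt_feats):
        # Project each modality into a shared space, then classify the
        # concatenated joint embedding.
        fused = torch.cat(
            [self.img_proj(img_feats), self.txt_proj(txt_feats)], dim=-1
        )
        return self.head(torch.relu(fused))

model = MultimodalClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 256))
```

The key property is that the classification head sees both modalities at once, so an image and a caption that are each benign alone can still be flagged when their combination is harmful.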
One such model is Whole Post Integrity Embeddings (WPIE), a service that has been trained to recognize harmful content in various forms across modalities. As a result, WPIE has a much deeper understanding of content and is able to identify harmful content in a variety of contexts and improve quickly as new forms of harmful content emerge.
The benefit is a faster, more efficient, and more comprehensive way of analyzing content. For example, sentences or images that seem innocuous on their own can take on an entirely different context when combined.
Today, over 85 percent of our user-facing, multimodal models for integrity products are using PyTorch and FAIM. Models created with FAIM, like WPIE, can understand the deep interaction of visual and textual concepts, meaning they can detect harmful content more accurately and thoroughly. While AI tools like FAIM aren’t our only solution to tackle problematic content, they do help give us the adaptability we need to address these challenges on a large scale.
As voice assistants and similar technologies become more common — for both the ease of use and the accessibility they provide — our engineers are working to make voice interactions as natural as human conversation. The more these systems behave and sound like people, the more seamless our interactions with them will be.
Today, Facebook’s engineering teams are using PyTorch to create models that power a number of voice applications, ranging from Facebook’s “how to pronounce your name” feature, to voice interactions on the Portal, to text-to-speech (TTS) functions.
Facebook’s TTS team recently built and deployed a new TTS system with state-of-the-art audio quality, deployed on CPU servers without any specialized hardware. The new TTS system is highly flexible and will play a role in creating new voice applications across Facebook’s products that sound more realistic and natural, including voice functions for VR and reading assistance.
PyTorch has streamlined the entire development pipeline for the TTS team by making it easier to develop and experiment with new models as well as train them.
On the model training end, PyTorch features a convenient, flexible, and easy-to-use interface; pythonic coding; a comprehensive suite of highly optimized operator kernels; and efficient multi-GPU primitives, making models both easy to debug and fast to train on a large scale.
On the model inference and deployment side, PyTorch has a powerful, TorchScript-based model optimization pipeline that transforms the computation graph into the most efficient form on the deployment environment. PyTorch’s lightweight and performant mobile runtime offers benefits to the team by providing high-performance model inference services with a low compute and memory footprint.
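One common instance of that optimization pipeline is scripting a model and then freezing it, which inlines the weights into the graph and folds constants for faster inference. A minimal sketch with a throwaway module:

```python
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.fc(x))

scripted = torch.jit.script(Net().eval())  # capture the computation graph
frozen = torch.jit.freeze(scripted)        # inline weights, fold constants
```

The frozen module produces the same outputs as the scripted one, but with its parameters baked in it can be further optimized for the deployment target and served without Python.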
In the future, thanks in part to PyTorch, not only will voice systems be able to understand more and more languages, but they also might be able to respond accordingly to context cues such as the tone or volume of someone’s voice, or even the level of noise in the background.
It’s no secret that the internet loves images. Having systems that can understand text that appears in images, whether it’s a personal or business photo, an informational image such as a map or menu, or even just a funny meme, is important for a slew of reasons. Photo search, screen readers for the visually impaired, and identifying and removing harmful content all rely on ML systems that can analyze text from images and videos.
One of these systems is the optical character recognition (OCR) system developed by Facebook AI. OCR can locate and extract text from images and videos in multiple languages for various use cases, ranging from integrity to search. By switching OCR’s framework over to PyTorch, the team has been able to make the system more robust and easier to deploy and debug.
OCR has two main models: one for text detection and another for text recognition. The text detection model is trained using Detectron2, a PyTorch-based library of object detection models.
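The two-stage design can be pictured as a simple pipeline: detection proposes text regions, and recognition transcribes each one. The stub functions below only illustrate the control flow; the real stages are Detectron2-based networks:

```python
# Illustrative two-stage OCR control flow with stub models.
def detect_text_regions(image):
    # The real detection model returns bounding boxes around text.
    return [(0, 0, 50, 20), (10, 40, 90, 60)]

def recognize_text(image, box):
    # The real recognition model transcribes the cropped region.
    return "hello"

def ocr(image):
    return [recognize_text(image, box) for box in detect_text_regions(image)]

print(ocr(None))  # → ['hello', 'hello']
```

Splitting the problem this way lets each model be trained, benchmarked, and replaced independently, which is part of what made the framework switch tractable.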
Given the quantity of data needed to train these models and the size of the models themselves, latency was often a concern for developers. But moving over to PyTorch has streamlined the experience, allowing them to quickly experiment and iterate on the models’ architectures and to debug and deploy models more efficiently.
The team is currently working to develop a new end-to-end model that can handle both text detection and text recognition in one unified design that will be based entirely in PyTorch, from training to deployment.
Not only is PyTorch now widely adopted at Facebook, but it also has become one of the default core development libraries used in the AI industry today. It has found a home in academia, with small companies and startups, and with major technology companies, including Nvidia, Microsoft, and many more.
Leaders across industries have already used PyTorch in myriad applications, ranging from health care to transportation to even agriculture. Global biopharmaceutical company AstraZeneca uses PyTorch with an aim to accelerate its drug discovery research. theator, a company applying computer vision and AI to analyze surgical procedure footage, uses PyTorch to scale up its Surgical Intelligence Platform, which assists surgeons.
Ridesharing companies like Lyft Level 5 and Uber ATG have used PyTorch in their autonomous vehicle development. And Blue River Technology, a subsidiary of John Deere, is taking a similar route, using PyTorch to create the AI behind its highly automated farming machines.
PyTorch is even having an impact in industries like mining, where Australian technology services company DiUS has partnered with Solve Geosolutions to develop Datarock, a system that uses ML and computer vision to analyze the geology of mining sites.
There are currently over 1,800 entities contributing to the PyTorch community — including institutions such as Caltech and companies pushing the boundaries of AI research, like OpenAI. According to Google Scholar, the original PyTorch paper — written by our researchers in collaboration with researchers at the University of Warsaw, Oxford, and OROBIX — has been cited over 4,400 times. PyTorch’s discussion forums boast over 40,000 active users, and there are over 70,000 downstream projects on GitHub using PyTorch. Academically, PyTorch saw a 127 percent year-over-year increase in citations on ArXiv from June 2019 to June 2020 alone.
Facebook itself has also continued to make its own contributions to the PyTorch community. For example, we’ve open-sourced projects like Captum, a library for model interpretability that helps researchers and developers better understand how AI models are making decisions. For developers building AI systems that work with encrypted data, the CrypTen library facilitates research into secure and privacy-preserving ML. And PyTorchVideo is a new open source library for video understanding that acts as a unified repository of reproducible and efficient video AI models and data sets in PyTorch.
Since its release nearly five years ago, PyTorch has become the common language in AI research both within and outside Facebook. PyTorch has made it easier than ever for Facebook not only to share its own discoveries, but also to learn from the open source community. Thanks to our PyTorch migration efforts, Facebook’s engineers can, in a sense, speak the same language as a vast community of engineers, researchers, and developers who are creating new, state-of-the-art AI technologies.