May 18, 2023
Meta’s AI compute needs will grow dramatically over the next decade as we break new ground in AI research, ship more cutting-edge AI applications and experiences for our family of apps, and build our long-term vision of the metaverse.
We are now executing on an ambitious plan to build the next generation of Meta’s infrastructure backbone, built specifically for AI, and in this blog post we’re sharing some details on our recent progress. The projects we’re announcing here touch many layers of our hardware and software stack, as well as the customized network that connects these technologies from top to bottom. They include our first custom chip for running AI models, a new AI-optimized data center design, and phase 2 of our 16,000-GPU supercomputer for AI research.
These transformational efforts, along with additional projects still underway, will enable us to develop much larger, more sophisticated AI models and then deploy them efficiently at scale. AI is already at the core of Meta’s products, enabling better personalization; safer, fairer products; and richer experiences, while also helping businesses reach the audiences they care about most. We are even reimagining how we code: we are deploying CodeCompose, a generative AI–based coding assistant developed in-house at Meta, as a key tool to make our developers more productive throughout the software development life cycle. By rethinking how we innovate across our infrastructure, we’re creating a scalable foundation to power emerging opportunities in the near term in areas like generative AI, and in the longer term as we bring new AI-powered experiences to the metaverse.
For more on the AI investments shared in this post, check out the Meta AI Infra @Scale page.
Ever since we broke ground on our first data center back in 2010, Meta has built a global infrastructure that today serves as the engine for the more than 3 billion people who use Meta’s family of apps each day. AI has been an important part of these systems for many years, from our Big Sur hardware in 2015 to our development of PyTorch to our initial deployment last year of Meta’s supercomputer for AI research. We’ve now advanced our infrastructure in several new ways, including the projects introduced above: our first custom chip for running AI models, a new AI-optimized data center design, and phase 2 of our 16,000-GPU supercomputer for AI research.
These AI-focused efforts enable us to take advantage of exciting new software advances like PyTorch 2.0. The latest version of this open source AI framework, which was created by Meta in 2016 in partnership with the AI community, offers the same powerful, flexible, easy-to-use workflow. But it fundamentally changes and accelerates how the framework operates at the compiler level under the hood. With 2.0, PyTorch now provides faster performance and support for new features, like accelerated transformers and dynamic shapes.
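To make the "same workflow, new compiler" point concrete, here is a minimal sketch of opting a model into the PyTorch 2.0 compiler path. It is an illustration only, not code from Meta’s infrastructure: `TinyMLP` and `compile_model` are hypothetical names introduced here, and the layer sizes are arbitrary. The only PyTorch 2.0 API it relies on is `torch.compile`, with its `backend` and `dynamic` arguments.

```python
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """Hypothetical toy model, defined only for this illustration."""

    def __init__(self, dim: int = 16) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 4))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def compile_model(model: nn.Module, backend: str = "inductor") -> nn.Module:
    """Wrap a model with torch.compile; the model code itself is unchanged.

    backend="inductor" is the default code-generating backend.
    backend="eager" only exercises graph capture, which is handy for debugging
    or on machines without a C++ compiler.
    dynamic=True asks the compiler to trace with symbolic shapes, so a change
    in batch size should not by itself force a recompile.
    """
    return torch.compile(model, backend=backend, dynamic=True)
```

The compiled module is called exactly like the original one, for example `compile_model(TinyMLP())(torch.randn(8, 16))`. Any speedup depends on the model, hardware, and backend, so it is worth measuring rather than assuming.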
Custom-designing much of our infrastructure enables us to optimize an end-to-end experience, from the physical layer to the software layer to the actual user experience. We design, build, and operate everything from the data centers to the server hardware to the mechanical systems that keep everything running. Because we control the stack from top to bottom, we’re able to customize it for our specific needs. For example, we can easily colocate GPUs, CPUs, network, and storage if it will better support our workloads. If that in turn means we need different power or cooling solutions, we can rethink those designs, as well, as part of one cohesive system.
This will only be more important in the years ahead. Over the next decade, we’ll see increased specialization and customization in chip design, purpose-built and workload-specific AI infrastructure, new systems and tooling for deployment at scale, and improved efficiency in product and design support. All of this will deliver increasingly sophisticated models built on the latest research — and products that give people around the world access to this emergent technology.
Our infrastructure vision has always been guided by a focus on delivering long-term value and impact. We believe our track record of building world-class infrastructure positions Meta to continue to lead in AI over the next decade and beyond, and the work we’ve discussed here will have a significant impact on our family of apps today and our metaverse initiatives tomorrow.
We look forward to sharing more updates on our work to harness AI’s immense potential in new ways to benefit billions of people.