
As AI moves from research into production, developers face a growing gap between cutting-edge techniques and the tools needed to build and scale them. At PyTorch Conference 2025, we introduced a new wave of PyTorch-native projects that close this gap: ExecuTorch 1.0, Torchforge, Monarch, TorchComms, Helion, and OpenEnv. Together, these projects support the entire lifecycle of agentic AI — from deploying post-trained LLMs and agents on phones and wearables to running reinforcement learning at scale, simplifying distributed execution and enabling fault-tolerant communication, and accelerating custom kernel development. We’re excited to release what we believe are the building blocks of agentic AI.
Our stack is guided by three core principles.
From the lowest level (kernels) to the highest (agents), we’re unveiling the core components of the next-generation PyTorch stack:

Helion (Kernel Authoring)
Helion introduces a Python-embedded, domain-specific language for authoring machine learning (ML) kernels, compiling directly to Triton. By automating and abstracting performance tuning for GPUs and other accelerators, Helion dramatically reduces the amount of code developers need to write to build advanced kernels: authoring kernels that reach roofline performance takes roughly 4x fewer lines of code with Helion than with Triton. This makes sophisticated ML kernel engineering accessible to a much wider range of developers and significantly improves their efficiency.
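To give a feel for the programming model, here is a minimal elementwise kernel in the style of Helion's published examples (the helion.kernel decorator and hl.tile helper follow those examples; exact APIs may evolve):

```python
import torch
import helion
import helion.language as hl

@helion.kernel()  # Helion autotunes the tiling and launch configuration
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Match PyTorch broadcasting semantics before tiling
    x, y = torch.broadcast_tensors(x, y)
    out = torch.empty_like(x)
    # hl.tile splits the iteration space; tile sizes are chosen by the autotuner
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out

# Requires a GPU to run, e.g.:
# out = add(torch.randn(1024, device="cuda"), torch.randn(1024, device="cuda"))
```

The loop body reads like plain PyTorch indexing; the tuning knobs that a Triton author would manage by hand are left to the compiler.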
As the push toward superintelligence accelerates, it’s important for thousands of developers to be able to tune and extend AI systems at the hardware level. Helion democratizes ML kernel development, enabling faster progress and broader participation across the entire AI technology stack.
“Making Python-based kernel authoring simpler and more accessible is important to NVIDIA. Supporting Meta’s work on Helion will help developers unlock new levels of performance on NVIDIA systems.”
— Luis Ceze, Vice President of AI Systems Software, NVIDIA
TorchComms (Distributed Communication)
TorchComms is a new communications API for PyTorch distributed workloads, designed for fault-tolerant communication that scales across diverse hardware ecosystems.
“Torchcomms exemplifies the kind of problem-solving innovation our industry needs — delivering robust, fault-tolerant distributed communication that scales seamlessly across diverse hardware ecosystems while eliminating the barriers that traditionally force developers to choose between performance and hardware flexibility.”
— Anush Elangovan, Vice President AI Software, AMD
Monarch (Distributed Execution)
Monarch is a distributed execution engine for PyTorch that reimagines cluster-scale execution and orchestration with a focus on developer simplicity. By leveraging a single, centralized controller, Monarch abstracts away the complexities of multi-node environments, allowing you to write scalable code that feels almost like a local, single-GPU workflow. This architecture enables seamless expansion as models grow larger or experiments become more demanding.
While scaling agentic AI often requires massive compute resources, Monarch ensures that this power doesn’t come at the cost of developer experience. Its streamlined approach democratizes access to cluster-scale capabilities, making advanced AI experimentation as intuitive and frictionless as working in a prototype notebook.
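As an illustration of the single-controller model, here is a minimal sketch patterned on Monarch's published actor examples; the specific names (this_host, spawn_procs, endpoint, current_rank) come from those examples and may shift as the project evolves, so treat this as a sketch rather than a definitive API reference:

```python
from monarch.actor import Actor, current_rank, endpoint, this_host

class Trainer(Actor):
    @endpoint
    def step(self) -> str:
        # Runs on every process in the mesh; the controller gathers results
        return f"step done on {current_rank()}"

# One controller script drives a whole mesh of worker processes
procs = this_host().spawn_procs(per_host={"procs": 4})
trainers = procs.spawn("trainer", Trainer)
print(trainers.step.call().get())  # broadcast the call, collect all replies
```

The same script shape scales from one host to a cluster: only the mesh definition changes, not the algorithm code.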
“With Monarch, we see a glimpse of the future of training. We’re stoked that the team picked Lightning as the ideal platform for the launch. We both care about making AI builders work at scale without friction.”
— Luca Antiga, CTO, Lightning.AI
Torchforge (Reinforcement Learning)
Torchforge is a PyTorch-native library purpose-built for scalable reinforcement learning (RL) post-training and agentic development. Torchforge separates infrastructure concerns from model concerns, making RL experimentation easier: it provides clear RL abstractions and a single scalable implementation of those abstractions. When you need fine-grained control over placement, fault handling (such as redirecting training loads during a run), or communication patterns, the primitives are there. When you don't, you can focus purely on your RL algorithm.
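To make that separation concrete, here is a deliberately simplified, hypothetical rollout loop; PolicyService and ReplayBuffer below are illustrative stand-ins, not Torchforge's actual API, and the reward is a toy placeholder:

```python
import asyncio

# Hypothetical stand-ins: in a real setup these would be backed by the
# library's abstractions, which handle placement, faults, and comms.
class PolicyService:
    async def generate(self, prompt: str) -> str:
        return prompt + " ... model response"

class ReplayBuffer:
    def __init__(self):
        self.episodes = []
    def add(self, episode):
        self.episodes.append(episode)

async def rollout_loop(policy: PolicyService, buffer: ReplayBuffer, prompts):
    # Pure algorithm code: generate, score, store. No infra plumbing here.
    for prompt in prompts:
        response = await policy.generate(prompt)
        reward = float(len(response) > len(prompt))  # toy reward signal
        buffer.add((prompt, response, reward))

asyncio.run(rollout_loop(PolicyService(), ReplayBuffer(), ["2+2=?"]))
```

The point of the abstraction is that the rollout loop stays this small even when the policy is served across many GPUs.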
"Our work on Weaver is about pushing the boundaries of reward modeling and automated verification. Integrating it with torchforge allows us to quickly experiment with new ideas while relying on a robust, scalable RL backbone. This partnership accelerates both our research and the broader ecosystem."
— Azalia Mirhoseini, Founder, Scaling Intelligence Lab, Stanford University
ExecuTorch 1.0 (On-Device AI)
We’re releasing ExecuTorch 1.0, Meta’s end-to-end solution for on-device AI, which helps enable innovative experiences across Facebook, Instagram, Meta Quest, Ray-Ban Meta glasses, WhatsApp, and more. ExecuTorch enables advanced AI capabilities directly on mobile, desktop, and edge devices and supports a broad range of models, including large language models, computer vision, automatic speech recognition, and text-to-speech.
Backed and used by industry leaders like Qualcomm, Apple, and Arm, ExecuTorch’s on-device AI framework and runtime are designed for seamless integration across diverse hardware platforms. The 1.0 general availability (GA) release delivers enhanced performance, stability, and integration with key platforms, showcasing ExecuTorch’s versatility in supporting next-generation generative AI applications on mobile and desktop.
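For a sense of the developer workflow, a typical export path looks roughly like this, using ExecuTorch's documented Python export APIs (backend lowering and partitioner details, omitted here, vary by target platform):

```python
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.relu(x) * 2

# Capture the model graph, lower it to ExecuTorch's format, and
# serialize a .pte file that the on-device runtime loads.
exported = torch.export.export(TinyModel().eval(), (torch.randn(4),))
program = to_edge(exported).to_executorch()
with open("tiny_model.pte", "wb") as f:
    f.write(program.buffer)
```

The resulting .pte file is what ships with the app; the same runtime then executes it on CPU, GPU, or NPU depending on the chosen backend.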
“Qualcomm Technologies is proud to have partnered with Meta to bring ExecuTorch to edge-AI devices powered by our platforms, enabling a PyTorch-native, end-to-end on-device AI experience. ExecuTorch GA offers portability, stability, and performance, empowering developers to efficiently deliver innovative AI features. ExecuTorch and PyTorch models — including text, vision, speech, and LLMs — can be seamlessly deployed across mobile, desktop, and IoT devices, leveraging NPU and GPU acceleration on Qualcomm Technologies’ platforms.”
— Jeff Gehlhaar, Senior Vice President, Engineering, Qualcomm Technologies, Inc.
OpenEnv (Agentic Environments)
To supercharge this next wave of agentic development, Meta and Hugging Face are partnering to launch an open Hub for Environments — a shared space where developers can build, share, and explore OpenEnv-compatible environments for both training and deployment.
Starting this week, developers can build, share, and explore these environments directly on the Hub. Every environment uploaded to the Hub that conforms to the OpenEnv specification automatically gains this functionality — making it fast and easy to validate and iterate before running full RL training.
In addition, we’re releasing the OpenEnv 0.1 Spec (RFC) to gather community feedback and help shape the standard.
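As a rough sketch of the interaction pattern the spec standardizes, here is a self-contained toy environment with a Gymnasium-style reset/step loop; the class and field names below are hypothetical stand-ins, not the OpenEnv API itself:

```python
from dataclasses import dataclass

@dataclass
class Action:
    message: str

@dataclass
class Observation:
    text: str
    reward: float
    done: bool

class EchoEnv:
    """Hypothetical toy environment following the reset/step pattern."""
    def reset(self) -> Observation:
        return Observation(text="ready", reward=0.0, done=False)
    def step(self, action: Action) -> Observation:
        # Echo the agent's message back and end the episode
        return Observation(text=action.message, reward=1.0, done=True)

env = EchoEnv()
obs = env.reset()
while not obs.done:
    obs = env.step(Action(message="hello"))  # agent picks an action
    print(obs.text, obs.reward)
```

A standard interface like this is what lets any conforming environment plug into any RL training stack without glue code.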
“The next wave of AI will be defined not just by open models, but by open environments. Partnering with Meta to launch the OpenEnv hub brings the same collaborative energy that made model sharing possible to the world of agentic systems, giving developers everywhere a common foundation to build, test, and deploy the next generation of AI agents.”
— Clem Delangue 🤗, Co-Founder & CEO, Hugging Face
We’re excited by the rapid adoption and results our early partners are seeing with these new tools. With the addition of this new stack, the PyTorch ecosystem has never been more vibrant. These projects are available now and open to contributions and collaboration. To learn more, visit meta-pytorch.org for links to the respective project repos. We can’t wait to see what the community starts building.
We’d like to acknowledge AMD, Apple Core ML, Arm, CoreWeave, Google, Hugging Face, IBM, Intel, Lightning.AI, MediaTek, Mithril, NVIDIA, NXP, Prime Intellect, Qualcomm, Runhouse, Samsung, Stanford University, and Surge AI for collaborating with us on this work.