3D understanding plays an important role in advancing the ability of AI systems to better understand and operate in the real world — including navigating physical space in robotics, improving virtual reality experiences, and even recognizing occluded objects in 2D content. But research in 3D deep learning has been limited by the lack of tools and resources to support the complexities of using neural networks with 3D data, and by the fact that many traditional graphics operators are not differentiable.
Facebook AI has built and is now releasing PyTorch3D, a highly modular and optimized library with unique capabilities designed to make 3D deep learning easier with PyTorch. PyTorch3D provides a set of frequently used 3D operators and loss functions for 3D data that are fast and differentiable, as well as a modular differentiable rendering API — enabling researchers to import these functions into current state-of-the-art deep learning systems right away.
PyTorch3D was recently a catalyst in Facebook AI’s work to build Mesh R-CNN, which achieved full 3D object reconstruction from images of complex interior spaces. We fused PyTorch3D with our highly optimized 2D recognition library, Detectron2, to successfully push object understanding to the third dimension. PyTorch3D functions for handling rotations and 3D transformations were also central in creating C3DPO, a novel method for learning associations between images and 3D shapes using less annotated training data.
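PyTorch3D's `pytorch3d.transforms` module provides batched, differentiable conversions between rotation representations. As a rough illustration of the kind of operation involved — a minimal NumPy sketch, not the library's implementation — here is the classic Rodrigues formula for turning an axis-angle vector into a rotation matrix (the helper name below is ours):

```python
import numpy as np

def axis_angle_to_matrix(axis_angle):
    """Rodrigues' formula: rotation matrix from an axis-angle vector.

    A minimal NumPy sketch; PyTorch3D's pytorch3d.transforms provides
    batched, differentiable versions of conversions like this.
    """
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)  # near-zero rotation: identity
    k = axis_angle / theta  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Rotate a point 90 degrees about the z-axis: x-axis maps to y-axis.
R = axis_angle_to_matrix(np.array([0.0, 0.0, np.pi / 2]))
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))  # ≈ [0, 1, 0]
```

The same conversion, written on PyTorch tensors, stays differentiable end to end, which is what lets rotation parameters be learned directly from image supervision.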
Researchers and engineers can similarly leverage PyTorch3D for a wide variety of 3D deep learning research — whether 3D reconstruction, bundle adjustment, or even 3D reasoning — to improve 2D recognition tasks. Today, we are sharing our PyTorch3D library here and open-sourcing our Mesh R-CNN codebase here.
One reason 3D understanding with deep learning remains relatively underexplored compared with 2D understanding is that 3D inputs are more complex, with higher memory and computation requirements, whereas 2D images can be represented by simple tensors. 3D operations must also be differentiable so that gradients can propagate backward through the system, from model output back to the input data. This is especially challenging given that many traditional operators in computer graphics, such as rendering, involve steps that block gradients.
In the same way that PyTorch offers highly optimized libraries for 2D recognition tasks, PyTorch3D optimizes training and inference by providing batching capabilities and support for 3D operators and loss functions.
3D meshes comprise a collection of vertex coordinates and face indices, which makes batching meshes of different sizes challenging. To address this, we created Meshes, a data structure for batching heterogeneous meshes in deep learning applications. This data structure makes it easy for researchers to quickly transform the underlying mesh data into different views to match operators with the most efficient representation of the data. PyTorch3D gives researchers and engineers the flexibility to efficiently switch between different representation views and access different properties of meshes.
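Per the PyTorch3D documentation, Meshes exposes such representation views through accessors like `verts_list()`, `verts_padded()`, and `verts_packed()`. To make the idea concrete, here is a NumPy sketch (not the library's implementation) of what the padded and packed views hold for a batch of two meshes with different vertex counts:

```python
import numpy as np

# Two meshes with different vertex counts: the "list" view.
verts_list = [
    np.zeros((3, 3)),  # mesh 0: a triangle (3 vertices)
    np.ones((4, 3)),   # mesh 1: a quad (4 vertices)
]

# "Padded" view: one (num_meshes, max_verts, 3) array, zero-padded,
# which suits batched per-mesh operations.
max_verts = max(v.shape[0] for v in verts_list)
verts_padded = np.zeros((len(verts_list), max_verts, 3))
for i, v in enumerate(verts_list):
    verts_padded[i, : v.shape[0]] = v

# "Packed" view: all vertices concatenated into one (total_verts, 3)
# array, plus an index mapping each vertex back to its mesh, which
# suits operations over every vertex in the batch at once.
verts_packed = np.concatenate(verts_list, axis=0)
verts_to_mesh = np.concatenate(
    [np.full(v.shape[0], i) for i, v in enumerate(verts_list)]
)

print(verts_padded.shape, verts_packed.shape)  # (2, 4, 3) (7, 3)
```

Different operators want different views — per-mesh losses are natural on the padded view, while vertex-wise kernels are natural on the packed view — which is why switching between them cheaply matters.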
We’ve done the legwork of optimizing the implementations of several common operators and loss functions for 3D data, supporting heterogeneous batches of inputs. This means that researchers and engineers can import the operators in PyTorch3D for faster experimentation without having to reimplement them from scratch at the start of each new project. These operators include chamfer loss, which compares two point clouds and is used as a loss function for 3D meshes. We created an optimized way of computing the highly resource-intensive nearest-neighbor calculations for this loss function using CUDA kernels. We’ll continue to add to the set of common operators over time.
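To illustrate the definition (not PyTorch3D's optimized implementation), here is a brute-force NumPy sketch of symmetric chamfer distance; the library's version replaces the O(N·M) pairwise-distance step with batched CUDA nearest-neighbor kernels:

```python
import numpy as np

def chamfer_distance(p1, p2):
    """Symmetric chamfer distance between two point clouds (NumPy sketch).

    For each point in one cloud, take the squared distance to its nearest
    neighbor in the other cloud; average both directions and sum. This
    brute-force version only illustrates the definition.
    """
    # Pairwise squared distances, shape (N, M).
    d = np.sum((p1[:, None, :] - p2[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(a, a))  # 0.0 for identical clouds
```

Because every step is a differentiable tensor operation, the same computation written in PyTorch yields gradients with respect to both point clouds, which is what makes it usable as a training loss for predicted meshes.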
Rendering is a core part of computer graphics that converts 3D models into 2D images. It’s a natural way to bridge the gap between 3D scene properties and the pixels of a 2D image. Traditional rendering engines are not differentiable, however, so they can’t be incorporated into deep learning pipelines. Recently, several academic research projects (such as OpenDR, Neural Mesh Renderer, Soft Rasterizer, and redner) have shown how to build differentiable renderers that cleanly integrate with deep learning.
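The core trick these projects share is making pixel coverage a smooth function of geometry. As a toy sketch of the soft-rasterization idea (in the spirit of Soft Rasterizer, not any library's actual code), the hard inside/outside test for a triangle — a step function with zero gradient almost everywhere — can be replaced by a sigmoid of a signed edge quantity:

```python
import numpy as np

def edge_fn(a, b, p):
    # Signed area term: positive when point p lies left of edge a->b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def coverage(tri, p, sigma=0.01):
    """Hard vs. soft pixel coverage for one CCW triangle (toy sketch)."""
    # Minimum over the three edge tests: > 0 inside, < 0 outside.
    d = min(edge_fn(tri[i], tri[(i + 1) % 3], p) for i in range(3))
    hard = 1.0 if d > 0 else 0.0              # step function: blocks gradients
    soft = 1.0 / (1.0 + np.exp(-d / sigma))   # smooth in d, hence in vertices
    return hard, soft

tri = [(0.1, 0.1), (0.9, 0.1), (0.5, 0.9)]  # counterclockwise triangle
print(coverage(tri, (0.5, 0.4)))    # well inside: both near 1
print(coverage(tri, (0.0, 0.0)))    # well outside: both near 0
print(coverage(tri, (0.5, 0.105)))  # near an edge: soft is fractional
```

Because the soft coverage varies smoothly as vertices move, a loss on rendered pixels can backpropagate to the 3D geometry — exactly the property a hard rasterizer lacks.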
Differentiable rendering is a new area, and we wanted to tweak the core algorithm to focus on flexibility. We needed a rendering engine that makes it possible to access the wide variety of intermediate values that different downstream applications consume.
In PyTorch3D, we wrote an efficient, modular differentiable renderer. Our implementation consists of composable units, allowing users to easily extend the renderer to support custom lighting or shading effects. The computationally heavy rasterization step has parallel implementations in PyTorch, C++, and CUDA, as well as comprehensive tests to verify correctness. Like all other operators in PyTorch3D, our renderer supports heterogeneous batches of data by relying on our Meshes data structure. You can find a deep dive into the implementation of the renderer and the different modules available in the PyTorch3D documentation.
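Per the PyTorch3D documentation, the renderer follows a composition pattern: a `MeshRenderer` pairs a `MeshRasterizer` with an interchangeable shader (such as `SoftPhongShader`), so swapping the shader changes the effect without touching rasterization. A toy pure-Python sketch of that pattern — all names below are illustrative placeholders, not the library's classes:

```python
# Toy sketch of the rasterizer/shader composition pattern. A renderer is
# just a rasterization step composed with an interchangeable shading step.

def make_renderer(rasterize, shade):
    """Compose rasterization with shading into a single render function."""
    def render(scene):
        fragments = rasterize(scene)      # geometry -> per-pixel fragments
        return shade(fragments, scene)    # fragments -> final pixel values
    return render

# Two shaders reusing the same rasterizer: only the shading step changes.
rasterize = lambda scene: {"coverage": scene["coverage"]}
flat_shader = lambda frags, scene: [c * scene["albedo"] for c in frags["coverage"]]
silhouette_shader = lambda frags, scene: [1.0 if c > 0 else 0.0
                                          for c in frags["coverage"]]

scene = {"coverage": [0.0, 0.5, 1.0], "albedo": 0.8}
print(make_renderer(rasterize, flat_shader)(scene))        # shaded intensities
print(make_renderer(rasterize, silhouette_shader)(scene))  # binary silhouette
```

The design choice this mirrors is that the expensive, carefully optimized rasterization step is written once, while lighting and shading variants remain small, swappable modules.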
Our goal with PyTorch3D is to drive progress at the intersection of deep learning and 3D. We’ve designed efficient and optimized operators, heterogeneous batching capabilities, and a modular differentiable rendering API to equip researchers and engineers with a toolkit to implement cutting-edge research with complex 3D inputs. With the unique differentiable rendering capabilities, we’re excited about the potential for building systems that make high-quality 3D predictions without relying on time-intensive, manual 3D annotations — and unlocking new directions of 3D research.
At Facebook AI, we’ll continuously improve and expand the operators we provide in PyTorch3D, and we welcome contributions from the community to help build this resource.