August 11, 2022
Rapid advances in neural implicit representation are opening up exciting new possibilities for augmented reality experiences. This computer vision technique can seamlessly combine real and virtual objects in augmented reality — without requiring large amounts of data to learn from and without being limited to just a few points of view. It does this by learning a representation of a 3D object or scene from a sparse set of images of that object or scene captured from arbitrary viewpoints. Unlike traditional 3D representations such as meshes or point clouds, this newer approach represents objects as a continuous function, which allows for more accurate reconstruction of shapes with complex geometries as well as higher color reconstruction accuracy.
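To make the contrast concrete, here is a minimal illustrative sketch (not Implicitron code) of an implicit representation: a signed distance function for a sphere. Unlike a mesh, which stores a fixed set of vertices, the function is defined at every 3D coordinate and can be queried at arbitrary resolution.

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance from each 3D point to a sphere's surface:
    negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

# The surface is wherever the function equals zero -- it can be
# evaluated at any point, not just at stored vertices.
pts = np.array([[0.0, 0.0, 0.0],   # center of the sphere
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
print(sphere_sdf(pts))  # [-1.  0.  1.]
```

A learned implicit representation replaces this hand-written formula with a neural network, so the same continuous-query idea extends to arbitrarily complex geometry.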
Meta AI is now releasing Implicitron, a modular framework within our popular open source PyTorch3D library, created to advance research on neural implicit representation. Implicitron provides abstractions and implementations of popular implicit representations and rendering components to allow for easy experimentation.
This research area is still in its nascent phase, with new variants regularly emerging and no clear method of choice. After the introduction of NeRF, more than 50 variants of this method for synthesizing novel views of complex scenes have been published in the past year alone. Implicitron now makes it easy to evaluate variations, combinations, and modifications of these methods with a common codebase that doesn’t require expertise in 3D or graphics.
Most current neural implicit reconstruction methods create photorealistic renderings via ray marching. In ray marching, rays are emitted from the rendering camera and 3D points are sampled along these rays. An implicit shape function (which represents the shape and appearance of the scene) then evaluates density or distance to the surface at the sampled ray points. A renderer then marches along the ray points to find the first intersection between the scene’s surface and the ray in order to render image pixels. Lastly, loss functions measuring the discrepancy between the rendered and ground-truth images are computed, along with other metrics.
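The pipeline above can be sketched in a few lines. The following is a simplified NeRF-style emission-absorption ray marcher, not Implicitron's implementation; the `field` function is a hypothetical stand-in for the learned implicit shape function, and the toy scene is just a solid-colored ball.

```python
import numpy as np

def render_ray(origin, direction, field, t_near=0.0, t_far=4.0, n_samples=64):
    """Minimal emission-absorption ray marcher (illustrative sketch)."""
    t = np.linspace(t_near, t_far, n_samples)
    points = origin + t[:, None] * direction       # 3D samples along the ray
    density, color = field(points)                 # query the implicit function
    delta = np.diff(t, append=t_far)               # spacing between samples
    alpha = 1.0 - np.exp(-density * delta)         # opacity of each segment
    # Transmittance: how much light survives to reach each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * color).sum(axis=0)  # composited pixel color

# Toy implicit function: a dense red ball of radius 0.5 centered at (0, 0, 2).
def field(points):
    inside = np.linalg.norm(points - np.array([0.0, 0.0, 2.0]), axis=-1) < 0.5
    density = np.where(inside, 10.0, 0.0)
    color = np.tile([1.0, 0.0, 0.0], (len(points), 1))
    return density, color

# March a ray from the origin straight down the z-axis through the ball.
pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), field)
```

In a real system, `field` would be a neural network whose parameters are optimized so that rendered pixels match ground-truth photographs.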
With this generic structure in mind, Meta has created modular and composable implementations of each component. These include the RaySampler and PointSampler classes, responsible for sampling rays and points along them. The ray points can be encoded with a HarmonicEmbedding class (implementing NeRF’s positional embedding) or with a ViewSampler, which samples image features at the 2D locations of 3D point projections (PixelNeRF, NeRFormer). Given per-point feature encodings, Implicitron can leverage one of several implicit shape architectures (NeRF’s MLP, IDR’s FeatureField, SRN’s implicit function) to generate the implicit shape. A renderer (MultiPassEmissionAbsorptionRenderer, LSTMRenderer, RayTracing) then converts the implicit shape to an image. Training is supervised with several losses, including MSE, PSNR, and Huber losses between optionally masked images, segmentation masks, and depth maps, as well as method-specific regularizers such as the Eikonal loss, a Beta prior on the predicted mask, and a TV regularizer for voxel grids.
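As an example of one of these components, here is a sketch of the computation behind NeRF-style positional embedding, which a class like HarmonicEmbedding performs: each coordinate is mapped to sines and cosines at geometrically increasing frequencies so that the downstream MLP can represent high-frequency detail. This is a simplified illustration, not Implicitron's exact implementation.

```python
import numpy as np

def harmonic_embedding(x, n_harmonics=6):
    """Encode coordinates x of shape (..., dim) as sines and cosines
    at frequencies 1, 2, 4, ..., 2**(n_harmonics - 1)."""
    freqs = 2.0 ** np.arange(n_harmonics)        # geometric frequency ladder
    angles = x[..., None] * freqs                # (..., dim, n_harmonics)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    # Flatten to one feature vector per input point.
    return emb.reshape(*x.shape[:-1], -1)        # (..., dim * 2 * n_harmonics)

pt = np.array([[0.1, -0.2, 0.3]])                # one 3D point
print(harmonic_embedding(pt).shape)              # (1, 36)
```

The embedded points, rather than the raw coordinates, are what gets fed into the implicit shape architecture.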
This modular architecture allows people using the framework to easily combine the contributions of different papers and replace specific components to test new ideas. As the flagship end-to-end example, the Implicitron framework implements a state-of-the-art method for generalizable category-based new-view synthesis, as proposed in our recent Common Objects in 3D work. This method extends NeRF with a trainable view-pooling layer based on the Transformer architecture.
Meta has also developed additional components to help make experimentation and extensibility easier. These include a plug-in and configuration system that enables user-defined implementations of the components, with flexible configurations for switching between implementations, as well as a trainer class that uses PyTorch Lightning for launching new experiments.
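The general pattern behind such a plug-in system can be sketched as a registry that maps string keys to user-defined classes, so a configuration file can select an implementation by name. This is an illustrative sketch of the pattern only; the names below (`RENDERERS`, `register_renderer`, `LSTMRendererStub`) are hypothetical and not part of Implicitron's API.

```python
# Registry mapping config-friendly string keys to implementations.
RENDERERS = {}

def register_renderer(name):
    """Decorator that records a user-defined renderer class under a
    string key, so a config can select it without code changes."""
    def wrap(cls):
        RENDERERS[name] = cls
        return cls
    return wrap

@register_renderer("lstm")
class LSTMRendererStub:
    def render(self):
        return "lstm render"

# Config-driven instantiation: swap implementations by editing one string.
config = {"renderer": "lstm"}
renderer = RENDERERS[config["renderer"]]()
print(renderer.render())  # lstm render
```

Switching to a different renderer, or plugging in one's own, then only requires registering a new class and changing the configuration value.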
Just as Detectron2 has become the go-to framework for implementing and benchmarking object detection methods on a variety of data sets, Implicitron aims to serve as a cornerstone for conducting research in the field of neural implicit representation and rendering. This lowers the barrier to entry into this field and enables vast new opportunities for exploration.
Better tools for turning image data into accurate 3D reconstructions are crucial to accelerating AR/VR research. They enable useful real-world applications, like letting people try on clothing virtually when shopping in AR and VR, or relive memorable moments from different perspectives. This work complements Meta’s advances in Detectron2, another Meta AI open source platform that enables object detection, segmentation, and other visual recognition tasks; Common Objects in 3D; state-of-the-art 3D content understanding; self-supervised learning and Transformers; and convolutional neural nets.
By integrating this framework within the popular PyTorch3D library for 3D deep learning, already widely used by researchers in the field, Meta aims to give people using the framework a way to easily install and import components from Implicitron into their projects without needing to reimplement or copy the code.
We'd like to acknowledge the contributions to Implicitron from the larger group of researchers who work on PyTorch3D here at Meta AI.