Inspired by FAIR

How Common Sense Machines uses Meta Segment Anything Model and AI to generate production-ready 3D assets

May 1, 2025
4 minute read

Generative AI continues to make strides in 2D image and video creation, but creating 3D assets poses new challenges. Existing models often lack sufficient training data or must work with data that is difficult to annotate, and in most cases they need to generate plausible views of an object from every angle. Common Sense Machines (CSM), a company based in Cambridge, Massachusetts, is working to make 3D asset generation easier with AI-powered software that quickly produces production-ready 3D assets. CSM uses Meta Segment Anything Model 2 (SAM 2), an open source model we released last year, to analyze 2D images and video and translate their components to 3D.

Tejas Kulkarni, CEO of CSM, says that for a small startup like his, Meta’s open source approach has been essential. Access to open source software such as SAM 2 allows CSM to leverage models and elements its team isn’t able to build in-house.

“It’s almost like an extended massive research team that we can’t afford,” Kulkarni says. “I think it’s critical for companies like ours, where we started from nothing. To come to this point, from nothing to here, I think it would have been impossible without open source.”

Creating 3D content is time-consuming; a single asset can take hours or days to make. CSM’s goal is to accelerate and democratize 3D art and content production, using its AI-driven software to streamline artists’ workflows and handle much of the work that usually requires software like Blender, TopoGun, Maya, ZBrush, and Adobe Suite. With CSM’s generative AI software, a 2D image can be quickly converted into a high-quality 3D asset that holds up in production contexts. The company also aims to let users create 3D assets for their projects even if they have no expertise in 3D modeling processes and software.

Meta’s open source SAM model, released in April 2023, enabled both interactive and automatic image segmentation—identifying which pixels correspond to specific objects within an image—with unparalleled flexibility. Meta continued to expand SAM’s capabilities and, in July 2024, released SAM 2, which allows for real-time, promptable object segmentation in images as well as videos. CSM uses SAM 2 to identify individual elements of an image to create accurate and useful 3D models of each component.
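Segmentation output, at its simplest, can be pictured as a per-pixel label map: every pixel is assigned to an object, and each object’s mask is a boolean array over the image. The toy sketch below (plain NumPy, not the SAM 2 API; the label values and the 4×4 array size are invented purely for illustration) shows how such masks let individual components of an image be isolated, which is the kind of per-object output CSM builds on:

```python
import numpy as np

# Toy 4x4 "image" label map: pretend a segmentation model assigned each
# pixel an object ID (0 = background, 1 and 2 = two distinct parts).
labels = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
])

# A segmentation mask for one object is simply a boolean array marking
# which pixels belong to that object.
mask_part1 = labels == 1
mask_part2 = labels == 2

# Isolating one component: keep its pixels, zero out everything else.
image = np.arange(16).reshape(4, 4)  # stand-in pixel values
part1_only = np.where(mask_part1, image, 0)

print(mask_part1.sum())  # pixel count of part 1 -> 4
print(part1_only)
```

With real models the masks come from prompts (clicks, boxes, or automatic proposals) rather than a precomputed label map, but the downstream representation is the same: one boolean mask per object, which is what makes extracting each component for 3D reconstruction straightforward.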

CSM’s software translates text prompts and 2D images into 3D assets, but while a solid 3D asset is a start, there’s still a lot to be done to make it ready for production. SAM 2’s segmentation helps CSM create 3D assets with all their component parts, so they’re ready for processes like rigging and animation. That allows CSM’s generated assets to be used in applications like game engines, virtual reality experiences, and visual effects.

Kulkarni says CSM’s technology is already speeding up workflows for companies such as game developers, and CSM is partnering with a range of companies to improve their production processes. The company’s next goal is to release generative AI software that can turn 2D images into full 3D worlds, and Segment Anything will continue to be a big part of that.

“I really applaud Meta for actually taking that stance [with open science],” Kulkarni says. “Without that, it’s very hard to be competitive, and it’s hard to have the resources to focus on the peripheral things that we don’t have expertise on, like those image segmentation models and video segmentation models.”


