
"The universe is a pretty big place. If it's just us, it seems like an awful waste of space," Carl Sagan wrote in the science fiction novel "Contact." This quote captures what people have been asking for a millennia as they gazed skyward, pondering their place in the vast cosmos wondering, “Are we alone?’ Innate curiosity, coupled by advancing technology, has led us into the era of space exploration. However, venturing into the unknown presents formidable challenges, with planetary landings being particularly difficult. To mitigate the risk to human life, robotic explorers have been leading the vanguard for planetary surface exploration. NASA’s Jet Propulsion Laboratory in Southern California has sent rovers to Mars on many notable missions, including Sojourner (1997), Spirit and Opportunity (2004), Curiosity (2012), and most recently, Perseverance and Ingenuity (2021).
As NASA plans for future deep space missions, rapid, multi-planetary exploration will require the simultaneous deployment of multiple robotic explorers. To travel farther while remaining cost-effective, these next-generation robots must be smaller in mass yet equally capable, and they must operate autonomously under uncertain conditions with significant communication delays. Their reduced size, however, introduces a new challenge: lower onboard power capacity, which restricts both the onboard compute available and the sensor hardware that can be carried, even though the robots must still perform the same complex tasks as efficiently as larger robotic explorers.
Given that communication latencies between Earth and spacecraft millions of miles away can span minutes to hours, these systems need to operate robustly, efficiently, and autonomously, with minimal computational resources carried onboard. To achieve this, these robots need advanced computer vision that lets them tackle diverse challenges, such as generating depth maps for terrain assessment and hazard avoidance and autonomously detecting potential biosignatures upon landing in unexplored environments.
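To give a sense of scale for these latencies, the following is a minimal sketch that computes the lower bound on signal delay set by the speed of light. The function names are illustrative, and the 108 million mile distance is only an example value. Real operational delays are longer, since they also depend on factors such as antenna scheduling and command planning, which this sketch ignores.

```python
# Both constants are exact by definition.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
METERS_PER_MILE = 1609.344


def one_way_delay_seconds(distance_miles: float) -> float:
    """Minimum one-way signal delay over the given distance, in seconds."""
    return distance_miles * METERS_PER_MILE / SPEED_OF_LIGHT_M_PER_S


def round_trip_delay_minutes(distance_miles: float) -> float:
    """Minimum time to send a command and receive a response, in minutes."""
    return 2 * one_way_delay_seconds(distance_miles) / 60


# Example distance only (an assumption for illustration): 108 million miles.
example_one_way_s = one_way_delay_seconds(108e6)
example_round_trip_min = round_trip_delay_minutes(108e6)
print(f"one-way: {example_one_way_s / 60:.1f} min, "
      f"round trip: {example_round_trip_min:.1f} min")
```

At that example distance, a single command-and-response cycle takes close to 20 minutes at the very least, which is why a rover cannot rely on an operator on Earth to steer it around a hazard in real time.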
One of the ways the JPL team is tackling this unique robotics challenge is with DINOv2, a set of open source, state-of-the-art vision foundation models that Meta Fundamental AI Research (FAIR) released in April 2023. Building on top of DINOv2, the team created a Visual Perception Engine that uses the vision foundation model as a common backbone, so its features can be reused efficiently across multiple tasks. Minimizing feature copies reduces both the GPU compute and the memory requirements. The team’s framework is available to the open source community on GitHub and provides a convenient Robot Operating System (ROS) interface for robotic tasks.

As the team set out to build a robot stack that was just as capable as larger NASA rovers but on a reduced resource platform, they used their Nebula Spot Robot, which is equipped with multiple sensors, as a testbed. For this project, they restricted the onboard sensor suite to an inertial measurement unit (IMU) and an RGB camera, which send data for processing to an NVIDIA Jetson AGX Orin, a small onboard computer. For vision tasks, the team faced a challenge: different machine learning models were needed for different tasks, and executing them concurrently on limited compute hardware did not meet the team's real-time operation requirements.

Robots need depth measurements to build maps, they need object detection to identify their science objectives, and they need the ability to segment the identified objects so they can interact with them, the team points out. Traditionally, a different model would be used for each task, but running all of those models at once is very difficult on the limited hardware of a smaller computer. With DINOv2 vision foundation model features, multiple smaller task heads can now share a single feature extraction backbone.
Task-specific models, such as those for depth estimation or terrain traversability assessment, each run their own vision feature extraction. The Visual Perception Engine streamlines this process. It retains the DINOv2 extracted features in GPU memory and shares them across multiple tasks, which allows multiple smaller model heads to be deployed in parallel to enhance the robot's capabilities while decreasing the total model parameter count and increasing the overall image throughput.
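The shared-backbone idea can be illustrated with a small, dependency-free sketch. This is not the Visual Perception Engine's actual API: `SharedBackboneEngine`, `toy_backbone`, and the two stub heads are hypothetical stand-ins, the "features" are plain Python lists rather than DINOv2 tensors held in GPU memory, and the heads run one after another here rather than in parallel. The point is only the structure: the expensive feature extraction runs once per image, and every head reads the same result.

```python
from typing import Callable, Dict, List

Image = List[List[float]]
Features = List[float]


def toy_backbone(image: Image) -> Features:
    """Stand-in for an expensive feature extractor: one value per image row."""
    return [sum(row) / len(row) for row in image]


class SharedBackboneEngine:
    """Runs the backbone once per image and feeds every head the same features."""

    def __init__(self, backbone: Callable[[Image], Features],
                 heads: Dict[str, Callable[[Features], float]]) -> None:
        self.backbone = backbone
        self.heads = heads
        self.backbone_calls = 0  # counts how often the expensive step ran

    def process(self, image: Image) -> Dict[str, float]:
        features = self.backbone(image)  # computed once, not once per head
        self.backbone_calls += 1
        return {name: head(features) for name, head in self.heads.items()}


# Hypothetical lightweight heads; real ones would be small neural networks.
engine = SharedBackboneEngine(
    backbone=toy_backbone,
    heads={
        "depth_stub": lambda f: sum(f) / len(f),
        "segmentation_stub": lambda f: max(f),
    },
)
outputs = engine.process([[0.0, 1.0], [2.0, 4.0]])
print(outputs, "backbone calls:", engine.backbone_calls)
```

Adding a third head in this arrangement costs only the head itself, since the backbone call count per image stays at one. With separate task-specific models, each new task would bring its own feature extractor along with it.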
Furthermore, the JPL team has been putting the saved compute and memory resources to good use by introducing new learning capabilities, so robots can adapt online to unknown terrain through vision and terrain interaction. Using DINOv2 vision features along with robot power usage, the robot can learn to identify the cost of terrain traversal in real time. This allows it to avoid treacherous terrain that may look benign in depth measurements, such as smooth, soft sand, which could cause a robot to become stuck. Such onboard online learning capabilities are very useful for planetary rovers, because the lack of prior data, combined with large communication delays with Earth, can get robots into tricky situations. For example, in 2005, the Mars Opportunity rover got stuck in a sand trap. The JPL team, 108 million miles away on Earth, spent nearly five weeks sending commands to the rover and waiting for feedback signals as they worked to maneuver it out of the sand trap.
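As a rough illustration of learning traversal cost online, the sketch below fits a linear model that maps a feature vector to measured power draw, updating after every observation with a least-mean-squares step. This is a deliberately simplified stand-in and not the JPL team's method: the class name, the two-element feature vectors, and the power values are all assumptions made up for the example, and a real system would use DINOv2 features rather than hand-written terrain indicators.

```python
from typing import List


class OnlineTraversalCostModel:
    """Linear model of power draw, updated one observation at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: List[float]) -> float:
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features: List[float], measured_power: float) -> float:
        """Take one gradient step on the squared error; return the error."""
        error = self.predict(features) - measured_power
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error
        return error


# Made-up data: firm ground is cheap to cross, soft sand is expensive.
FIRM_GROUND = [1.0, 0.0]
SOFT_SAND = [0.0, 1.0]

model = OnlineTraversalCostModel(n_features=2)
for _ in range(200):
    model.update(FIRM_GROUND, measured_power=1.0)
    model.update(SOFT_SAND, measured_power=3.0)

predicted_firm = model.predict(FIRM_GROUND)
predicted_sand = model.predict(SOFT_SAND)
print(f"firm ground: {predicted_firm:.2f}, soft sand: {predicted_sand:.2f}")
```

Once the model has seen both terrain types, a planner could compare predicted costs and route around the expensive one, even when the two surfaces look equally flat in a depth map.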

The results JPL has achieved with DINOv2 are impressive. By using a single model to execute a suite of critical computer vision tasks, the team has cut the number of parameters by 67%, a major step toward their vision of building an equally capable robot autonomy stack on a reduced resource platform. As the Meta FAIR team continues to iterate on DINO, the engineers at JPL are focused on building on top of it and on experimenting continuously to better understand how this technology could operate in the vast, uncharted realms of space.
More testing and validation will be needed before DINO flies as part of a potential future space expedition, but the team also envisions important uses for this technology on Earth, such as navigating difficult terrain to assist in humanitarian rescue efforts like the Tham Luang cave rescue in 2018. Caves on Earth present challenges similar to those a rover could encounter in space, which means this technology can serve dual purposes. In addition to helping out at home, these small but mighty explorers could be crucial in the search for life on other planets. Caves have natural radiation shielding, which increases the chances for potential extraterrestrial lifeforms to exist. Whether it's exploring the unknown reaches of space or navigating treacherous terrain on Earth, DINO is poised to enable advancements in robot learning and adaptation, and the future can't wait.