INTRODUCING V-JEPA 2

A self-supervised foundation world model

Video Joint Embedding Predictive Architecture 2 (V-JEPA 2) is the first world model trained on video that achieves state-of-the-art visual understanding and prediction, enabling zero-shot robot control in new environments.

CAPABILITIES

Understand, predict, plan

V-JEPA 2 is the next step towards our vision for AI that leverages a world model to understand physical reality, anticipate outcomes, and plan efficient strategies—all with minimal supervision.

Read the research paper

Unlock world understanding

V-JEPA 2 delivers exceptional motion understanding, as well as leading visual reasoning capabilities when combined with a language model.

Anticipate what’s next

V-JEPA 2 can make predictions about how the world will evolve, setting a new state-of-the-art in anticipating actions from contextual cues.

Planning for robotic control

Building on the ability to understand and predict, V-JEPA 2 can be used for zero-shot robot planning to interact with unfamiliar objects in new environments.

We train V-JEPA 2 on 62 hours of robot data from the DROID dataset, then deploy it on a robot arm in new environments. By specifying tasks as goal images, the model accomplishes tasks like reaching, grasping, and pick-and-place. Because the model is task-agnostic, it can be deployed without extensive robot data or task-specific demonstrations.
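The goal-image planning loop can be sketched in a few lines. This is a hypothetical toy illustration, not Meta's implementation: `encode` and `predict` stand in for V-JEPA 2's frozen encoder and action-conditioned predictor (here they are simple linear functions), and random-shooting search stands in for the actual optimizer. The idea is the same: roll candidate action sequences forward in embedding space and execute the first action of the sequence whose predicted outcome lands closest to the goal image's embedding.

```python
# Hypothetical sketch of goal-image planning in latent space (not Meta's code).
# encode() and predict() are toy stand-ins for V-JEPA 2's frozen encoder
# and action-conditioned predictor.
import numpy as np

rng = np.random.default_rng(0)
D, A, H, N = 8, 2, 5, 256          # embedding dim, action dim, horizon, samples

W_obs = rng.normal(size=(D, D))    # toy "encoder" weights
W_act = rng.normal(size=(D, A))    # toy action-effect weights

def encode(image):
    """Stand-in for the V-JEPA 2 encoder: image -> embedding."""
    return W_obs @ image

def predict(z, action):
    """Stand-in for the action-conditioned predictor: next embedding."""
    return z + 0.1 * (W_act @ action)

def plan(current_image, goal_image):
    """Return the first action of the sampled sequence whose rolled-out
    final embedding is closest (L1) to the goal embedding."""
    z0, z_goal = encode(current_image), encode(goal_image)
    best_cost, best_seq = np.inf, None
    for _ in range(N):                     # random-shooting search
        seq = rng.uniform(-1, 1, size=(H, A))
        z = z0
        for a in seq:
            z = predict(z, a)              # roll out in embedding space
        cost = np.abs(z - z_goal).sum()    # distance to goal as planning cost
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq[0]                     # MPC style: execute, observe, replan

action = plan(rng.normal(size=D), rng.normal(size=D))
print(action.shape)  # (2,)
```

In practice the planner re-plans after each executed action, so errors in the predictor's rollout are corrected by fresh observations.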

Evaluating V-JEPA 2's performance

[Chart: V-JEPA 2 performance benchmarks]

MODEL ARCHITECTURE

Self-supervised world model

V-JEPA 2 employs a two-phase training approach.

The encoder and predictor are pre-trained through self-supervised learning from visual data, leveraging abundant natural videos to bootstrap physical world understanding and prediction.

Fine-tuning on a small amount of robot data enables efficient planning without requiring extensive expert robot demonstrations, which are much harder to collect at scale.
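The pre-training phase can be illustrated with a minimal sketch of a JEPA-style objective. The following is an assumption-laden toy, not Meta's implementation: linear maps stand in for the context encoder, the exponential-moving-average (EMA) target encoder, and the predictor, and mean pooling stands in for the real architecture's attention. The core pattern it shows is accurate to the JEPA family: the predictor regresses the target encoder's embeddings of masked patches from the visible context, and the target encoder slowly tracks the context encoder.

```python
# Toy sketch of a JEPA-style self-supervised objective (illustrative only).
# A predictor regresses embeddings of masked patches, produced by an EMA
# "target" encoder, from the embeddings of the visible context.
import numpy as np

rng = np.random.default_rng(0)
P, D = 16, 32                       # patches per clip, embedding dim

theta = rng.normal(size=(D, D))     # context-encoder weights (trainable)
theta_t = theta.copy()              # target-encoder weights (EMA copy)
phi = rng.normal(size=(D, D))       # predictor weights (trainable)

def jepa_loss(patches, mask):
    """L1 distance between predicted and target embeddings of masked patches.
    patches: (P, D) toy patch features; mask: boolean (P,), True = hidden."""
    ctx = patches[~mask] @ theta.T                       # embed visible context
    ctx_summary = ctx.mean(axis=0)                       # crude pooling stand-in
    pred = np.tile(ctx_summary @ phi.T, (mask.sum(), 1)) # predict hidden patches
    target = patches[mask] @ theta_t.T                   # targets (no gradient)
    return np.abs(pred - target).mean()

def ema_update(tau=0.998):
    """Target encoder slowly tracks the context encoder."""
    global theta_t
    theta_t = tau * theta_t + (1 - tau) * theta

patches = rng.normal(size=(P, D))
mask = np.zeros(P, dtype=bool)
mask[: P // 2] = True               # hide half of the patches
loss = jepa_loss(patches, mask)
ema_update()
```

Because the loss is computed entirely in embedding space rather than pixel space, the model can ignore unpredictable low-level detail and focus capacity on the scene dynamics that matter for prediction and planning.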

WORLD MODELS

Our vision for world models

What if AI could reason and plan as effortlessly as we do? This is one of the grand scientific challenges we’re tackling at Meta.

APPLICATIONS

Potential model applications

We’re releasing V-JEPA 2 for the community to build upon this work. We expect world models to power novel experiences and groundbreaking applications across diverse domains.

Download the model

Robotic assistants

We expect world models to unlock a new era of robotics, powering AI agents that navigate physical environments to tackle household chores and complex tasks.


Wearable assistants

World models can enable assistive technology that helps individuals navigate busy environments, providing real-time alerts about approaching obstacles and hazards.


RESOURCES

Explore V-JEPA

Read the AI at Meta blog
Read the research paper
V-JEPA 2 on Hugging Face
Download V-JEPA 2
Read more about V-JEPA 1
Download V-JEPA 1