INTRODUCING DINOV3

Self-supervised learning for vision at unprecedented scale

DINOv3 scales self-supervised learning (SSL) for images to produce our strongest universal vision backbones, enabling breakthrough performance across diverse domains.

DINOV3 OVERVIEW

Cutting-edge image representations, trained without human supervision

We scaled unsupervised training to 7B-parameter models and a 1.7B-image dataset, using a fraction of the compute required by weakly-supervised methods. Even with the backbones kept frozen during evaluation, they achieve absolute state-of-the-art performance across diverse domains.

Exceptional performance across visual domains

SSL unlocks domains where annotations are scarce or costly. DINOv3 backbones enable state-of-the-art results on tasks ranging from object detection in web imagery to canopy height mapping in satellite and aerial imagery.

Versatile backbone with powerful dense image features

High-resolution dense features from a single DINOv3 backbone enable leading performance across vision tasks, including object detection, depth estimation, and segmentation, without any finetuning.
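As an illustration of that frozen-backbone workflow, the sketch below extracts dense per-patch features via the Hugging Face transformers AutoModel API. The checkpoint identifier is a placeholder assumption, not a confirmed DINOv3 model name, and the exact token layout may differ by model.

```python
# Hedged sketch: extract dense patch features from a frozen DINOv3-style backbone
# with Hugging Face transformers. The checkpoint id is a placeholder assumption;
# substitute the actual DINOv3 checkpoint you download.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

CHECKPOINT = "facebook/dinov3-vitb16"  # placeholder id, not verified

processor = AutoImageProcessor.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT).eval()  # backbone stays frozen

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# For ViT-style backbones, last_hidden_state is (batch, tokens, dim); the
# per-patch tokens (after any CLS/register tokens, whose count varies by model)
# serve as dense features for detection, segmentation, or depth heads.
patch_features = outputs.last_hidden_state[:, 1:, :]
print(patch_features.shape)
```

These features are consumed as-is by lightweight task heads; the backbone itself receives no gradient updates.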

Efficient model sizes and architectures

We release a comprehensive model suite addressing a wide range of use cases, including broad coverage of ViT sizes and efficient ConvNeXt models for on-device deployment.

PERFORMANCE

Evaluating DINOv3's Performance

DINOv3 sets a new standard in vision foundation models. For the first time, a model trained with SSL outperforms weakly-supervised models on a broad range of probing tasks, from fine-grained image classification, to semantic segmentation, to object tracking in video.

[Chart: DINOv3 performance statistics]

APPROACH

Self-supervised pre-training unlocks simple task adaptation

Pre-training data is curated from a large unlabeled image pool. During pre-training, the model learns general-purpose visual representations by matching features between different augmented views of the same image. In post-training, the model is distilled into a family of smaller, more efficient models.
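For intuition, here is a minimal sketch of a DINO-style self-distillation step: a student network is trained to match the centered, sharpened output distribution of an EMA teacher computed on a different augmented view. This is a simplification for illustration, not the actual DINOv3 training recipe, which adds further losses and regularization.

```python
# Minimal sketch of DINO-style self-distillation (simplified; not the actual
# DINOv3 training code). The student matches the output distribution of an
# EMA teacher evaluated on a different augmented view of the same image.
import torch
import torch.nn.functional as F

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the sharpened teacher and student distributions."""
    teacher_probs = F.softmax((teacher_logits - center) / teacher_temp, dim=-1)
    student_logprobs = F.log_softmax(student_logits / student_temp, dim=-1)
    return -(teacher_probs * student_logprobs).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    """Teacher weights are an exponential moving average of the student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1 - momentum)

# Usage with two augmented views (view_a, view_b) of the same image batch:
#   loss = dino_loss(student(view_a), teacher(view_b).detach(), center)
#   loss.backward(); optimizer.step(); ema_update(teacher, student)
```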

A pre-trained DINOv3 model can be easily tailored by training a lightweight adapter on a small amount of annotated data.
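As a sketch of that adaptation step, the snippet below trains only a linear head on top of frozen backbone features; the `backbone`, data loader, and feature dimension are stand-ins for your own setup rather than a prescribed DINOv3 API.

```python
# Hedged sketch: adapt a frozen backbone to a new task by training a lightweight
# linear head on a small labeled dataset. `backbone`, `train_loader`, and
# `feat_dim` are assumptions standing in for your own components.
import torch
import torch.nn as nn

def train_linear_probe(backbone, train_loader, feat_dim=768, num_classes=10,
                       epochs=10, lr=1e-3, device="cpu"):
    backbone.eval().to(device)                 # backbone stays frozen
    for p in backbone.parameters():
        p.requires_grad_(False)

    head = nn.Linear(feat_dim, num_classes).to(device)  # the only trainable part
    optimizer = torch.optim.AdamW(head.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = backbone(images)       # e.g. pooled/CLS features
            loss = criterion(head(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return head
```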

DINO Evolution

DINOv3 marks a new milestone in self-supervised training at scale. It builds on the scaling progress of DINOv2, increasing model size by 6x and training data by 12x.

DINO

Initial research proof-of-concept, with 80M-parameter models trained on 1M images.

Read the research paper
Download the model

DINOv2

First successful scaling of an SSL algorithm, with 1B-parameter models trained on 142M images.

Read the research paper
Download the model

DINOv3

An order-of-magnitude increase in training scale over DINOv2, with a particular focus on dense features.

Read the research paper
Download the model

Explore additional resources

Read the AI at Meta blog
Read the research paper
Download DINOv3
DINOv3 on Hugging Face