INTRODUCING DINOV3
DINOv3 scales self-supervised learning (SSL) for images to produce our strongest universal vision backbones, enabling breakthrough performance across diverse domains.
DINOV3 OVERVIEW
We scaled unsupervised training to 7B-parameter models and a 1.7B-image dataset, using a fraction of the compute required by weakly-supervised methods. Even with the backbones kept frozen during evaluation, they achieve absolute state-of-the-art performance across diverse domains.
SSL unlocks domains where annotations are scarce or costly. DINOv3 backbones enable state-of-the-art results not only on web imagery tasks such as object detection, but also on canopy height mapping from satellite and aerial imagery.
High-resolution dense features from a single DINOv3 backbone enable leading performance across vision tasks, including object detection, depth estimation, and segmentation, without any finetuning.
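To make this concrete, here is a minimal sketch of extracting dense patch features from a frozen backbone for use by downstream heads. The hub repository, entrypoint name, and the get_intermediate_layers interface below are assumptions carried over from DINOv2, not the confirmed DINOv3 API.

```python
import torch

# Hypothetical loading call: repo and entrypoint names are assumptions for
# illustration, mirroring the DINOv2 torch.hub interface.
backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vitl16")
backbone.eval()

# One RGB image, resized so height and width are multiples of the patch size.
image = torch.rand(1, 3, 512, 512)

with torch.no_grad():
    # Assumed to follow the DINOv2 ViT interface: return patch tokens from the
    # last block, reshaped into a spatial feature map.
    feats = backbone.get_intermediate_layers(image, n=1, reshape=True)[0]

# feats: (1, C, H/patch, W/patch) dense features that detection, segmentation,
# or depth heads can consume without finetuning the backbone.
print(feats.shape)
```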
We release a comprehensive model suite addressing a wide range of use cases, including broad coverage of ViT sizes and efficient ConvNeXt models for on-device deployment.
PERFORMANCE
DINOv3 sets a new standard in vision foundation models. For the first time, a model trained with SSL outperforms weakly-supervised models on a broad range of probing tasks, from fine-grained image classification, to semantic segmentation, to object tracking in video.

APPROACH
Pre-training data is curated from a large unlabeled dataset. During pre-training, the model learns general-purpose visual representations by matching features between different augmented views of the same image. In post-training, the large model is distilled into smaller, more efficient models.
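The view-matching objective can be sketched as a DINO-style self-distillation loss: a student network is trained to match the softened output of an exponential-moving-average teacher on a different augmented view of the same image. The shapes, temperatures, and names below are illustrative assumptions, not the released training code.

```python
import torch
import torch.nn.functional as F

def dino_style_loss(student_out, teacher_out, student_temp=0.1,
                    teacher_temp=0.04, center=0.0):
    # Teacher targets are centered and sharpened, with no gradient flow.
    targets = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
    log_probs = F.log_softmax(student_out / student_temp, dim=-1)
    # Cross-entropy between teacher targets and student predictions.
    return -(targets * log_probs).sum(dim=-1).mean()

# Two augmented views of the same images go through student and teacher;
# the teacher is an EMA copy of the student, so only the student is updated.
student_out = torch.randn(8, 65536)  # student projections for view A (illustrative)
teacher_out = torch.randn(8, 65536)  # teacher projections for view B (illustrative)
loss = dino_style_loss(student_out, teacher_out)
```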
A pre-trained DINOv3 model can be easily tailored by training a lightweight adapter on a small amount of annotated data.
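A minimal sketch of such an adapter, assuming a frozen backbone that returns a global feature vector: only a small linear head is trained on the annotated data, while the backbone's weights never change. The feature dimension and backbone call are placeholders.

```python
import torch
from torch import nn

# Lightweight adapter: a linear head on top of frozen backbone features.
feature_dim, num_classes = 1024, 10          # placeholder sizes
head = nn.Linear(feature_dim, num_classes)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def train_step(backbone, images, labels):
    with torch.no_grad():                    # backbone stays frozen
        feats = backbone(images)             # (B, feature_dim) global features
    logits = head(feats)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()                          # gradients only reach the adapter
    optimizer.step()
    return loss.item()
```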
DINO Evolution
DINOv3 marks a new milestone in self-supervised training at scale. It builds on the scaling progress of DINOv2, increasing model size by 6x and training data by 12x.
DINO
Initial research proof-of-concept, with 80M-parameter models trained on 1M images.
DINOv2
The first successful scaling of an SSL algorithm, with 1B-parameter models trained on 142M images.
DINOv3
An order-of-magnitude increase in training scale over v2, with a particular focus on dense features.