June 12, 2020
We are introducing the most advanced framework for dense pose estimation for chimpanzees. It will help primatologists and other scientists study how chimps across Africa behave in the wild and in captive settings. The framework leverages a large-scale dataset of unlabeled videos in the wild, a pretrained dense pose estimator for humans, and dense self-training techniques. This is a joint project with our partners, the Max Planck Institute for Evolutionary Anthropology (MPI EVA) and the Pan African Programme: The Cultured Chimpanzee, and their network of collaborators.
We show that we can train a model to detect and recognize chimpanzees by transferring knowledge from existing detection, segmentation, and human dense pose labeling models. Notably, our method does not require any manually annotated data for the target category, which could accelerate training for new species compared with previous methods.
To achieve this, we first trained a DensePose model for the new animal class using pseudo-labels generated on unlabeled data rather than manual annotations, then geometrically aligned it with our existing DensePose human model, as shown in the image below:
To augment and adapt the human DensePose dataset to new species, we used self-supervision and pseudo-labeling techniques.
We introduced a multihead autocalibrated R-CNN architecture that facilitates transfer of multiple recognition tasks between classes. We first used a model pretrained on a different class or set of classes to generate labels in the new domain, and then we retrained the model to fit those labels. This required ranking pixel-level pseudo-labels by reliability and selecting which ones to use for training a model for this class.
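The ranking-and-selection step described above can be illustrated with a minimal sketch. It is not the project's implementation: the `Prediction` tuple, the `select_pseudo_labels` function, the `keep_fraction` parameter, and the toy values are all hypothetical, and the `confidence` field simply stands in for whatever reliability estimate the model produces. The post does not specify the exact score or selection rule.

```python
from typing import List, Tuple

# Hypothetical per-pixel prediction from the pretrained (teacher) model:
# (pixel_id, predicted_label, confidence). Not the project's actual data format.
Prediction = Tuple[int, int, float]


def select_pseudo_labels(
    predictions: List[Prediction], keep_fraction: float
) -> List[Prediction]:
    """Rank pixel-level pseudo-labels by confidence and keep the top fraction."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    # Most reliable first. Confidence is an assumed stand-in for reliability.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    n_keep = int(len(ranked) * keep_fraction)
    return ranked[:n_keep]


# Toy teacher output on one unlabeled frame (values invented for illustration).
teacher_output = [
    (0, 3, 0.95),
    (1, 3, 0.40),
    (2, 7, 0.88),
    (3, 7, 0.15),
]

# Keep the more reliable half; only these pixels would contribute to retraining.
selected = select_pseudo_labels(teacher_output, keep_fraction=0.5)
print(selected)  # [(0, 3, 0.95), (2, 7, 0.88)]
```

In a self-training loop, the retained pixels would serve as training targets for the next round, while the discarded pixels would be ignored in the loss.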
Ultimately, we successfully trained a network for dense pose estimation by carefully selecting which categories to use to pretrain the model, by using a class-agnostic architecture to integrate different sources of information, and by grading pseudo-labels for self-training.
Modern computer vision techniques have made tremendous progress in being able to recognize people’s poses extremely accurately, given large-scale human pose datasets manually annotated in detail. It’s not practical to apply the same supervised techniques to the natural world, where there are as many as 6,500 species of mammals, 60,000 vertebrates, and 1.2 million invertebrates. Learning about even just one species would be time-intensive and laborious, let alone millions of them.
This project is just the first step in building systems that can unlock large-scale data collection and analysis in the wild. Researchers would then be able to automatically extract information to better understand population counts in specific regions, how they evolve, their behavior patterns, and how they interact with one another, as well as to monitor the health of a particular species and much more.
We focused on chimps first because of their evolutionary similarities to humans, and we believe this approach using in-the-wild data collection can generalize to other animal classes as well. We’re encouraged by these promising results and prospects of providing tools for automatic visual animal analysis at scale. In the future, our plan is to directly collaborate with ecologists, zoologists, and anthropologists, and continue improving our techniques while minimizing supervision as much as possible.
This work will be presented at CVPR 2020.