April 26, 2021
Training AI systems with curated and labeled data sets has produced specialized AI models that excel at tasks like object recognition. But relying solely on this approach also has real limitations, including one we consider particularly important to address: Such systems can struggle to recognize objects that are common in daily life for billions of people but are underrepresented in the data often used to train the AI systems.
In particular, the choices made about which images to train on and how to label them can inadvertently introduce biases. An object-recognition system trained mostly on household images from the United States and Europe might struggle to perform equally well when asked to recognize objects in a home in Nepal, for example.
This is one reason we’re excited about SEER, a new high-performance computer vision system we’ve developed. By leveraging self-supervised learning, SEER can learn from any collection of digital images without requiring researchers to curate the collection and label each object.
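The post doesn't include SEER's training code, but the core idea of label-free learning can be illustrated with a minimal sketch. The example below shows a contrastive-style self-supervised objective (NT-Xent), one common flavor of the technique: the only "supervision" is that two augmented views of the same unlabeled image should produce similar embeddings. This is a generic illustration under assumed names, not SEER's actual method or code.

```python
# Illustrative sketch of contrastive self-supervised learning:
# no labels, only agreement between two augmented views of each
# image. NOT SEER's actual training code.
import numpy as np

def l2_normalize(x):
    # Normalize rows so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (batch, dim) embeddings of two augmentations of the same
    batch of unlabeled images. Matching rows are positive pairs; every
    other row in the batch acts as a negative.
    """
    z = l2_normalize(np.concatenate([z1, z2], axis=0))  # (2B, dim)
    sim = z @ z.T / temperature                          # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude self-similarity
    n = z1.shape[0]
    # Row i's positive partner is row i + n (and vice versa).
    pos_idx = np.concatenate([np.arange(n) + n, np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()

# Toy usage: two lightly perturbed "views" of 4 unlabeled images.
rng = np.random.default_rng(0)
images = rng.normal(size=(4, 8))
z1 = images + 0.01 * rng.normal(size=images.shape)  # view 1
z2 = images + 0.01 * rng.normal(size=images.shape)  # view 2
loss = nt_xent_loss(z1, z2)  # small when matching views agree
```

In a real system, `z1` and `z2` would come from a neural network applied to randomly augmented crops of uncurated images; minimizing this loss teaches the network useful visual features without any human labels.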
Preliminary evaluations show that SEER can outperform conventional computer vision systems in recognizing objects that, while representative of life for billions of people, are less represented in conventional image data sets used to train AI systems.
We hope our work with SEER will help make AI work better for everyone, not just those who have typically benefited the most.
We tested SEER on images from the Dollar Street data set that we used in our 2019 study on biases in computer vision systems. The SEER results show exciting signs of how self-supervised learning could make AI work better for people across the world.
In this image from a home in Nepal, for example, SEER correctly identified the object while a conventional system did not. Photo: Luc Forsyth for Dollar Street 2015 (Free to use under CC BY 4.0)
In this photograph from a home in China, SEER correctly identified a stove, while the conventionally trained system didn't. Photo: Jianxing Cheng for Dollar Street 2016 (Free to use under CC BY 4.0)
This photo shows a small street in India, where SEER's predictions can be compared with those of the conventional object-recognition system. Photo: Zoriah Miller for Dollar Street 2015 (Free to use under CC BY 4.0)
Self-supervised learning has already shown tremendous promise in improving performance with languages and dialects that don't have extensive collections of digitized texts to use as labeled training data. SEER's strong object-recognition performance in the examples above is another exciting result, as the model was trained on random internet images without any data curation.
This suggests that the self-supervised approach used in training SEER could have a huge impact on efforts to build AI systems that effectively serve the entire world, not just the wealthy. These efforts are just the beginning, but it’s clear that we’re on an extremely exciting path of progress.