Alex Berg receives IEEE Helmholtz prize at ICCV

October 29, 2019

Facebook AI Research Scientist Alex Berg is one of the recipients of the 2019 Helmholtz Prize for fundamental contributions in computer vision. The award is given every other year at the International Conference on Computer Vision (ICCV), which is being held this week in Seoul, South Korea. It recognizes ICCV papers from 10 years ago that have made a significant impact on computer vision research. Berg and coauthors Neeraj Kumar, Peter N. Belhumeur, and Shree K. Nayar received the award for their paper, “Attribute and Simile Classifiers for Face Verification,” which was presented at ICCV in 2009.

Berg is also presenting two papers and co-organizing several workshops, including one on extreme vision modeling, at ICCV this year. (More details on Facebook researchers’ work at ICCV are available here.)

Berg took a moment to share thoughts about his past, current, and future research, as well as the possible applications of his research and how the field of computer vision has advanced in recent years.

Tell us about the work that was recognized by the Helmholtz prize awards committee.

This award is for the paper “Attribute and Simile Classifiers for Face Verification,” which was presented at ICCV 2009 while I was a postdoc at Columbia University. The paper presented two approaches to robust and accurate face representation: one learns to recognize high-level describable attributes of faces, and the other compares parts of faces to other faces, making a computational version of what it means to, for example, have “Bette Davis eyes,” as in the song. The work brought together three things: what were at the time new machine learning approaches, big data (the paper introduced a dataset of faces), and the foundational challenge of face recognition in real-world images.

What led you to focus on this problem? What did you hope to achieve?

At the time, researchers exploring recognition tasks in computer vision were trying out new things to recognize, and attributes, which could be flexibly composed, were attractive for scaling the space of what could be recognized beyond simple categories. This paper was one of the first to show that learning to recognize attributes could contribute to improving state-of-the-art accuracy on a widely studied recognition problem.

Was anyone else involved in the research?

This was joint work with a PhD student, Neeraj Kumar; Professor Shree Nayar; and Professor Peter Belhumeur, all at Columbia University at the time.

What was the original response from the research community, and what impact has it had in the years since publication?

This paper has been recognized for a number of reasons. One is the dataset, called PubFig, which was curated as part of the research. Another is that the paper became an example of how to successfully use attributes to improve recognition accuracy. The work also produced some interesting results on how well people do on face verification tasks compared with algorithms.

How has your work evolved since that paper was published? What are you focusing on now?

One line of my work has continued to expand the range of targets for recognition research, from the ImageNet Challenge, with a thousand categories, to hierarchical recognition, computational approaches to finding entry-level categories, and, more generally, connecting language and vision.

What surprised you most about how the research in this field has evolved since you published your paper? What’s been harder or easier than you might have expected? And what do you hope to see in this field in the next few years?

Face recognition has continued to benefit from increased data and improved modeling. Today’s face recognition approaches are surprisingly accurate and robust, to the point where face verification is sometimes used as the primary method to unlock cell phones. Looking back, our work on attributes for faces relied on labels. It would be neat to see this work redone in the era of self-supervision, where an algorithm could identify potential attributes and learn to recognize them without supervision.