Creating a dataset and a challenge for deepfakes

September 05, 2019

Datasets and benchmarks have been some of the most effective tools for speeding progress in AI. Our current renaissance in deep learning has been fueled in part by the ImageNet benchmark, and recent advances in natural language processing have been hastened by the GLUE and SuperGLUE benchmarks.

“Deepfake” techniques, which present realistic AI-generated videos of real people doing and saying fictional things, have significant implications for determining the legitimacy of information presented online. Yet the industry does not have a large-scale dataset or benchmark for detecting them. We want to catalyze more research and development in this area and ensure that there are better open source tools to detect deepfakes. That’s why Facebook, the Partnership on AI, Microsoft, and academics from Cornell Tech, MIT, the University of Oxford, UC Berkeley, the University of Maryland, College Park, and the University at Albany-SUNY are coming together to build the Deepfake Detection Challenge (DFDC).

The goal of the challenge is to produce technology that everyone can use to better detect when AI has been used to alter a video in order to mislead the viewer. The Deepfake Detection Challenge will include a dataset and leaderboard, as well as grants and awards, to spur the industry to create new ways of detecting AI-manipulated media and preventing it from being used to mislead others. The governance of the challenge will be facilitated and overseen by the Partnership on AI’s new Steering Committee on AI and Media Integrity, which is made up of a broad cross-sector coalition of organizations, including Facebook, WITNESS, Microsoft, and others in civil society and the technology, media, and academic communities.

It’s important to have data that is freely available for the community to use, with clearly consenting participants and few restrictions on usage. That’s why Facebook is commissioning a realistic dataset for the challenge, created with paid actors who have provided the required consent. No Facebook user data will be used in this dataset. We are also funding research collaborations and prizes for the challenge to help encourage more participation. In total, we are dedicating more than $10 million to fund this industry-wide effort.

To ensure their quality, the dataset and challenge parameters will initially be tested through a targeted technical working session this October at the International Conference on Computer Vision (ICCV). The full dataset release and the DFDC launch will happen at the Conference on Neural Information Processing Systems (NeurIPS) this December. Facebook will also enter the challenge but will not accept any financial prize. Follow our website for regular updates.

This is a constantly evolving problem, much like spam or other adversarial challenges, and our hope is that by helping the industry and the AI community come together, we can make faster progress.

We’ve asked outside experts to share their perspectives on this project, and we’re including their responses below.

Academic support

“In order to move from the information age to the knowledge age, we must do better in distinguishing the real from the fake, reward trusted content over untrusted content, and educate the next generation to be better digital citizens. This will require investments across the board, including in industry/university/NGO research efforts to develop and operationalize technology that can quickly and accurately determine which content is authentic.” — Professor Hany Farid, Professor in the Department of Electrical Engineering & Computer Science and the School of Information, UC Berkeley

“People have manipulated images for almost as long as photography has existed. But it’s now possible for almost anyone to create and pass off fakes to a mass audience. The goal of this competition is to build AI systems that can detect the slight imperfections in a doctored image and expose its fraudulent representation of reality.” — Antonio Torralba, Professor of Electrical Engineering & Computer Science and Director of the MIT Quest for Intelligence

“As we live in the multimedia age, having information with integrity is crucial to our lives. Given the recent developments in being able to generate manipulated information (text, images, videos, and audio) at scale, we need the full involvement of the research community in an open environment to develop methods and systems that can detect and mitigate the ill effects of manipulated multimedia. By making available a large corpus of genuine and manipulated media, the proposed challenge will excite and enable the research community to collectively address this looming crisis.” — Professor Rama Chellappa, Distinguished University Professor and Minta Martin Professor of Engineering, University of Maryland

“To effectively drive change and solve problems, we believe it’s critical for academia and industry to come together in an open and collaborative environment. At Cornell Tech, our research is centered around bridging that gap and addressing technology’s societal impact in the digital age, and the Deepfake Detection Challenge is a perfect example of this. Working with tech industry leaders and academic colleagues, we are developing a comprehensive data source that will enable us to identify fake media and ultimately lead to building tools and solutions to combat it. We’re proud to be a part of this group and to share the data source with the public, allowing anyone to learn from and expand upon this research.” — Serge Belongie, Associate Dean and Professor, Cornell Tech

“Manipulated media being put out on the internet, to create bogus conspiracy theories and to manipulate people for political gain, is becoming an issue of global importance, as it is a fundamental threat to democracy, and hence to freedom. I believe we urgently need new tools to detect and characterize this misinformation, so I am happy to be part of an initiative that seeks to mobilize the research community around these goals, both preserving the truth and pushing the frontiers of science.” — Professor Philip H. S. Torr, Department of Engineering Science, University of Oxford

“Although deepfakes may look realistic, the fact that they are generated from an algorithm rather than captured by a camera from real events means they can still be detected and their provenance verified. Several promising new methods for spotting and mitigating the harmful effects of deepfakes are coming on stream, including procedures for adding ‘digital fingerprints’ to video footage to help verify its authenticity. As with any complex problem, combating the negative impact of deepfakes requires a joint effort from the technical community, government agencies, media, platform companies, and every online user.” — Professor Siwei Lyu, College of Engineering and Applied Sciences, University at Albany-SUNY

“Technology to manipulate images is advancing faster than our ability to tell what’s real from what’s been faked. A problem as big as this won’t be solved by one person alone. Open competitions like this one spur innovation by focusing the world’s collective brainpower on a seemingly impossible goal.” — Phillip Isola, Bonnie & Marty (1964) Tenenbaum Career Development Assistant Professor of Electrical Engineering & Computer Science, MIT

Written By

Mike Schroepfer

Chief Technology Officer