The Casual Conversations dataset is designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of ages, genders, apparent skin tones and ambient lighting conditions.
Casual Conversations is composed of over 45,000 videos (3,011 participants) and is intended for assessing the performance of already trained models in computer vision and audio applications, for the purposes permitted in our data user agreement. The videos feature paid individuals who agreed to participate in the project and explicitly provided their own age and gender labels. The videos were recorded in the U.S. with a diverse set of adults across age, gender and apparent skin tone groups. A group of trained annotators labeled the participants’ apparent skin tone using the Fitzpatrick scale and annotated whether each video was recorded in low ambient lighting conditions. Spoken words in all videos were also manually transcribed by human annotators, and the transcriptions are available with the dataset.
The Casual Conversations dataset is designed to help researchers evaluate their computer vision and audio models for accuracy across certain attributes.
Machine learning, ML Fairness
Assist in measuring algorithmic fairness in terms of age, gender, apparent skin tone, ambient lighting conditions, and speech recognition
Video (mp4)
Testing, training (without using provided annotations)
Total number of subjects/actors: 3,011
Total number of video recordings: 45,186
Average video length: ~1 minute
Labels
Age (self-provided): 3,011
Gender (self-provided): 3,011
Skin Tone (human labelled): 3,011
Lighting (human labelled): 45,186
Speech Transcriptions (human labelled): 45,186
Video recordings of individuals giving “unscripted” answers to random questions drawn from a pre-approved list
Participants de-identified with unique numbers
Limited; see full license language for use
Summary of license permissions
You can evaluate models on the provided labels
You cannot train any model with the provided labels
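As an illustration of evaluating an already trained model on the provided labels, the following is a minimal Python sketch of per-subgroup accuracy. The record fields (`skin_tone`, `lighting`, `correct`), the group names, and the toy values are hypothetical and made up for this example; they are not the dataset's actual annotation schema or results. No model is trained here, only predictions are scored.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group: accuracy} for predictions grouped by one attribute.

    `records` is an iterable of dicts; each dict is assumed (hypothetically)
    to hold the grouping attribute and a boolean `correct` flag saying
    whether the model's prediction matched the reference for that video.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        hits[group] += int(rec["correct"])
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_gap(per_group):
    """Difference between the best and worst subgroup accuracy."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy, made-up evaluation records (not taken from the dataset).
records = [
    {"skin_tone": "group_a", "lighting": "bright", "correct": True},
    {"skin_tone": "group_a", "lighting": "dark", "correct": True},
    {"skin_tone": "group_b", "lighting": "bright", "correct": True},
    {"skin_tone": "group_b", "lighting": "dark", "correct": False},
]

by_tone = accuracy_by_group(records, "skin_tone")
by_lighting = accuracy_by_group(records, "lighting")
tone_gap = accuracy_gap(by_tone)
```

Reporting the gap alongside the per-group values is one simple way to summarize disparity; other fairness metrics may suit a given task better.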
Open access
Data sources
Vendor data collection efforts
Data selection
All videos are opted-in for data use in ML by the participants
Unsampled
Geographic distribution
100% US, cities: Atlanta, Houston, Miami, New Orleans, and Richmond
Human Labels
Label types
Human-labels: free-form text labels
Labeling procedure - Human
Participants provided their own age and gender labels
Annotators labelled apparent skin tone and ambient lighting, and transcribed speech
Human validated
Human validators verify labels
Human validators flag PII
Human validators filter data
All labels are verified by human validators based in the U.S.
Validators flag any PII content
This dataset features the original video recordings created by Facebook for the Deepfake Detection Challenge (DFDC) dataset. The AI research community can use Casual Conversations as one important stepping stone toward normalizing subgroup measurement and fairness research. With Casual Conversations, we hope to spur further research in this important, emerging field.
If you are an individual who appears in this dataset and would like for your videos to be removed from this dataset, please contact: casualconversations@fb.com