APRIL 8, 2021

Casual Conversations Dataset

The Casual Conversations dataset is designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of ages, genders, apparent skin tones, and ambient lighting conditions.

Overview

Casual Conversations comprises over 45,000 videos (3,011 participants) and is intended for assessing the performance of already-trained models in computer vision and audio applications, for the purposes permitted in our data user agreement. The videos feature paid individuals who agreed to participate in the project and explicitly provided their own age and gender labels. The videos were recorded in the U.S. with a diverse set of adults spanning various age, gender, and apparent skin tone groups. A group of trained annotators labeled each participant’s apparent skin tone using the Fitzpatrick scale and marked whether each video was recorded in low ambient lighting conditions. Spoken words in all videos were also manually transcribed by human annotators, and the transcriptions are available with the dataset.
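The labels described above sit at two levels: age, gender, and apparent skin tone belong to a participant, while the lighting flag and the transcription belong to an individual video. The sketch below shows one way to join the two levels into per-video rows. It is illustrative only: the class names, field names, ID formats, and sample values are assumptions made for this example and do not reflect the dataset's actual annotation files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantLabels:
    """Hypothetical per-participant labels (field names are assumptions)."""
    age: int                # self-provided
    gender: str             # self-provided
    fitzpatrick_type: int   # annotator-assigned, Fitzpatrick type 1 to 6


@dataclass(frozen=True)
class VideoLabels:
    """Hypothetical per-video labels (field names are assumptions)."""
    participant_id: str
    low_lighting: bool      # annotator-assigned
    transcription: str      # human transcription


def join_labels(participants, videos):
    """Attach each participant's labels to every one of their videos."""
    rows = []
    for video_id, video in videos.items():
        person = participants[video.participant_id]
        rows.append({
            "video_id": video_id,
            "participant_id": video.participant_id,
            "age": person.age,
            "gender": person.gender,
            "fitzpatrick_type": person.fitzpatrick_type,
            "low_lighting": video.low_lighting,
            "transcription": video.transcription,
        })
    return rows


# Made-up sample values, not taken from the dataset.
participants = {"0001": ParticipantLabels(age=34, gender="female", fitzpatrick_type=5)}
videos = {
    "0001_00": VideoLabels("0001", False, "placeholder transcript one"),
    "0001_01": VideoLabels("0001", True, "placeholder transcript two"),
}
rows = join_labels(participants, videos)
```

Keeping participant-level labels in one place and copying them onto videos only at join time avoids the same participant ending up with inconsistent labels across their videos.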

Casual Conversations Dataset Version 1.0

The Casual Conversations dataset is designed to help researchers evaluate their computer vision and audio models for accuracy across certain attributes.


Key Application

Machine learning, ML Fairness

Intended Use Cases

Assist in measuring algorithmic fairness in terms of age, gender, apparent skin tone, ambient lighting conditions, and speech recognition

Primary Data Type

Video (mp4)

Data Function

Testing, training (without using provided annotations)

Dataset Characteristics

Total number of subjects/actors: 3,011

Total number of video recordings: 45,186

Average length per video: ~1 minute

Labels

Age (self-provided)

3,011

Gender (self-provided)

3,011

Skin Tone (human labelled)

3,011

Lighting (human labelled)

45,186

Speech Transcriptions (human labelled)

45,186

Nature Of Content

Video recordings of individuals giving “unscripted” answers to questions drawn at random from a pre-approved list

Privacy PII

Participants de-identified with unique numbers

License

Limited; see full license language for use

Summary of license permissions

You can evaluate models on the provided labels

You cannot train any model with the provided labels

Access Cost

Open access

Data Collection

Data sources

Vendor data collection efforts

Data selection

Participants opted in to the use of all videos for ML

Sampling Methods

Unsampled

Geographic distribution

100% US, cities: Atlanta, Houston, Miami, New Orleans, and Richmond

Labelling Methods

Human Labels

Label types

Human-labels: free-form text labels

Labeling procedure - Human

Participants provided their own age and gender labels

Annotators labelled apparent skin tone and ambient lighting, and transcribed speech

Validation Methods

Human validated

Validator description(s)

Human validated

Validation tasks

Human validators verify labels

Human validators flag PII

Human validators filter data

Validation policy summary

All labels are verified by human validators based in the U.S.

Validators flag any PII content

This dataset features the original video recordings created by Facebook for the Deepfake Detection Challenge (DFDC) dataset. The AI research community can use Casual Conversations as one important stepping stone toward normalizing subgroup measurement and fairness research. With Casual Conversations, we hope to spur further research in this important, emerging field.
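As a rough illustration of the subgroup measurement mentioned above, the sketch below computes the accuracy of an already-trained model separately for each value of one attribute, then reports the gap between the best and worst groups. The record layout, the field names (`fitzpatrick_type`, `prediction`, `target`), and the toy values are assumptions made for this example, and accuracy is only one of many possible metrics. The dataset labels are used here only to group evaluation results, not to train anything.

```python
from collections import defaultdict


def accuracy_by_group(records, attribute):
    """Return {group value: accuracy} for one attribute of the records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[attribute]
        total[group] += 1
        correct[group] += int(rec["prediction"] == rec["target"])
    return {group: correct[group] / total[group] for group in total}


def largest_gap(per_group):
    """Difference between the best and worst per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Made-up model outputs for four videos (hypothetical field names).
records = [
    {"fitzpatrick_type": 2, "prediction": 1, "target": 1},
    {"fitzpatrick_type": 2, "prediction": 0, "target": 1},
    {"fitzpatrick_type": 5, "prediction": 1, "target": 1},
    {"fitzpatrick_type": 5, "prediction": 1, "target": 1},
]

per_group = accuracy_by_group(records, "fitzpatrick_type")
print(per_group)               # {2: 0.5, 5: 1.0}
print(largest_gap(per_group))  # 0.5
```

The same function can be called with a different attribute name, such as a lighting flag or an age bucket, to compare results along another axis.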

If you are an individual who appears in this dataset and would like your videos to be removed from it, please contact: casualconversations@fb.com