MARCH 9, 2023

Casual Conversations v2 Dataset

Casual Conversations dataset version 2 is designed to help researchers evaluate their computer vision, audio and speech models for accuracy across a diverse set of ages, genders, languages/dialects, geographies, disabilities, physical adornments, physical attributes, voice timbres, skin tones, activities, and recording setups.


Casual Conversations v2 is composed of 5,567 participants (26,467 videos) and is intended mainly for assessing the performance of already trained computer vision and audio models, for the purposes permitted in our data license agreement. The videos feature paid individuals who agreed to participate in the project and who explicitly provided their own age, gender, language/dialect, geo-location, disability, physical adornment, and physical attribute labels. The videos were recorded in Brazil, India, Indonesia, Mexico, the Philippines, the United States, and Vietnam with a diverse set of adults across these categories. A group of trained annotators labeled the participants' apparent skin tone using the Fitzpatrick scale and the Monk Skin Tone scale, and also annotated voice timbre, activity, and recording setup. Spoken words in all videos are either scripted (a sample paragraph from The Idiot by Fyodor Dostoevsky, provided with the dataset) or nonscripted (answers to one of five predetermined questions).

Casual Conversations Dataset Version 2.0

The Casual Conversations v2 dataset is designed to help researchers evaluate their computer vision, audio and speech models for accuracy across the attributes listed below.

Key Application

Machine learning, ML fairness and robustness

Intended Use Cases

Assist in measuring algorithmic fairness and robustness in terms of age, gender, apparent skin tone, language/dialect, geo-location, disability, physical adornment, physical attributes, voice timbre, and activity/recording setup conditions.
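Measuring fairness across these attributes usually means disaggregated evaluation: computing a metric separately for each value of an attribute and comparing the results. The sketch below assumes you have already run a model over the videos and kept, for each video, one attribute value and whether the prediction was correct. The function names, the record layout, and the group names are hypothetical illustrations, not part of the dataset or its tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Accuracy computed separately for each value of one attribute.

    Each record is (attribute_value, prediction_was_correct).
    The record layout is a hypothetical choice for this sketch.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, is_correct in records:
        total[group] += 1
        correct[group] += int(is_correct)
    # Groups appear in the order they were first seen.
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(per_group: Dict[str, float]) -> float:
    """Largest accuracy difference between any two groups."""
    return max(per_group.values()) - min(per_group.values())


if __name__ == "__main__":
    # Placeholder groups and outcomes, not real dataset labels or results.
    records = [
        ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
        ("group_b", True), ("group_b", False),
    ]
    per_group = accuracy_by_group(records)
    print(per_group)                    # {'group_a': 0.75, 'group_b': 0.5}
    print(max_accuracy_gap(per_group))  # 0.25
```

The max-min gap is only one simple summary; depending on the study, a ratio between groups or a per-group confidence interval may be more informative, especially for small groups.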

Primary Data Type

Video (mp4)

Data Function

Testing; training only for certain categories (without using the provided annotations)

Dataset Characteristics

Total number of subjects/actors: 5,567

Total number of video recordings: 26,467

Average video length: ~1 minute
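As a rough back-of-the-envelope check on these figures (assuming the ~1 minute average holds across all videos), the counts imply about 4.75 videos per participant and roughly 440 hours of footage:

```latex
\begin{align*}
\frac{26{,}467\ \text{videos}}{5{,}567\ \text{participants}} &\approx 4.75\ \text{videos per participant} \\
26{,}467\ \text{videos} \times 1\ \text{min} &\approx 26{,}467\ \text{min} \approx 441\ \text{h}
\end{align*}
```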
Labels

Age (self-provided)

Gender (self-provided)

Language/Dialect (self-provided)

Geo-location (self-provided)

Disability (self-provided)

Physical adornment (self-provided)

Physical attributes (self-provided)

Voice timbre (human labeled)

Apparent skin tone (human labeled)

Activity (human labeled)

Recording setup (human labeled)


Nature of Content

Video recordings of individuals giving nonscripted answers to predetermined questions from a pre-approved list, as well as video recordings of them reading a scripted text

Privacy (PII)

Participants de-identified with unique numbers

License

Limited; see the full license language for permitted uses

Summary of license permissions

You can evaluate models on the provided labels

You can train your model only on certain labels; refer to the license permissions

Access Cost

Open access

Data Collection

Data sources

Vendor data collection efforts

Data selection

Human validators flagged personally identifiable information (PII)

All videos are provided by the participants for the purpose of creating this dataset.

Sampling Methods


Geographic distribution

Brazil, India, Indonesia, Mexico, Philippines, United States, Vietnam

Labelling Methods

Human Labels

Label types

Human labels: free-form text labels

Labeling procedure - Human

Participants provided their own age, gender, language/dialect, geo-location, disability, physical adornment, and physical attribute labels

Annotators labeled apparent skin tone, voice timbre, activity, and recording setup

Validation Methods

Human validated

Validator description(s)

Human validated

Validation tasks

Human validators verify labels

Human validators flag PII

Human validators filter data

Validation policy summary

All labels are verified by human validators

Validators flag any PII content

The AI research community can use Casual Conversations v2 as one important step toward promoting fairness and robustness research. With Casual Conversations v2, we hope to spur further research in this important, emerging field.

If you appear in this dataset and would like your data removed from it, please contact: