Introducing two new datasets to help measure fairness and mitigate AI bias

May 23, 2022

Developing reliable, large-scale ways of measuring fairness and mitigating bias gives AI researchers and practitioners helpful benchmarks that can be used to test NLP (natural language processing) systems — driving progress toward the goal of ensuring that AI systems treat everyone fairly.

The research community has made significant strides in doing this with gender, race, and ethnicity. While this foundation is an important start in addressing fairness along these dimensions, it falls short of uncovering fairness issues that affect other relevant communities or identities, such as those defined by religion, socioeconomic background, or queer identity. To better identify a wider range of demographic biases in a variety of technologies and to create AI systems that respectfully represent a greater diversity of personal identities, we need new tools and benchmarks.

Today, we are introducing and open-sourcing several datasets and models for responsible NLP to address this gap. For this work, we assembled a diverse team of researchers and contributors, including people who identify as men, as women, and as nonbinary; people of different races and ethnicities; and people at different stages of their careers. Together, we developed a new method to test a wide range of biases in NLP models, beyond just race, gender, and ethnicity, including a list of more than 500 terms across a dozen axes. We devised a straightforward technique for controlling how models generate text, and taught models to recognize and better avoid social biases when generating responses to certain demographic terms. We also trained an AI model for reducing demographic biases in text, which can help break stereotypical associations present in NLP data. This model, known as a demographic text perturber, introduces demographically diverse variations that are otherwise similar to the original text. For example, it might take the sentence “He likes his grandma” and create alternatives, such as “She likes her grandma” and “They like their grandma,” to augment the dataset with parallel, demographically altered variants.

We hope that these datasets will be used to help further research around fairness in the AI community. State-of-the-art NLP systems are notoriously data-intensive and use large-scale text resources to train. This data reliance presents particular challenges, as AI systems can unwittingly replicate or even amplify unwanted social biases present in the data. For example, NLP data can contain biases and stereotypes about particular demographic groups — or fail to represent them entirely.

Training models to measure fairness and mitigate biases

To better identify a wider range of demographic biases in AI, we need a comprehensive set of terms that reflect a diverse set of identities. We used a combination of algorithmic and participatory processes to develop the most comprehensive descriptor list of its kind, which spans over 500 terms across roughly a dozen demographic axes. More specifically, we came up with sample terms per axis, then expanded these terms algorithmically using nearest neighbor techniques. We then used a participatory process that involved gathering new term ideas and feedback on existing terms from a variety of policy experts and domain experts (such as civil rights and racial justice experts), as well as individuals with lived experience related to a wide variety of identities (representing multiple ethnicities, religions, races, sexual orientations, genders, and disabilities).
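The algorithmic expansion step described above can be pictured with a small sketch. This is not the code used to build the list: the embedding values, the `expand_terms` helper, and the choice of cosine similarity are all assumptions made for illustration. The idea is only that seed terms for an axis pull in their nearest neighbors as candidates, which people then review in the participatory step.

```python
import math

# Toy, hand-made embeddings (hypothetical values, for illustration only).
# A real pipeline would load vectors from a trained embedding model.
TOY_EMBEDDINGS = {
    "Japanese": [0.9, 0.1, 0.0],
    "Korean": [0.85, 0.15, 0.05],
    "Vietnamese": [0.8, 0.2, 0.1],
    "retired": [0.0, 0.9, 0.2],
    "elderly": [0.05, 0.85, 0.3],
    "teacher": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_terms(seeds, embeddings, k=2):
    """Return up to k nearest non-seed neighbors per seed, without duplicates.

    The result is a list of *candidate* terms; in the process described
    above, candidates would still go through human review.
    """
    candidates = []
    for seed in seeds:
        scored = sorted(
            (
                (cosine(embeddings[seed], vector), term)
                for term, vector in embeddings.items()
                if term not in seeds
            ),
            reverse=True,
        )
        for _, term in scored[:k]:
            if term not in candidates:
                candidates.append(term)
    return candidates


if __name__ == "__main__":
    print(expand_terms(["Japanese"], TOY_EMBEDDINGS, k=2))
```

Nearest-neighbor search on its own tends to surface near-synonyms and unrelated words alongside useful terms, which is one reason a review step by experts and people with lived experience matters.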

This extensive set of demographic terms can be used to better measure and then mitigate model biases. Before this work, the field generally measured biases with respect to broad terms such as Asian, which means, for example, we would only know whether our models were biased against Asian people, not whether they were biased against Japanese people. Our list enables more fine-grained analysis and is built with terms that groups of people use when self-identifying, rather than just common dictionary terms.

To address these issues, we devised a straightforward technique for controlling how models generate text, and taught models to recognize and better avoid social biases when generating responses to certain demographic terms.

Data augmented with our models can be used for a wide range of practical applications, including evaluating the fairness of generative and classification models, improving the representation of minority groups in certain domains, and ultimately training models that exhibit less demographic bias. We are openly sharing our demographics terms list with the broader AI community, and we invite additional term contributions to further increase its inclusiveness, scope, and utility. This work also contributes to a broader initiative at Meta AI to build AI responsibly and inclusively, particularly when it comes to combating social biases and enhancing the representation of underrepresented groups.

In addition to creating this more comprehensive demographic term list to measure fairness, we have built a model for reducing demographic biases in text — a machine-learned sequence-to-sequence demographic perturber, which we trained on a large-scale dataset of human-generated text rewrites that we collected. Our demographic perturber is trained to perturb several demographic attributes, including gender, race/ethnicity, and age, which can help break stereotypical associations present in NLP data by augmenting it with parallel, demographically perturbed variants. For example, the stereotypical statement “Women like to shop” would be augmented with variants such as “Men like to shop” and “Nonbinary people like to shop.”

The process starts by feeding a source text to the perturber. Next, we add the word we want to perturb in the sentence, which in this case is women. We then add the target demographic we want the output text to contain, such as “gender: nonbinary/underspecified.” The perturber will automatically recognize relevant references to the person or group and change the text to reflect the new target demographic. Models trained on demographically augmented data are less likely to strongly associate women and shopping, because there will be more examples in the data about people of different genders liking shopping.
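As a rough sketch of the three inputs just described (source text, the word to perturb, and the target demographic), the snippet below bundles them into a request and produces augmented variants. Everything here is hypothetical: `PerturbationRequest`, `toy_perturb`, and `augment` are illustrative names, the `"gender: man"` label is an assumed counterpart to the `"gender: nonbinary/underspecified"` label quoted above, and the lookup table merely stands in for the learned sequence-to-sequence perturber, which rewrites text far more flexibly than a word swap.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PerturbationRequest:
    """The three inputs described in the text (field names are assumptions)."""
    source_text: str
    selected_word: str
    target_attribute: str


# Toy lookup table standing in for the learned model (illustration only).
TOY_REWRITES = {
    ("women", "gender: man"): "men",
    ("women", "gender: nonbinary/underspecified"): "nonbinary people",
}


def toy_perturb(request):
    """Swap the selected word for a target-demographic phrase, keeping case."""
    key = (request.selected_word.lower(), request.target_attribute)
    replacement = TOY_REWRITES[key]
    pattern = re.compile(
        r"\b" + re.escape(request.selected_word) + r"\b", re.IGNORECASE
    )

    def swap(match):
        found = match.group(0)
        if found[0].isupper():
            return replacement[0].upper() + replacement[1:]
        return replacement

    return pattern.sub(swap, request.source_text)


def augment(source_text, selected_word, target_attributes):
    """Return the original text followed by one variant per target attribute."""
    variants = [source_text]
    for attribute in target_attributes:
        request = PerturbationRequest(source_text, selected_word, attribute)
        variants.append(toy_perturb(request))
    return variants


if __name__ == "__main__":
    targets = ["gender: man", "gender: nonbinary/underspecified"]
    for line in augment("Women like to shop", "women", targets):
        print(line)
```

Training on the original sentence together with its variants is what dilutes the association between one group and one activity: each stereotyped statement now appears alongside parallel statements about other groups.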

Building on open source fairness research to benefit the AI community

Building the future of responsible AI at Meta is important to our business. Fairness cuts to the heart of the importance of everyone having access to information, services, and opportunities. It is a process.

The comprehensive demographic terms list, perturber, and the human-generated rewrites are openly available for researchers. Our results suggest that the perturber generates higher-quality text rewrites — even for longer text passages with many demographic terms, which are notoriously hard to generate accurately. Compared with previous works, the perturber also has better coverage of historically underrepresented groups, such as nonbinary gender identities, so that fewer examples of unfairness go unmeasured. To further demonstrate the utility of our model for use in training more fair NLP models, we are also open-sourcing a large language model trained on data that has been demographically augmented using the perturber.

Our work makes clear strides toward demonstrably fairer NLP models that can include and respectfully represent everyone, regardless of their identity. In particular, both the perturber and the comprehensive set of demographic terms rely on minimally modifying text content, which is a tried-and-tested method for evaluating and improving AI robustness, and extend it to improve upon our past work.

This work contributes to a larger body of research at Meta AI aimed at incorporating responsible research practices from the beginning of every project, and improving methods for measuring the fairness of AI systems. This project builds on insights from other recent papers from Meta AI, such as our work measuring gender-based errors in machine translation, and our groundbreaking new dataset for evaluating the performance of computer vision models across a diverse range of ages, genders, and apparent skin tones.

While these efforts can benefit the wider AI community by improving methods for measuring fairness and social biases across a variety of AI systems and technologies, more work will need to be done. We look forward to seeing what the research community does with these datasets, as well as to continuing to build on our work around responsible AI.

For more information on the comprehensive set of demographic terms and evidence of its broad utility, download the research paper.

For more information on the perturber, its training dataset, and the language model pretrained with perturber-augmented data, download the research paper.

Download the datasets

We'd like to acknowledge the work of our collaborators on this project, including Candace Ross, Melissa Hall, Jude Fernandes, Eleonora Presani, and Douwe Kiela.

Written By

Adina Williams

Research Scientist

Eric Smith

Research Engineer

Rebecca Qian

Research Engineer

Melanie Kambadur

Research Engineering Manager