Wikipedia is often the first website many people visit when looking for biographical information about important figures, but not everyone is represented equally on the site. Only 20 percent of biographies on Wikipedia are about women. This imbalance can have far-reaching consequences. Wikipedia has long been used as a source of data in natural language processing (NLP) tasks, and this gender bias can affect machine learning models trained using the site. On a more human level, this bias can impact young students who are looking through Wikipedia to learn about history and choose subjects for their class assignments.
With Generating Biographies, artificial intelligence can serve as a starting point for Wikipedia article editors who are working to reduce bias and bring more representation to the site. The model generates biographies for marginalized communities, focusing on women in science, women in Asia, and women in Africa.
Research Scientist, Meta AI "When I was in school, I wanted to write a biography about Eleanor Roosevelt, and I remember thinking, 'Okay, there's a lot of books, but there's mainly books about men.' That stayed with me throughout life."
Localization Editor, Meta “Everyone can edit Wikipedia, everyone can bring their contribution. And that's the whole strength of the platform, as much as it is a way to have testimonials of how the world was, is, and will be.”
Our researchers understand the importance of having accurate, high-quality information available online, whether for high school students writing class reports or for NLP models trained on Wikipedia articles. But when most Wikipedia biographies are about men, women and non-binary people are diminished despite their enormous impact throughout history. That’s why the Generating Biographies team has open-sourced an AI model that automatically creates biographical articles about important real-world public figures, along with a novel dataset for evaluating model performance on real biographies of women from historically marginalized groups. Our team hopes this will enable other researchers to push the model forward so that AI-generated entries can serve as a starting point for human writers publishing more biographies of people from underrepresented groups.
Generating Biographies is a model that searches websites for accurate information about a person and drafts a Wikipedia-style entry about them, complete with citations. The method starts with the subject and occupation of the biography and uses web searches to find relevant evidence. A retrieval-augmented generation architecture, built on large-scale pre-training, then identifies the relevant information and generates the biography.
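To make the retrieval step concrete, here is a minimal sketch in Python. It is not the released model: the function names (build_query, rank_evidence), the bag-of-words cosine scorer, the placeholder subject "Jane Doe", and the example.org passages are all assumptions made for illustration. A real system would issue live web searches and use a learned retriever in place of the toy scorer.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(subject, occupation, section):
    """Combine subject, occupation, and section heading into one query (assumed format)."""
    return f"{subject} {occupation} {section}"


def rank_evidence(query, passages, top_k=2):
    """Rank (url, text) passages by cosine similarity of word counts.

    This toy scorer stands in for the learned retrieval component.
    """
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scored = []
    for url, text in passages:
        p = Counter(tokenize(text))
        p_norm = math.sqrt(sum(v * v for v in p.values()))
        dot = sum(q[t] * p[t] for t in q)
        score = dot / (q_norm * p_norm) if q_norm and p_norm else 0.0
        scored.append((score, url, text))
    # Highest-scoring passages first; ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Made-up passages standing in for web search results.
PASSAGES = [
    ("https://example.org/a", "Jane Doe is a chemist known for her early life research on polymers."),
    ("https://example.org/b", "The city hosts an annual food festival."),
    ("https://example.org/c", "Doe studied chemistry; her career as a chemist began in a university lab."),
]

query = build_query("Jane Doe", "chemist", "career")
top = rank_evidence(query, PASSAGES, top_k=2)
for score, url, _ in top:
    print(f"{score:.3f}  {url}")
```

The passages that survive this ranking are what a generator would then condition on when drafting each section.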
After each generated sentence, a citation is appended based on which web search results were retrieved. This citation module builds the bibliography that links back to the sources used. The process repeats section by section, with each section informing the prediction of the next, until the article covers all of the elements that make up a biography and reads like a real Wikipedia article. A novel dataset of Wikipedia biographies about women is then used to evaluate the quality of the generated text.
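The section-by-section loop with appended citations can be sketched as follows. This is an illustration under stated assumptions, not the team's implementation: stub_generator is a hypothetical stand-in for the trained model (it simply echoes the evidence), the section names and example.org sources are invented, and the bracketed numeric citation format is one plausible choice.

```python
def stub_generator(subject, section, evidence, previous_text):
    """Hypothetical stand-in for the trained generator.

    Returns (sentence, source_url) pairs; here it just echoes each passage.
    A real model would condition on the evidence and on previous_text.
    """
    return [(text, url) for url, text in evidence]


def write_biography(subject, sections, retrieve, generate):
    """Generate each section in turn, appending a citation after every sentence."""
    bibliography = []   # unique source URLs, in order of first citation
    article = []        # list of (section heading, cited section text)
    previous_text = ""  # earlier sections, passed on as context for the next
    for section in sections:
        evidence = retrieve(subject, section)
        cited = []
        for sentence, url in generate(subject, section, evidence, previous_text):
            if url not in bibliography:
                bibliography.append(url)
            # Citation number is the 1-based position in the bibliography.
            cited.append(f"{sentence} [{bibliography.index(url) + 1}]")
        section_text = " ".join(cited)
        article.append((section, section_text))
        previous_text = (previous_text + " " + section_text).strip()
    return article, bibliography


# Made-up evidence standing in for retrieved web search results.
EVIDENCE = {
    "Early life": [("https://example.org/a", "Jane Doe was born in a small town.")],
    "Career": [
        ("https://example.org/b", "She became a chemist."),
        ("https://example.org/a", "She later returned to her home town."),
    ],
}

article, bibliography = write_biography(
    "Jane Doe",
    ["Early life", "Career"],
    retrieve=lambda subject, section: EVIDENCE[section],
    generate=stub_generator,
)
for heading, text in article:
    print(heading + ": " + text)
for number, url in enumerate(bibliography, start=1):
    print(f"[{number}] {url}")
```

Keeping the bibliography as an ordered list of unique URLs means a source cited in several sections keeps the same number throughout the article.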
Angela Fan and Claire Gardent