An artificial intelligence algorithm developed by a Meta researcher aims to narrow the gender gap in Wikipedia articles. Angela Fan, an expert in artificial intelligence, has created a model that can generate women’s biographies from information found on the web and write them in encyclopedic format.
The model searches the Internet for relevant information about the person, builds a biography, and adds citations that link to the sources. According to Fan, the system is a response to the lack of representation of women on Wikipedia.
Of all the biographies found in the encyclopedia, barely a fifth are of women. A Wikimedia report found that 15% of editors are women and that white men from Europe and North America make up the majority of Wikimedians.
This matters because the makeup of the editing community influences which biographies and other Wikipedia articles get published.
How does artificial intelligence write a Wikipedia biography?
The algorithm gathers the relevant information about the person, writes each paragraph, and embeds citations that link to the source. The model is based on the structure of a Wikipedia biography (Early Years, Education, Career, Acknowledgments, etc.) and generates each section in turn.
The information comes from the top 10 Google results. According to the researcher, the text is generated section by section using a caching mechanism similar to that of Transformer-XL, a machine learning language model that can use context beyond a fixed length.
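The flow described above (retrieve evidence, write one section at a time, attach a citation, and carry earlier sections forward) can be illustrated with a minimal Python sketch. It is not the researchers' model: the word-overlap ranking, the `Source` type, and the `write_biography` function are hypothetical stand-ins, and the "cache" here is just the text already written, whereas a Transformer-XL style cache stores the model's internal states.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Section names taken from the article; a real system may use others.
SECTIONS = ["Early Years", "Education", "Career", "Acknowledgments"]


@dataclass
class Source:
    """A retrieved web page (hypothetical type for this sketch)."""
    url: str
    text: str


def words(text: str) -> set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, sources: list[Source], k: int = 10) -> list[Source]:
    """Rank sources by word overlap with the query and keep the top k.

    Assumption: simple overlap stands in for a real web search.
    """
    terms = words(query)
    scored = [(len(terms & words(s.text)), s) for s in sources]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [s for score, s in ranked if score > 0][:k]


def write_biography(name: str, sources: list[Source]):
    """Build a biography section by section, citing one source per section."""
    cache: list[str] = []       # evidence already used in earlier sections
    references: list[str] = []  # numbered reference list (URLs)
    biography: dict[str, str] = {}
    for section in SECTIONS:
        hits = retrieve(f"{name} {section}", sources)
        # Use the cache so a later section does not repeat earlier evidence.
        fresh = [s for s in hits if s.text not in cache]
        if not fresh:
            continue  # no unused evidence: leave the section out
        best = fresh[0]
        if best.url not in references:
            references.append(best.url)
        number = references.index(best.url) + 1
        biography[section] = f"{best.text} [{number}]"
        cache.append(best.text)
    return biography, references


if __name__ == "__main__":
    demo = [
        Source("https://example.org/a", "Jane Doe spent her early years in Lima."),
        Source("https://example.org/b", "Jane Doe completed her education in physics."),
        Source("https://example.org/c", "Jane Doe built a career in astronomy."),
    ]
    bio, refs = write_biography("Jane Doe", demo)
    for title, text in bio.items():
        print(f"{title}: {text}")
    print(refs)
```

In this toy version a section is skipped when every matching source has already been used, which mirrors, very loosely, how little web content can leave parts of a biography empty.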
The model is not a definitive solution to the gender gap, because it has its limits. According to Fan, when the researchers evaluated its performance, they found that 68% of the text generated for a biography did not appear in the reference text.
After reviewing the content, they found that many sentences were partially verifiable, while others – believed to be “hallucinations” – could not be fully verified.
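To make the idea of checking generated text against a reference more concrete, here is a small Python sketch. It is only an illustration based on plain word overlap, not the evaluation the researchers used; the function names and the three labels are assumptions chosen for this example.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercased words, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def share_not_in_reference(generated: str, reference: str) -> float:
    """Fraction of generated words that never appear in the reference."""
    gen = tokens(generated)
    if not gen:
        return 0.0
    ref = set(tokens(reference))
    missing = [word for word in gen if word not in ref]
    return len(missing) / len(gen)


def label_sentence(sentence: str, evidence: str) -> str:
    """Rough label for one sentence (assumed thresholds, for illustration)."""
    share = share_not_in_reference(sentence, evidence)
    if share == 0.0:
        return "verifiable"
    if share == 1.0:
        return "unverified"
    return "partially verifiable"
```

For example, `share_not_in_reference("She won a prize in 1990", "She won a medal")` returns 0.5, because three of the six generated words are missing from the reference. A word-level check like this cannot tell a harmless rewording from an invented fact, which is why human review of the output remains necessary.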
An open source model to close the gender gap
The dataset is open source and includes 1,527 biographies broken down by region and interests. The model represents a starting point for creators and testers to publish more biographies of women in the encyclopedia.
It should be mentioned that the algorithm has to deal not only with a lack of representation, but also with the lack of content about prominent women on the web. According to the researcher, existing articles about these women often do not contain enough information, or prioritize their personal lives over their achievements.
If this pattern is not corrected when the original articles are written, the algorithm will learn and reproduce this bad practice.