New research warns of AI bias dangers
New research analysing data from the Prolific research platform reveals how the demographics of the people who label the data used to build and train AI models influence their decisions. What one person finds offensive, another may find perfectly acceptable. This has major ramifications for the development of AI systems, with the danger that existing biases are baked into them and amplified.
Machine learning and artificial intelligence systems often rely on high-quality human labelling and annotation: people reviewing and categorising the output of language models to train them, for example to learn what kind of content is offensive or toxic, or to better understand human intentions. This is often referred to as ‘Human-in-the-loop’ or Reinforcement Learning from Human Feedback (RLHF).
The study, conducted collaboratively by Prolific, Potato (a web-based annotation tool), and the University of Michigan, found that age, race, and education are statistically significant factors in determining how something is labelled. For example, when asked to rate the offensiveness of online comments, Black participants tended to rate the same comments as significantly more offensive than other racial groups did.
Prior research on annotator background has mostly focused on specific aspects of identity, like gender, and on certain tasks, like toxic language detection. This study aimed to undertake a much broader analysis, including offensiveness detection, question answering, and politeness. The dataset contains 45,000 annotations from 1,484 annotators, drawn from a sample representative of the US population with respect to sex, age, and race.
Findings from the research include:
Offensiveness detection
- Gender: The research found no statistically significant difference between men and women in rating content as offensive.
- Race: The study found significant racial differences in offensiveness ratings. Black participants rated the same comments as significantly more offensive than all other racial groups did. The scores of white participants correlated strongly with the original Ruddit dataset, which suggests that the original annotations were likely done by white annotators.
- Age: Participants aged 60 or over tended to find comments more offensive than middle-aged and younger participants.
- Education: There were no significant differences found with respect to participant education.
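The kind of group-level comparison behind these findings can be sketched as follows. The records and ratings here are synthetic and the group labels are illustrative only; this is not the actual POPQUORN schema or data:

```python
from statistics import mean
from collections import defaultdict

# Synthetic annotations: (annotator group, offensiveness rating on a 1-5 scale).
# Values are made up for illustration, not drawn from the POPQUORN dataset.
annotations = [
    ("Black", 4), ("Black", 5), ("Black", 4),
    ("White", 2), ("White", 3), ("White", 2),
    ("Asian", 3), ("Asian", 2), ("Asian", 3),
]

def mean_rating_by_group(records):
    """Average offensiveness rating per demographic group."""
    groups = defaultdict(list)
    for group, rating in records:
        groups[group].append(rating)
    return {g: mean(rs) for g, rs in groups.items()}

print(mean_rating_by_group(annotations))
```

In the actual study, a statistical test (rather than a raw mean comparison) would be used to decide whether such group differences are significant.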
Question answering
Despite this task being largely objective (i.e. each question has a single correct answer), accuracy at question answering varied according to background. The largest effects were associated with race and age, with a smaller effect for education. These performance differences mirror known disparities in education and economic opportunity for minorities compared with their white male peers in the US.
Politeness rewriting
Politeness is one of the most prominent social factors in interpersonal communication. The study found that:
- Women judged messages as being less polite than men did.
- Older participants were more likely to give higher politeness ratings.
- Those with high education levels tended to give lower ratings.
- Black participants rated messages as being more polite than their white peers.
- Asian participants gave lower politeness ratings overall.
Commenting on the research, Phelim Bradley, CEO and co-founder of Prolific said, “Artificial intelligence will touch all aspects of society and there is a real danger that existing biases will get baked into these systems. This research is very clear: who annotates your data matters. Anyone who is building and training AI systems must make sure that the people they use are nationally representative across age, gender, and race or bias will simply breed more bias.”
“Systems like ChatGPT are increasingly used by people for everyday tasks,” says assistant professor David Jurgens from the University of Michigan School of Information. “But whose values are we instilling in the trained model? If we keep taking a representative sample without accounting for differences, we continue marginalising certain groups of people.”
The correct training and fine-tuning of AI systems is critically important to the safe development of AI, preventing these systems from amplifying existing biases and toxicity. This means ensuring that annotators are nationally representative across race, age, and gender. With a vetted and verified pool of 120,000 participants, Prolific offers researchers and developers access to nationally representative or custom demographic groups for their RLHF needs.
The fair treatment of annotators is another crucial element of AI training and development. Reports have emerged of low-paid workers in developing countries being used for labelling and being subjected to reams of toxic online content. The ethical treatment of participants is a top priority for Prolific. Participants on Prolific are guaranteed a fair, minimum payment, have complete control over which studies they choose to take part in, and can immediately flag to Prolific any content they find offensive or disturbing.
Research methodology
This research was conducted by Jiaxin Pei and David Jurgens from the School of Information at the University of Michigan. They analyzed POPQUORN (the Potato-Prolific dataset for Question-Answering, Offensiveness, text Rewriting, and politeness rating with demographic Nuance). POPQUORN contains 45,000 annotations from 1,484 annotators, drawn from a sample representative of the US population with respect to sex, age, and race.
The participants were asked to complete four common natural language processing (NLP) tasks:
1. Judge 6,000 Reddit comments from the Ruddit dataset for their level of offensiveness.
2. Answer questions from the SQuAD dataset.
3. Rewrite an email to make it more polite.
4. Rate the politeness of the original and new emails generated in the previous task.
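A nationally representative annotator pool like this is typically assembled by quota sampling: recruiting participants until each demographic stratum matches its share of the population. Here is a minimal sketch under that assumption; the age brackets and target proportions are illustrative, not Prolific's actual recruitment logic:

```python
# Quota sampling sketch: size each demographic stratum to its target share
# of the sample. Brackets and shares below are illustrative only.
targets = {"18-29": 0.21, "30-44": 0.25, "45-59": 0.24, "60+": 0.30}

def quotas(target_shares, sample_size):
    """Number of annotators to recruit per stratum for a given sample size."""
    return {group: round(share * sample_size)
            for group, share in target_shares.items()}

def accept(candidate_group, recruited, quota):
    """Accept a candidate only if their stratum still has open slots."""
    return recruited.get(candidate_group, 0) < quota.get(candidate_group, 0)

q = quotas(targets, 1484)  # sized to the study's 1,484 annotators
print(q)
```

In practice each participant falls into one stratum per dimension (sex, age, race), so quotas are enforced jointly across all three, but the acceptance logic is the same.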
The dataset is being made publicly available to offer AI companies an opportunity to explore a model that accounts for intersectional perspectives and beliefs.