
Using Prolific to help detect deepfakes

March 7, 2023

How well can humans detect deepfakes? Kimberly Mai, a PhD candidate at UCL, conducted this research to find out.

The task 

Machine learning and AI are growing more popular, so understanding how humans respond to AI-generated content has never been more important. The particular focus of this research was deepfakes and how humans respond to such media.

Deepfakes are a type of synthetic media used to create convincing images, videos, and audio in the likeness of another person. This powerful technology has the potential to be both revolutionary and dangerous. In theory, you could make anyone in the world appear to say whatever you want, no matter how incendiary.

The study played participants audio clips in randomized order and asked whether, and why, they believed each clip to be real or a deepfake. The aim was to measure how well humans perceive deepfakes and to explore the reasons that led participants to judge a clip as fake.

The challenge 

Most machine learning research uses publicly available data, but this research was different: the team wanted to collect data from human participants. They needed a platform that would let them easily and confidently collect high-quality data. For example, the study relied on free-text responses, in which participants explained their opinions of the clips they had heard. These responses had to be detailed, meaning that participants needed to be engaged and willing to put in the work.

The research also encompassed two separate experiments, one for fluent English speakers and one for fluent Chinese speakers, so comprehensive participant filtering was essential.

The solution 

The team chose Prolific mainly for its high data quality and participant engagement. Colleagues who had previously used Prolific recommended it as easy to use, simple to learn, and cost-effective, with higher data quality than other platforms such as Amazon Mechanical Turk.

The researchers were able to easily filter by fluency in their chosen languages and could be confident that participants were engaged, trustworthy, and willing to write comprehensively about their experiences.

The results

The results of this research provided a baseline for how well humans can detect speech inconsistencies. Participants were able to identify words or phrases that were pronounced strangely, or moments when pitch and intonation were unusual. When they heard these inconsistencies, they labeled the clip as a deepfake.

The outcomes of this research, including the rationales participants used to spot deepfakes, can help improve machine learning detection models. The human performance results also provide a benchmark against which those models can be compared.