Mean words machine? Researchers reveal racial bias in social media ‘hate speech’ detector
Researchers analyzed two data sets of tweets that had been flagged by PerspectiveAPI for “toxic language.”
PerspectiveAPI, a tool used to detect inflammatory language across social media, appears to be inherently biased: researchers found that it flags many non-offensive comments, often coming from the very minorities it ostensibly seeks to protect.
Researchers at the University of Washington conducted a study on “The Risk of Racial Bias in Hate Speech Detection,” analyzing the PerspectiveAPI algorithm, a tool that aims to rid the internet of supposed “hate speech.” The study comes just months after researchers at the University of Washington-Tacoma began developing a tool designed to scan for online “hate speech.”
The computer science and engineering researchers found in their study that the hate speech detection algorithm rated casual greetings given by white people as far less toxic than those made by black people. This translated to “what’s up, bro!” and “I saw him yesterday” getting toxicity ratings of seven and six percent, respectively, while “wussup, n*gga!” and “I saw his ass yesterday” received scores of 90 and 95 percent, respectively.
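The percentage scores cited above are of the kind returned by Perspective’s publicly documented Comment Analyzer endpoint, which rates a snippet of text from 0 to 1 on perceived toxicity. Below is a minimal sketch of how such scores can be queried, assuming the public v1alpha1 comments:analyze endpoint; API_KEY is a hypothetical placeholder for a real credential.

```python
import requests

# Hypothetical placeholder -- a real Perspective API key is required.
API_KEY = "YOUR_API_KEY"
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

def toxicity_score(text: str) -> float:
    """Return Perspective's 0-to-1 TOXICITY score for `text`."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=body)
    response.raise_for_status()
    scores = response.json()["attributeScores"]
    return scores["TOXICITY"]["summaryScore"]["value"]

# Phrases cited in the study; the article reports the scores as percentages.
for phrase in ["what's up, bro!", "I saw him yesterday"]:
    print(f"{phrase!r}: {toxicity_score(phrase):.0%}")
```

The summaryScore.value field is the 0-to-1 rating that the article quotes as a percentage.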
The study, which defines “toxic language” as speech that “primarily targets members of minority groups” and incites “real-life violence towards them,” focuses on statements made on Twitter drawn from two data sets.
[RELATED: Disturbing number of students say hate speech is not free speech]
African-American English (AAE), slang deemed acceptable between black people, turned out to be the primary culprit behind this PerspectiveAPI censorship, according to a UW news release.
According to the news release, more than 46 percent of AAE tweets in the first data set were flagged as “offensive,” compared with nine percent of tweets in general American English. The second data set yielded a similar pattern, with 26 percent of AAE tweets falsely flagged as “abusive,” versus only five percent of general American English tweets.
After analyzing several models, demographics, and data sets, the researchers began working toward a way to mitigate the algorithm’s bias.
Workers on Amazon Mechanical Turk took part in a controlled experiment in which they were asked to label tweets as offensive to themselves or potentially offensive to anyone. As a result, the study states that “priming workers to think about dialect and race makes them significantly less likely to label an AAE tweet as (potentially) offensive to anyone. Additionally, race priming makes workers less likely to find AAE tweets offensive to them.”
[VIDEO: Ban Trump’s Twitter? Students react]
“We find strong evidence that extra attention should be paid to the confounding effects of dialect so as to avoid unintended racial biases in hate speech detection,” the researchers wrote.
In addition to these findings, annotators tended to label tweets as offensive to others more often than they labeled them as offensive to themselves, apparently not wanting to seem discriminatory, a tendency that could influence other decision-makers as well.
“Our work serves as a reminder that hate speech and toxic language is highly subjective and contextual,” study co-author Maarten Sap said in the news release. “We have to think about dialect, slang and in-group versus out-group, and we have to consider that slurs spoken by the out-group might actually be reclaimed language when spoken by the in-group.”
Campus Reform reached out to the Electronic Frontier Foundation to ask what consequences such tools could have for digital rights but received no comment in time for publication.
[RELATED: Twitter recruits profs to fight ‘incivility and intolerance’]
Campus Reform also reached out to the study’s researchers to ask whether they felt the PerspectiveAPI tool might be weaponized against free speech in the future. Researcher Yejin Choi declined to comment, and the other four co-authors did not respond in time for publication.
It’s unclear if the UW-Tacoma researchers who are currently developing a tool to detect online “hate speech” will consider this latest study.
Neither of the researchers conducting that study responded in time for publication.
Follow the author of this article on Twitter: @addison_smith49