‘Muzlim, K2A, Jih@di, Mull@h’: How Right-Wing Trolls Bypass Hate Speech Filters on Twitter

Upon investigation, many other terms were found to be modified or misspelled to bypass the AI filters, effectively exposing a loophole frequently abused by Hindutva forces.

New Delhi: Indian cricketer Mohammed Shami was subjected to a wave of online abuse from right-wing trolls on social media following India’s T20 World Cup loss to Pakistan. His religious identity singled him out from the rest of the team, and the accusations were accordingly “communal” in nature.

Hate-mongering campaigns on social media have seen a string of successes recently, painting an entire community as the “enemy”. In all cases, the targets have inevitably been Muslims, whether it was Shami, Aryan Khan or Kashmiri students after the India-Pakistan match. And Twitter is failing to check online hate, abuse and threats against Muslims more than any other community in India.

This story is a common one. It begins with a personal experience that led to a deeper dive into the world of saffron trolls.

One day after winning a National Film Award for Best Actress, Kangana Ranaut – who is fighting a defamation suit filed against her by lyricist Javed Akhtar – visited the former jail cell of Hindutva ideologue V.D. Savarkar and paid tribute to the Hindutva view of history.

I posted a tongue-in-cheek tweet about her accelerating proximity to the ruling party, noting that Savarkar’s mercy petitions might offer some inspiration as a solution to one of her troubles.

Within minutes, a right-wing troll pounced on it. The term “mulla” was used in response to the tweet, but was quickly deleted. My phone’s home screen notifications, however, retained it. The tweet (by the same user) that replaced it was even more vile, a circumcision-related slur.

A similar reply came from another user.

What struck me was the abbreviation of the hateful communal slur. I decided to check the search bar to see how common this form of spelling is – and found an overwhelming multitude of tweets replicating the term, with almost all the accounts being linked to Hindutva or BJP supporters.

Further scrutiny showed that the abbreviated term was used even more frequently on Twitter than the original term. Or, alternatively, more tweets using the abbreviation were retained on the platform, bypassing the hate-speech filters and the “significant improvements” that Twitter promised for curbing bigotry and hate on its platform.

Upon investigation, many other terms were found to be modified or misspelled to bypass both the AI filters and the auto-translation function that helps monitor such tweets, effectively exposing a loophole frequently abused by Hindutva forces.

Many examples, all the same hate

In one tweet, right-wing online activist Shefali Vaidya misspelled the word “Muslim” as “Muzlim”. Similarly, the term “Islam” is misspelled as “Izlam”, especially in more provocative tweets. Several Twitter users follow the same tactic to escape suspension of their accounts.

The phrase about sending a Muslim to “72 hoors”, used as a death threat, completely escapes the platform’s radar. Similarly, spellings such as “Jih@di”, “Mull@h” and other clever combinations are also commonplace.

Insulting a Muslim holiday works, too.

“Talibani”, too, is not a red flag term for Twitter. It is often flung at critics of the BJP on Twitter.

The right wing also uses Twitter to highlight attacks on the Hindu minority in Bangladesh, or the actions of the Taliban in Afghanistan, and links these to Indian Muslims with impunity. Another method is posting text as images, usually tailored for WhatsApp forwards in group chats. Most escape the filters.

Why it matters, and the backdrop

Surprisingly, the misspellings do not limit the reach of these tweets. Coordinated, trending hashtags are often used alongside them, which allows such tweets to go viral without being successfully reported.

Twitter has often been at odds with the Modi government (in the BJP’s larger attempt to censor critics and critical media on social media platforms), and has even caved under threat in the past.

India is one of the largest markets for the company, yet, just like Facebook, it allocates only a tiny fraction of its support staff workforce and budget for content moderation. Further, moderation on social media platforms is mostly outsourced to third parties, and working conditions are terrible. Unless a human eye personally assesses each reported tweet, the likelihood of a large amount of religious hatemongering and slurs deliberately misspelled or coded going under the radar is high. For languages other than English, it gets worse. Nor are these tweets automatically flagged by the AI. The generally automated process that informs you that none of Twitter’s policies have been violated by a reported tweet is the rule, not the exception.

False allegations of ‘minority appeasement’ against non-Hindutva politicians and other figures, couched in derogatory, apocalyptic language, are almost never checked. Online threats to journalists get by too, as do videos aimed at inciting communal violence (whether or not doctored). Unless a tweet calls for violence against a specific community in clear English terms with no linguistic gymnastics, hate mobs are given a free pass.

Given that investigative agencies in India are turning into arms of the BJP government, taking the legal route also appears to be futile, while the same government’s agencies use social media as a tool of surveillance and political persecution, often on grounds of hurting majoritarian religious sentiments. The playing field was not level to begin with.

Given that the Facebook whistleblower recounted how RSS accounts were responsible for hate speech on Facebook in India, it is not difficult for a company like Twitter to identify the root actors and content creators that fuel online mobs committed to spreading hate, divisive propaganda and threats of violence.

If platforms like Facebook and Twitter put in the effort, and prioritise sensitising and greatly expanding their support and moderation staff to match the level required for the second-most populous country in the world, online bigotry, threats, majoritarian lumpen mobilisation and the disinformation machine behind them can be greatly curbed.

The author is a PhD scholar at the Centre for Historical Studies, JNU, and an independent journalist.