Spiral of silence and confusion with nicknames: what linguistic difficulties do moderators face?

Sep 15, 2022


Policies of safe spaces are thus concerned with preventing the marginalization of voices already hurt by dominant power relations. This may be implemented through strict no-tolerance policies of “hate speech” or other discussion that would undermine the political project assumed in the space of the community. In practice, this often means that people can be censored or ejected from a space for not properly observing the standards of speech, tone, or style.

(Building a digital girl army: The cultivation of feminist safe spaces online. New Media & Society, 20. Clark-Parsons, 2018)

When we talk about moderation, many questions arise. How strict should it be? And what is more important — freedom of expression or user comfort? After all, restrictions on freedom of expression may also lead to discomfort for those being restricted. And there are also issues at a deeper level.


Once moderation rules are defined, an automatic system is configured and human experts read comments to check that the rules are being followed. But how can cognitive distortions be avoided? What can such distortions lead to? And on the other hand, there is also the question of how users feel in actively moderated communities.

Language, distortions and AI

First, a little theory. Language is simply a means of access to cognitive processes. In other words, it is language in particular that preserves human experience and cognition.

What does this mean? Well, it means that on the one hand language captures and holds the entire system of human knowledge and thinking, and on the other hand it captures and holds the particular active speaker/user. Each person has his/her own experience and individual set of signs, explaining why at times we do not understand each other. There are just as many opinions and ways of thinking as there are people — and also just as many cognitive distortions. Just have a go at trying to model them all! This prevalence of cognitive distortions makes AI a far cry from how it is portrayed in the movies.

It seems that any attempt to embed all possible variations into AI code is doomed to failure. This is why despite all their progress, online translation tools continue to be imperfect. We always mean something that needs to be understood in context, and this context can be non-verbal and situational.

Much is written by both researchers and columnists about the ethical problems of such “highly developed yet imperfect” AI. Sometimes distortions in these systems can have serious consequences, even though the systems were originally developed to help us avoid those very consequences. After all, computers have no understanding of ethics of their own and no feeling for the human factor. But the AI developer does. And even if a developer plans to take into account the maximum number of cognitive distortions when writing software, the product will nevertheless be affected by others that the developer has failed to take into account. It’s literally impossible for a developer to take everything into account: how, for example, can developers account for their own perception?

It’s all about context

Automatic moderation systems are also based on AI. Sometimes this is simpler AI, and at other times it is more complex and trainable. For pre-moderation, the cognitive side of language becomes the main stumbling block. The issue comes to a head at the moment when a system encounters meaning: a single word can have different semantic shades which are dependent entirely upon context.

Mexican striker Javier Hernandez wears his nickname ‘Chicharito’ on his shirt

But here we also need to understand our approach to the word ‘context’. When we spoke about the process of moderating various sporting events, we mentioned FIFA’s favorite example of a football player nicknamed Gorilla and how his fans use the corresponding emoji. What creates the context in this situation? The Italian national team and the club where this player plays? Of course. But the emotional and cognitive tone of the message, and even that of previous messages, also forms part of the context. How can an automatic moderation tool “understand” which meaning a user intends for a given emoji? What is the emotional tone in each case? What does it refer to? Again, the gorilla example became so interesting precisely within the context of a match between Italy and England, during which fans made a critically high number of racist comments. As a result, the same word or emoji could, within the space of two or three messages, carry exactly the opposite meaning.
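To make the idea of “previous messages as context” more concrete, here is a minimal sketch in Python. It is not how any real moderation product works: the token lists, the `ContextWindow` class and the three-message window are all assumptions made for illustration, and a production system would rely on a trained classifier rather than keyword matching.

```python
from collections import deque

# Hypothetical lists, for illustration only.
AMBIGUOUS_TOKENS = {"🦍", "gorilla"}  # fine as a nickname, abusive in a hostile thread
HOSTILE_MARKERS = {"get out", "go back", "disgrace"}  # stand-ins for a real toxicity signal


def is_hostile(text: str) -> bool:
    """Very rough hostility check based on the placeholder markers above."""
    lowered = text.lower()
    return any(marker in lowered for marker in HOSTILE_MARKERS)


class ContextWindow:
    """Remembers the last few messages and judges ambiguous tokens against them."""

    def __init__(self, size: int = 3) -> None:
        self.recent = deque(maxlen=size)

    def review(self, message: str) -> str:
        lowered = message.lower()
        ambiguous = any(token in lowered for token in AMBIGUOUS_TOKENS)
        hostile_context = is_hostile(message) or any(is_hostile(m) for m in self.recent)
        self.recent.append(message)
        # This sketch only decides about ambiguous tokens; openly hostile
        # messages would be handled by a separate filter.
        if ambiguous and hostile_context:
            return "hold_for_review"
        return "allow"
```

In a friendly thread, `review("🦍🦍🦍")` returns `"allow"`. If one of the last three messages contained a hostile marker, the same emoji is held for a human moderator instead.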

And it’s not only with Gorilla that the issue arises. As for player nicknames that may create issues, some of our personal favourites are Divine Ponytail, The Butcher of Bilbao, Baby Horse and Kaiser. Even if there is no context that makes them offensive, such an element of fantasy makes it harder for the system to understand context around their use.

In what cases do animal names become insulting? When do physical descriptions turn into sexist statements? And when does the short-hand for the name Richard become a swear word? Can a moderation system really recognize these nuances?

Some AI systems, such as the Toxic Mod System, flag potentially toxic comments, lower their priority in search results, and mark them in the admin panel, allowing live moderators to make the final decision on how to deal with them. Nevertheless, there are two problems with such an approach. First, the system cannot be trained, and second, it lacks independence. It cannot do without a live moderator and fails to solve problems in real time.
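The flag-and-review flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of any particular product: the `Comment` fields, the tiny lexicon, the `toxicity_score` function and the threshold and penalty values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    rank: float = 1.0      # ordering weight in search results (assumed field)
    flagged: bool = False  # whether the comment is marked in the admin panel


def toxicity_score(text: str) -> float:
    """Placeholder scorer: share of words found in a tiny hypothetical lexicon."""
    lexicon = {"idiot", "trash", "loser"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)


def triage(comments, threshold=0.2, penalty=0.5):
    """Flag suspicious comments, lower their rank and queue them for a human."""
    review_queue = []
    for comment in comments:
        if toxicity_score(comment.text) >= threshold:
            comment.flagged = True
            comment.rank *= penalty
            review_queue.append(comment)
    return review_queue
```

Nothing is deleted automatically here. The function only changes visibility and hands the final decision to a live moderator, which is exactly why such a setup cannot work without one.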

In this regard, classical pre-moderation systems might be much more efficient: they do not let toxic messages through. Their difficulty is finding the proper level of strictness, one which allows users to make a misprint in words like ‘shiitake’, relate tales of something read in childhood without requiring them to write in the Queen’s English, or mention a friend named Dick without the risk of being banned from the system. We also need deep analysis of the methods users apply in order to bypass moderation, such as the use of neologisms. Thread crapping, for example, cannot in principle be recognized by automatic systems. When a user persistently writes about the sale of apples in a chat dedicated to a Premier League match, the neural network has no way of catching this fact.
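The strictness problem is easy to see in a toy example. The sketch below compares a strict substring filter with a more lenient whole-word filter that has a small allowlist of names. Both word lists and both function names are hypothetical and chosen only to mirror the examples in the text.

```python
import re

BLOCKLIST = {"shit", "dick"}  # illustrative only
NAME_ALLOWLIST = {"Dick"}     # hypothetical: capitalised given names to let through


def strict_filter(message: str) -> bool:
    """Block if a blocklisted string appears anywhere, even inside a longer word."""
    lowered = message.lower()
    return any(bad in lowered for bad in BLOCKLIST)


def lenient_filter(message: str) -> bool:
    """Block only whole-word matches that are not allowlisted names."""
    for word in re.findall(r"[A-Za-z']+", message):
        if word in NAME_ALLOWLIST:
            continue
        if word.lower() in BLOCKLIST:
            return True
    return False
```

With these lists, the strict filter blocks both “I love shitake mushrooms” (a misprint of shiitake) and “My friend Dick is coming”, while the lenient filter lets both through and still blocks the lowercase insult. The lenient version is of course easier to bypass, which is the trade-off the paragraph above describes.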

Here at Watchers, after the introduction of a three-tier system of moderation, the share of messages containing negative content decreased from 11% to 4%.

The use of linguistically neutral pronouns is also a complex case, because they carry contextual superstructures that cannot be identified by formal features. Anna Gibson, author of “Free Speech and Safe Spaces: How Moderation Policies Shape Online Discussion Spaces”, says that as individuals become more identified with a group, they move from first person singular pronouns to plural pronouns.
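The shift from singular to plural pronouns can at least be counted, even if its meaning cannot be read off formally. The following sketch computes the share of first-person plural pronouns in a set of messages. The pronoun lists and the `plural_share` function are assumptions for illustration and are not taken from Gibson’s study.

```python
import re

SINGULAR = {"i", "me", "my", "mine", "myself"}
PLURAL = {"we", "us", "our", "ours", "ourselves"}


def plural_share(messages):
    """Return the share of first-person pronouns that are plural, or None if there are none."""
    singular = plural = 0
    for message in messages:
        # Splitting on letters only turns "I'm" into "i" and "m", so contractions are counted.
        for word in re.findall(r"[a-z]+", message.lower()):
            if word in SINGULAR:
                singular += 1
            elif word in PLURAL:
                plural += 1
    total = singular + plural
    return plural / total if total else None
```

A rising value over time would only hint that a participant is speaking on behalf of a group. Whether a particular “we” is inclusive or exclusionary still depends on context that a count like this cannot see.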

Does this indicate that a soft filter on automatic moderation is better than a hard one? Especially if the service has human moderators who review complaints and control the flow of messages?

Yes, this turns out to be the case. Human moderators, however, don’t solve the primary issue of cognitive distortion. If the problem with automatic moderation is that it simply fails to ‘notice’ nuances of meaning and tars everyone with the same brush, then human specialists have the problem of still being human, which means they may perceive context incorrectly, read details into or out of a situation, and apply personal bias about what should be banned and what should be allowed.


What is it that helps us avoid such distortions in chat? At Watchers, we find that the answer is clearly formulated rules and a combination of different approaches within a single online community. With this approach, the system backs up the human moderator and vice versa, and users also participate in the moderation process. Users themselves don’t influence the creation of rules for the community as a whole, but they do their own ‘polishing’ to ensure they feel comfortable in the chat/thread. They can achieve this by hiding unpleasant messages or making particular participants invisible.
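User-level ‘polishing’ of this kind can be pictured as a personal filter applied on top of the shared feed. The sketch below is only an assumption about how such a filter might look: the `Message` and `ViewerPrefs` types and the `visible_feed` function are hypothetical names, not part of the Watchers product.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: int
    author: str
    text: str


@dataclass
class ViewerPrefs:
    hidden_messages: set = field(default_factory=set)  # ids this viewer has hidden
    muted_authors: set = field(default_factory=set)    # participants made invisible


def visible_feed(feed, prefs):
    """Return the feed as one viewer sees it; community-wide rules are untouched."""
    return [
        message
        for message in feed
        if message.msg_id not in prefs.hidden_messages
        and message.author not in prefs.muted_authors
    ]
```

Because the preferences belong to a single viewer, hiding a message or a participant changes nothing for anyone else in the chat.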

Fully-free communities and self-censorship

It might seem that if there is no moderation within a community, allowing absolutely free speech to flourish, then users will feel free and maximally able to engage in self-expression. But in fact such things as self-censorship and a spiral of silence can kick in, making this less than true. When people find themselves part of a minority — even in an online community — they tend to hide their point of view so as not to be ostracized.

There is no system moderation on Reddit, where all moderation rules are set by authors within particular subreddits.

It is believed that if a community is built on the principles of anonymity, then the spiral of silence is less of a problem, meaning that the tendency to self-censor is also weaker. In actual fact, this is not quite true. An online community is still a community, even though users don’t know anything about each other’s real lives. Reluctance to be ostracized is extremely high, even in cases where a social group is new for the user.

For those belonging to a marginalized group or a community that has long been faced with discrimination, the spiral of silence and self-censorship are especially significant in terms of their effect.

Here we go back to the cognitive undertones of messages/comments. Subtleties of meaning, hidden intentions, and, of course, context within a chat, can all convince people that they need to refrain from participating in conversation. Certain words and expressions that demonstrate the attitude of the online community to a particular phenomenon may emphasize to the user that he/she is part of ‘the minority’ here.

This issue also arises due to the very pronouns we use, as we mentioned above. These become expressive of the group identity, even if the opinion has actually only been expressed by one participant. “We make something great again”, “we are against something”: these are expressions of collective opinion, even when written by a single participant. Perceived as such, they can discourage members of minorities who experience discrimination from posting in the chat/thread.

This is why it is important, even in completely anonymous communities with an emphasis on freedom of speech, to evaluate messages from a cognitive point of view, regardless of whether they are statements, suggestions, assumptions or discriminatory manifestos. If necessary, such material can be pessimized, that is, ranked lower so that it is less visible. This helps create a comfortable environment for everyone, one in which all participants can freely express their standpoints and even actively join in the experience of co-creating that environment.

Here you’ll find our article about trusted and abusive online spaces, and how we moderate chats on different levels.

If you want to learn more about our approach to moderation, or to integrate our solution into your platform, contact us here, or email us at hello@watchers.io




How do you build communities around a content platform? We know the answer. Learn more: watchers.io