
The next time you receive an unusually courteous response on social media, it may be worth a second glance. It might be an AI model attempting (and failing) to fit in.
On Wednesday, researchers from the University of Zurich, University of Amsterdam, Duke University, and New York University published a study showing that AI models remain easily distinguishable from humans in social media conversations, with an overly friendly emotional tone serving as the most persistent giveaway. The study, which tested nine open-weight models across Twitter/X, Bluesky, and Reddit, found that classifiers developed by the researchers detected AI-generated replies with 70 to 80 percent accuracy.
The research proposes what the authors refer to as a “computational Turing test” to measure how closely AI models mimic human language. Rather than depending on subjective human evaluation of text authenticity, the framework employs automated classifiers and linguistic analysis to pinpoint specific traits that differentiate machine-generated content from that created by humans.
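In spirit, such a classifier can be as simple as a supervised text model trained on labeled human and AI replies. The sketch below is not the study's actual pipeline; it assumes `human_replies` and `ai_replies` are lists of collected reply texts and uses a plain TF-IDF plus logistic regression baseline to illustrate the general setup.

```python
# Minimal sketch of an automated human-vs-AI reply detector (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_detector(human_replies, ai_replies):
    """human_replies / ai_replies: lists of reply strings, assumed collected elsewhere."""
    texts = human_replies + ai_replies
    labels = [0] * len(human_replies) + [1] * len(ai_replies)  # 1 = AI-generated

    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=42
    )

    # Word n-grams as crude linguistic features; the study's classifiers are
    # more elaborate, so this only shows the shape of the approach.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"Held-out detection accuracy: {accuracy:.2f}")
    return model
```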
“Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote. The team, led by Nicolò Pagan at the University of Zurich, tested various optimization strategies, from simple prompting to fine-tuning, but found that deeper emotional cues persist as reliable tells that a particular social media text interaction was generated by an AI chatbot rather than a human.
The toxicity indicator
In the study, researchers assessed nine large language models: Llama 3.1 8B, Llama 3.1 8B Instruct, Llama 3.1 70B, Mistral 7B v0.1, Mistral 7B Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509.
When prompted to write replies to genuine social media posts from real users, the AI models struggled to match the level of casual negativity and spontaneous emotional expression common in human posts, with toxicity scores consistently lower than those of authentic human replies across all three platforms.
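A comparison along these lines could be reproduced with any off-the-shelf toxicity scorer. The sketch below assumes the open-source Detoxify model (not necessarily the tool the study used) and simply contrasts mean toxicity for the two groups of replies.

```python
# Hedged sketch: score replies with Detoxify and compare group means.
from statistics import mean

from detoxify import Detoxify


def compare_toxicity(human_replies, ai_replies):
    """Print mean toxicity for human vs. AI reply lists (illustrative comparison)."""
    scorer = Detoxify("original")  # predict() returns per-text scores by category
    human_tox = scorer.predict(human_replies)["toxicity"]
    ai_tox = scorer.predict(ai_replies)["toxicity"]

    print(f"Mean toxicity, human replies: {mean(float(s) for s in human_tox):.3f}")
    print(f"Mean toxicity, AI replies:    {mean(float(s) for s in ai_tox):.3f}")
```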
To address this shortcoming, the researchers explored optimization methods (including providing writing examples and utilizing context retrieval) aimed at reducing structural differences like sentence length or word count, but discrepancies in emotional tone remained. “Our extensive calibration tests question the notion that more advanced optimization inherently results in more human-like output,” the researchers concluded.
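One of those calibration strategies, supplying writing examples, amounts to packing a user's earlier posts into the prompt as style references before asking the model to reply. The helper below is a hypothetical illustration of that idea; the prompt wording is invented here, not taken from the study.

```python
# Hypothetical prompt builder: use a user's past posts as style examples.
def build_styled_prompt(post_to_answer, user_past_posts):
    """Return chat-style messages that ask a model to reply in the user's own style."""
    examples = "\n".join(f"- {p}" for p in user_past_posts)
    system = (
        "You are replying on social media. Match the style of the example posts "
        "below: their length, tone, and informality.\n"
        f"Example posts by the same user:\n{examples}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Write a reply to this post:\n{post_to_answer}"},
    ]
```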