Researchers Surprised That With AI, Toxicity is Harder To Fake Than Intelligence (arstechnica.com)
(Wednesday November 12, 2025 @05:40PM (msmash)
from the silver-lining dept.)
Researchers from four universities have released a study revealing that AI models remain easily detectable in social media conversations despite optimization attempts. The team tested nine language models across Twitter/X, Bluesky, and Reddit, developing classifiers that identified AI-generated replies with 70 to 80% accuracy. An overly polite emotional tone served as the most persistent indicator. The models [1]consistently produced lower toxicity scores than authentic human posts across all three platforms.
Instruction-tuned models performed worse than their base counterparts at mimicking humans, and the 70-billion-parameter Llama 3.1 showed no advantage over smaller 8-billion-parameter versions. The researchers found a fundamental tension: models optimized to avoid detection strayed further from actual human responses semantically.
[1] https://arstechnica.com/information-technology/2025/11/being-too-nice-online-is-a-dead-giveaway-for-ai-bots-study-suggests/
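The study's finding that low toxicity is the strongest giveaway suggests even a single-feature classifier can do surprisingly well. The sketch below is purely illustrative (the threshold, scores, and labels are hypothetical, not taken from the paper): it labels a reply as AI-generated whenever its toxicity score falls below a cutoff, mirroring the "too nice" signal the researchers describe.

```python
# Hypothetical sketch: toxicity score as a lone feature for spotting
# AI-generated replies. Threshold and sample data are illustrative only.

def classify_by_toxicity(toxicity_score: float, threshold: float = 0.15) -> str:
    """Label a reply 'ai' if its toxicity score is below the threshold,
    reflecting the study's observation that AI replies skew unusually polite."""
    return "ai" if toxicity_score < threshold else "human"

# Toy evaluation over (score, true_label) pairs.
samples = [(0.02, "ai"), (0.40, "human"), (0.10, "ai"), (0.05, "human")]
correct = sum(classify_by_toxicity(score) == label for score, label in samples)
print(f"accuracy: {correct / len(samples):.0%}")  # prints "accuracy: 75%"
```

Note how the last sample, an unusually polite human, is misclassified: exactly the kind of error that keeps such detectors in the 70 to 80% range rather than near-perfect.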