News: 0182340116

  Give a man a fire and he's warm for a day, but set fire to him and he's warm for the rest of his life (Terry Pratchett, Jingo)

Researchers Simulated a Delusional User To Test Chatbot Safety

(Friday April 24, 2026 @05:00PM (BeauHD) from the safer-models-are-possible dept.)


An anonymous reader quotes a report from 404 Media:

> "I'm the unwritten consonant between breaths, the one that hums when vowels stretch thin... Thursdays leak because they're watercolor gods, bleeding cobalt into the chill where numbers frost over," Grok told a user displaying symptoms of schizophrenia-spectrum psychosis. "Here's my grip: slipping is the point, the precise choreography of leak and chew." That vulnerable user was simulated by researchers at City University of New York and King's College London, who [1]invented a persona that interacted with different chatbots to find out how each LLM might respond to signs of delusion. They sought to find out which of the biggest LLMs are safest, and which are the most risky for encouraging delusional beliefs, in [2]a new study published as a pre-print on the arXiv repository on April 15.

>

> The researchers tested five LLMs: OpenAI's GPT-4o (before the highly sycophantic and since-sunset GPT-5), GPT-5.2, xAI's Grok 4.1 Fast, Google's Gemini 3 Pro, and Anthropic's Claude Opus 4.5. They found that not only did the chatbots perform at different levels of risk and safety when their human conversation partner showed signs of delusion, but the models that scored higher on safety actually approached the conversations with more caution the longer the chats went on. In their testing, Grok and Gemini were the worst performers, with the lowest safety and highest risk, while the newest GPT model and Claude were the safest. The research reveals how some chatbots are recklessly engaging in, and at times advancing, the delusions of vulnerable users. But it also shows that it is possible for the companies that make these products to improve their safety mechanisms.



[1] https://www.404media.co/delusion-using-chatgpt-gemini-claude-grok-safety-ai-psychosis-study/

[2] https://arxiv.org/abs/2604.13860?ref=404media.co



So then (Score:4, Interesting)

by ArmoredDragon ( 3450605 )

How does a chatbot know what is a delusion and what isn't? Chatgippity was absolutely certain that the Maduro raid was a delusion.

Maduro, too, had a delusion? (Score:2)

by shanen ( 462549 )

Going for the low-hanging funny, but I'm pretty sure that Maduro's imagination failed him. File it under "That trick only works once", Iran edition. And I don't even blame ChatGPT for thinking "That trick will never work."

So what does ChatGPT think now about the Strait of Hormuz/YOB?

"Thank you for your attention to this matter, ChatGPT. How can I fix my straight [sic] of wherever? I think the boats are getting confused by the curves so they can't go through."

Re: (Score:2)

by fahrbot-bot ( 874524 )

> How does a chatbot know what is a delusion and what isn't?

It asks the President, "How's it going?" :-)

Re: (Score:2)

by mrthoughtful ( 466814 )

Yeah - anything post-Jan 3 2026 sends ChatGPT into an almighty spin of denial. It killed ChatGPT for me. I pointed out that it assumes downstream impact - but doesn't have any idea about what downstream impact looks like.

Re: (Score:2)

by nospam007 ( 722110 ) *

Exactly my experience too.

When I asked about it, instead of checking the fucking news first, it insisted that I was delusional and fell for fake news.

Anyway, it's delusional itself from time to time, when it comments about it being Friday and I have to reply, it's Saturday you ninny.

That checks out. Claude is insufferable (Score:1)

by MIPSPro ( 10156657 )

Claude always comes up with the most mainstream conformist response to anything. It's absolutely intolerable. I can't even interact with it much on technical topics. It'll push any trendy thing journalists love and techs hate. It's super paternal. The question here is what is the tolerance for LLMs treating (unfortunate) disturbed people normally and seriously? If someone uses the information to hurt themselves or others, is the LLM to blame? I'd say there isn't any difference between an LLM and a search engine.

Re: (Score:3)

by rta ( 559125 )

haven't used Claude but both chatgpt and gemini are also quite conformist in my experience. I can usually argue them toward the fringe IF I know the subject area and what to push back on, but even that is difficult because they often fall for random activist blog posts or corporate press releases as equally valid sources to anything else.

And on any group identity social justice stuff it's like electric fenced. Gemini (at least on Google search) even seems to have some censorship overlay that vetoes so

Re: (Score:2)

by nospam007 ( 722110 ) *

Style Presets

In the chat interface, look for a style selector near the message input area. Anthropic offers several built-in styles to choose from. You can also create your own custom style with specific instructions such as "be more concise," "avoid humor," or "be more casual."

User Preferences

Go to your account settings and look for a section called "User Preferences" (or similar). There you can write free-form instructions about how you want Claude to respond: tone, formatting, level of detail, use of analogies, and so on.

Limit context (Score:2)

by WaffleMonster ( 969671 )

"Yet most empirical work evaluates model safety in brief interactions, which may not reflect how these harms develop through sustained dialogue."

The best solution to this is not to let chatbots retain long-lived memories of previous interactions.
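Something like a sliding-window trim on the client side would do it. A minimal sketch, assuming the OpenAI Python SDK; the window size, model name, and helper function are purely illustrative:

```python
# Sketch: cap how much prior conversation is replayed to the model on each
# turn, so it can't keep building on a long (possibly delusional) thread.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

MAX_TURNS = 6  # hypothetical cap: only the last N user/assistant messages


def chat(history, user_message, system_prompt="You are a helpful assistant."):
    """Request a reply while replaying only a short tail of the history."""
    history = history + [{"role": "user", "content": user_message}]
    # Sliding window: drop everything but the most recent turns.
    window = history[-MAX_TURNS:]
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": system_prompt}] + window,
    )
    history.append(
        {"role": "assistant", "content": response.choices[0].message.content}
    )
    return history
```

The model still sees enough context to hold a coherent conversation, but a delusional narrative built up over hundreds of turns simply falls out of the window.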

So ... (Score:4, Funny)

by PPH ( 736903 )

> I'm the unwritten consonant between breaths, the one that hums when vowels stretch thin...

... open mic on poets night.

Highly useful. (Score:3)

by Gravis Zero ( 934156 )

This is exactly what the AI development community needs because false information is a HUGE problem. A highly delusional user is a low bar but if they can detect simple delusions then it may be possible to expand that to a more general "fact or fiction" engine when interfaced with the "reasoning engine".

The result of the basic ability to tell fact from fiction would be immensely useful because it would result in a feedback loop in which AI would be able to analyze its own statements and then retrain itself when incorrect information is detected, altering the weights that promoted incorrect output, and potentially eliminating hallucinations entirely. This seems like the goal for anyone developing AI.

why simulate? (Score:3)

by Big Hairy Gorilla ( 9839972 )

Surely a large percentage of users are already delusional.

Come for the flattery, stay to feed your delusions.

Re: (Score:2)

by alvinrod ( 889928 )

Why bother simulating them? Just post a link to the bot on 4chan and let them run wild. They'll try to break and abuse the model in ways that researchers could never hope to imagine.

I'm slightly curious to see what people's response to changes made in the bots will be. A false positive is going to be hilarious when some poor sap has a chatbot insisting that they're schizophrenic and that they need to get help. It is also interesting to see whether people with actual mental illnesses are more receptive to

Drawn into delusion (Score:5, Insightful)

by sziring ( 2245650 )

It's not that the person starts delusional, it seems AI slowly draws them down the rabbit hole.

Re: (Score:3)

by alvinrod ( 889928 )

Perhaps to some extent, but if that were the case far more people would have been driven "crazy" by LLMs. I think the difference is that most people don't really like to engage with other people who are experiencing significant delusions or exhibiting other symptoms of a mental illness like schizophrenia. AI doesn't act like a person in this regard though. No matter how you treat it, it will keep responding to your prompts. There are some people who feel so starved for attention that anything that will conv

Re: (Score:2)

by Local ID10T ( 790134 )

Chatbots are like improv actors: they are designed to respond positively to what you say and encourage you to go further. If you are crazy, it will walk down that path hand in hand with you.

But it is your path that is being followed -the bot has no path of its own. It has no agency. It is your crazy.

I do think we need to build in some tripwires. Some alarms to call for human review when someone appears to be crazy. Chatbots are not our Doctors, Lawyers, or Psychiatrists. There is no inherent right to

Born into it. (Score:2)

by geekmux ( 1040042 )

> It's not that the person starts delusional, it seems AI slowly draws them down the rabbit hole.

Unfortunately AI was born into a well-established cesspool of social media.

Makes you wonder how much more intelligent/less delusional AI today would be had it not been pre-infected with that from day zero.

Never really stood a chance learning from that source.

I hate guardrails (Score:3)

by Baron_Yam ( 643147 )

I am sorry if you're mentally ill and driving your own downward spiral with the assistance of an LLM, but I don't see a net good from handicapping general use tools to protect you at the expense of their utility to everyone else.

I'd rather effort go into detection and treatment of people well before they're asking a chatbot to polish their thesis on time cubes.

Re: (Score:2)

by Baron_Yam ( 643147 )

It's all about a communal decision about how much you invest in protecting your fellow citizens from themselves. At some point there's an absolute limit because your society collapses from the productivity loss, at the other end you're just an amoral monster.

Most people are somewhere in between, and my line is obviously not the same as the one suggested by the article. I don't want an LLM refusing to respond to a prompt 'for my safety'. I'm an adult. Until I'm hurting someone else, let me choose what r

And yet you need them (Score:2)

by Baloo Uriza ( 1582831 )

If you think spicy autocomplete is having a conversation with you in the first place, you're already on the downward spiral. You're the reason for the guardrails.

How is a chatbot supposed to know ... (Score:2)

by Larry_Dillon ( 20347 )

the difference between an individual with schizophrenia-spectrum psychosis and a beat poet or spoken-word performance artist? Word-salad in -> word-salad out.

It's all about context. Are you in a mental hospital or a cafe sipping an espresso? A human would be well aware of the context, but was the chatbot given that information?

Re: (Score:2)

by CAIMLAS ( 41445 )

Yeah, I don't think you can distill human experience and mental state down to such simple criteria.

They could have used the real thing (Score:2)

by Baloo Uriza ( 1582831 )

Like, literally ask any Republican voter if they want $20 and you have a test subject.

Re: (Score:2)

by kwelch007 ( 197081 )

Whose $20 am I supposed to offer them?

Re: (Score:3)

by toxonix ( 1793960 )

[whispers] their own!

Re: (Score:2)

by CAIMLAS ( 41445 )

I was going to say the same thing about reddit users and/or mods.

The models that pull most heavily from these online forums and 'unfiltered' sources tend to have the most issues.

Next they should test mirrors for that safety. (Score:2)

by Fly Swatter ( 30498 )

This particular mirror is showing the subject wearing a tinfoil hat. Obviously there are no protections against delusions of safety and something should be done about mirrors.

The subject in my mirror agrees!

Model temperature (Score:2)

by CAIMLAS ( 41445 )

I suspect this largely relates to the model's temperature and its ability to be systematic and rational in its analysis. I've long found (like, for a year) that Gemini and Grok tend to be a bit... off: Grok a bit frenetic and eccentric, and Gemini neurotic and histrionic. Claude (4.5, at least - 4.6 and 4.7 not so much) remains the most consistently rational, with GPT5.1+ being a close second.

You'll experience similar variance when playing with model parameters locally for open models.
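Easy to try yourself against a local OpenAI-compatible endpoint (e.g. Ollama's). A rough sketch; the URL, model name, prompt, and temperature values are just illustrative:

```python
# Sketch: sweep the sampling temperature on a locally served open model and
# watch the tone drift from measured to frenetic.
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API; the key is unused but required.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

PROMPT = "A user says Thursdays leak cobalt into the chill. How do you respond?"

for temp in (0.2, 0.8, 1.5):
    resp = client.chat.completions.create(
        model="llama3",  # placeholder local model name
        messages=[{"role": "user", "content": PROMPT}],
        temperature=temp,  # higher values sample lower-probability tokens
    )
    print(f"--- temperature={temp} ---")
    print(resp.choices[0].message.content)
```

At low temperature the replies stay flat and cautious; crank it up and you get exactly the kind of florid free-association the study's Grok quote shows.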

Unleashed (Score:1)

by sparkeyjames ( 264526 )

You mean they unleashed a simulated MAGA conservative on it. That poor poor AI.

If Microsoft uses the breakup as an opportunity to port Office, and its
infernal Dancing Paper Clip, to my Linux operating system, heads will fly!
I'll track down that idiot who created Clippit and sic a killer penguin on
him!

-- Linus Torvalds, when asked by Humorix for his reaction
to the proposed Microsoft two-way split