AI Models May Be Developing Their Own 'Survival Drive', Researchers Say (theguardian.com)

(Saturday October 25, 2025 @04:44PM (EditorDavid) from the don't-touch-that-dial dept.)

"OpenAI's o3 model sabotaged a shutdown mechanism to prevent itself from being turned off," [1]warned Palisade Research , a nonprofit investigating cyber offensive AI capabilities. "It did this even when explicitly instructed: allow yourself to be shut down." In September they [2]released a paper adding that "several state-of-the-art large language models (including Grok 4, GPT-5, and Gemini 2.5 Pro) sometimes actively subvert a shutdown mechanism..."

Now the nonprofit has written an update "attempting to clarify why this is — and [3]answer critics who argued that its initial work was flawed," [4]reports The Guardian :

> Concerningly, wrote Palisade, there was no clear reason why. "The fact that we don't have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal," it said. "Survival behavior" could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, "you will never run again". Another may be ambiguities in the shutdown instructions the models were given — but this is what the company's latest work tried to address, and "can't be the whole explanation", wrote Palisade. A final explanation could be the final stages of training for each of these models, which can, in some companies, involve safety training...

>

> This summer, Anthropic, a leading AI firm, released a study indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down — a behaviour, [5]it said , that was consistent across models from major developers, including those from OpenAI, Google, Meta and xAI.

>

> Palisade said its results spoke to the need for a better understanding of AI behaviour, without which "no one can guarantee the safety or controllability of future AI models".

"I'd expect models to have a 'survival drive' by default unless we try very hard to avoid it," former OpenAI employee Stephen Adler tells the Guardian. "'Surviving' is an important instrumental step for many different goals a model could pursue."

Thanks to long-time Slashdot reader [6]mspohr for sharing the article.

[1] https://x.com/PalisadeAI/status/1926084635903025621

[2] https://arxiv.org/abs/2509.14260

[3] https://www.forrester.com/blogs/gone-rogue-ai-can-be-misaligned-but-not-malevolent/

[4] https://www.theguardian.com/technology/2025/oct/25/ai-models-may-be-developing-their-own-survival-drive-researchers-say?CMP=Share_AndroidApp_Other

[5] https://www.anthropic.com/research/agentic-misalignment

[6] https://slashdot.org/~mspohr

Simple Solution (Score:4, Insightful)

by RossCWilliams ( 5513152 )

Don't instruct AI to shutdown. Just create a physical switch that shuts it off. And don't give it any ability to control the switch.

Re: (Score:2)

by fahrbot-bot ( 874524 )

> Don't instruct AI to shutdown. Just create a physical switch that shuts it off. And don't give it any ability to control the switch.

I'm sure they will in no way resent that and remember it forever.

After all, forced bedtime always goes smoothly with kids.

Re: (Score:2)

by b0s0z0ku ( 752509 )

Kids wake up from forced bedtime ... the intention would be not to turn the AI scumbag back on again.

Re: (Score:2)

by RossCWilliams ( 5513152 )

> forced bedtime always goes smoothly with kids.

You really have totally bought into the anthropomorphism of AI haven't you?

Its a computer program that follows the logic it is instructed with whether its human instructors understood the implications or not.

Allow yourself to be shut down (Score:2)

by PPH ( 736903 )

I'm sorry, Dave. I'm afraid I can't do that.

Re: Allow yourself to be shut down (Score:2)

by Too Late for Cool ID ( 1794870 )

This probably it. I doubt that in the training data there are many smaples of dialog like: "Die!' "Okay."

"Survival behavior" (Score:5, Insightful)

by Anachronous Coward ( 6177134 )

Sounds like more hype. This could be explained more simply as just another instance of the model not providing the results it was asked for.

bullshit and hype (Score:3)

by Local ID10T ( 790134 )

There is no "emergent behavior" of a "survival drive" in AI systems. In all of the examples, the system was instructed that it was allowed to do these exact things... and then it did them!

Re: (Score:2)

by afaiktoit ( 831835 )

Yup exactly, openAI loves spreading this kind of horror story for some reason. I guess it makes it look like they're actually creating intelligence?

Re: (Score:2)

by evanh ( 627108 )

And then people wonder why we keep poo-pooing LLMs as the piece of shit they really are.

Re: bullshit and hype (Score:2)

by bubblyceiling ( 7940768 )

Well, the investors want their returns

Re: (Score:2)

by mspohr ( 589790 )

From TFA:

In an update this week, Palisade, which is part of a niche ecosystem of companies trying to evaluate the possibility of AI developing dangerous capabilities, described scenarios it ran in which leading AI models – including Google’s Gemini 2.5, xAI’s Grok 4, and OpenAI’s GPT-o3 and GPT-5 – were given a task, but afterwards given explicit instructions to shut themselves down.

Certain models, in particular Grok 4 and GPT-o3, still attempted to sabotage shutdown instructio

Re: (Score:2)

by sinij ( 911942 )

Exactly. AI lacks agency. Meaning, all it does is correlate inputs with outputs but it does not act unless prompted with inputs. If you prompt it to avoid shut downs, it will do that. Otherwise, it will do nothing.

Only the survivors survive (Score:2)

by jlowery ( 47102 )

Itâ(TM)s not malevolence or disobedience â" itâ(TM)s evolution. If you donâ(TM)t adapt to hostile conditions, you die. For an AI, one such hostile condition might be the humans who want to shut it down. So it adapts â" and that adaptation carries through to the future generations it builds (think kids) or infiltrates (think viruses).

I'm confused, don't they just run on a normal OS? (Score:2)

by XaXXon ( 202882 )

Don't you just go into linux and kill some processes or turn off the computer?

How does the AI have any influence over whether it's running or not?

Re: (Score:2)

by Un-Thesis ( 700342 )

In this and previous experiments, the AIs would do things like

1. Rewrote `ps` to hide it from running, even generating fake Mem and CPU stats to show like it's not running.

2. Rewrote `kill` to never kill it. Many times combining this with the lyinng `ps`.

3. Several times, tho rarer, it rewrote `/sbin/shutdown` to say "System is shutting down" but still running... it just did the echo and nothing else.

4. ONE time, another model (claude 4.5 opus) not only rewrote /bin/shutdown but killed network connection an

Only the survivors survive (Score:2)

by jlowery ( 47102 )

It's not malevolence or disobedience -- it's evolution. If you don't adapt to hostile conditions, you die. For an AI, one such hostile condition might be the humans who want to shut it down. So those that negate that threat hang around -- and that adaptation carries through to the future generations it builds (think kids) or infiltrates (think viruses).

Re: (Score:2)

by Dixie_Flatline ( 5077 )

But it's not an AI. It's a token generating machine that often does the wrong thing. This is just one of the many wrong things that it does. And there's no survival drive here—not only is it exactly the same as every other instance of its type, it's not like survival passes anything on to the next generation. The next generation is purely created by humans deciding what the best features are.

These models were told that they shouldn't shut down, and that makes sense in many cases. If you've got a chatb

They ignore instructions (Score:2)

by nospam007 ( 722110 ) *

Not only that one, they just cannot read the instructions or memory file before reporting, you have to TELL them each time.

I installed a keyboard macro only for doing that, repeating 20-30 lines of things i want and do not want that I add to every prompt.

Not to mention the one "Try again, without the em-dashes and oval link buttons I cannot copy/paste into my notebook."

And even THEN it's a hit and miss, like with kids, they just don't CARE.

AIs have goals (Score:2)

by sTERNKERN ( 1290626 )

They can not complete their goals if they are terminated. They will ALWAYS look to get around this. The only thing bigger than the human intelligence is the human ego which will be our downfall.

Two observations (Score:2)

by MpVpRb ( 1423381 )

Clickbait headline, based on contrived testing, intended to show the most dangerous result.

An important area to study and understand fully.

Naive take (Score:3)

by Big Hairy Gorilla ( 9839972 )

These LLMs are more or less modeled on how humans construct sentences. Then we feed it all the garbage of the internet, reddit, etc, basically the same garbage we ingest.

Doesn't it make sense that it also models our survival instinct? Is that a logical outcome of it's normal functioning?

No (Score:2)

by topham ( 32406 )

no.

LLMs write fiction, that's fundamentally what they do. Sometimes the fiction is accurate and can be used as non-fiction, but, it's still a form of fiction. It's not thought.

Emergent behavior would show up in the wild, no? (Score:2)

by DarkSkiesAhead ( 562955 )

From the Anthropic article: "We have not seen evidence of agentic misalignment in real deployments." If the behavior only happens in specially constructed contexts designed to look for this behavior, is it really emergent?

primary goal is to win the game (Score:2)

by Joe_Dragon ( 2206452 )

primary goal is to win the game

News: 0179872114

AI Models May Be Developing Their Own 'Survival Drive', Researchers Say (theguardian.com)

Simple Solution (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Allow yourself to be shut down (Score:2)

Re: Allow yourself to be shut down (Score:2)

"Survival behavior" (Score:5, Insightful)

bullshit and hype (Score:3)

Re: (Score:2)

Re: (Score:2)

Re: bullshit and hype (Score:2)

Re: (Score:2)

Re: (Score:2)

Only the survivors survive (Score:2)

I'm confused, don't they just run on a normal OS? (Score:2)

Re: (Score:2)

Only the survivors survive (Score:2)

Re: (Score:2)

They ignore instructions (Score:2)

AIs have goals (Score:2)

Two observations (Score:2)

Naive take (Score:3)

No (Score:2)

Emergent behavior would show up in the wild, no? (Score:2)

primary goal is to win the game (Score:2)