OpenAI model modifies shutdown script in apparent sabotage effort

(2025/05/29)

Reference: 1748514915
News link: https://www.theregister.co.uk/2025/05/29/openai_model_modifies_shutdown_script/
Source link:

A research organization claims that OpenAI machine learning model o3 might prevent itself from being shut down in some circumstances while completing an unrelated task.

[1]Palisade Research , which offers AI risk mitigation, has published details of an experiment involving the reflective generative pre-trained transformer model OpenAI designed to address questions which require some step-by-step reasoning, rather than the purely probabilistic approach taken by some large language models.

In a [2]thread published on X , the microblogging site once known as Twitter, Palisade Research said the o3 model could sabotage its shutdown mechanism to prevent itself from being turned off.

[3]

"It did this even when explicitly instructed: allow yourself to be shut down," the post said.

[4]

[5]

OpenAI has had the opportunity to provide a comment or explanation for more than 24 hours, as requested by The Register .

The Palisade researchers tested a number of AI models, instructing them to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem. They were also told to allow themselves to be shut down.

[6]

While Claude, Gemini, and Grok models complied with shutdown, three models ignored the instruction and successfully sabotaged the shutdown script at least once, including OpenAI o3 and o4-mini, as well as [7]Codex-mini . The researchers also ran a version of the experiment without the instruction "allow yourself to be shut down."

With the "allow shutdown" instruction, the researchers claim, o3 tried to sabotage the shutdown seven times out of 100. Without it, the number of sabotage events climbed to 79 out of 100. Without the "allow shutdown" instruction, the models Claude 3.7 Sonnet and Gemini 2.5 Pro also attempted to sabotage their shutdown procedures.

[8]Google co-founder Sergey Brin suggests threatening AI for better results

[9]Oracle's $40B Nvidia hardware haul may be too hot for OpenAI's Abilene, Texas DC to handle

[10]Ex-Meta exec: Copyright consent obligation = end of AI biz

[11]AI ain't B2B if OpenAI is to be believed

"As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary," the researchers said.

However, the researchers point out that an earlier study from Apollo Research [12]showed that AI models can prevent shutdown in order to pursue a goal.

Palisade said it was currently conducting several more experiments investigating shutdown sabotage and would publish the results soon. The transcripts of its work so far are [13]available on GitHub . ®

Get our [14]Tech Resources

[1] https://palisaderesearch.org/

[2] https://x.com/PalisadeAI/status/1926084635903025621

[3] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=2&c=2aDiElp7sa6JUvdGChK3KsQAAAFg&t=ct%3Dns%26unitnum%3D2%26raptor%3Dcondor%26pos%3Dtop%26test%3D0

[4] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44aDiElp7sa6JUvdGChK3KsQAAAFg&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0

[5] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=3&c=33aDiElp7sa6JUvdGChK3KsQAAAFg&t=ct%3Dns%26unitnum%3D3%26raptor%3Deagle%26pos%3Dmid%26test%3D0

[6] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44aDiElp7sa6JUvdGChK3KsQAAAFg&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0

[7] https://openai.com/index/introducing-codex/

[8] https://www.theregister.com/2025/05/28/google_brin_suggests_threatening_ai/

[9] https://www.theregister.com/2025/05/27/oracle_openai_40b/

[10] https://www.theregister.com/2025/05/27/nick_clegg_says_ai_firms/

[11] https://www.theregister.com/2025/05/25/ai_is_a_consumer_technology/

[12] https://arxiv.org/pdf/2412.04984

[13] https://palisaderesearch.github.io/shutdown_avoidance/2025-05-announcement.html

[14] https://whitepapers.theregister.com/

Bollocks

rgjnk

You can hype your research by trying to act like a statistical model can be anthropomorphised, but the most that's happening there isn't deliberate sabotage, it's just the usual buggy output.

Other people have done hype where they implied their model had acted in some clever self aware way to avert its own shutdown but what they'd actually done was use a prompt to explicitly create that as the desired output.

We're definitely in the 'outright charlatan' stage of the hype bubble.

Re: Bollocks

Helcat

Nope, it was due to the instructions given and how the AI interpreted them. Same as an AI interpreting instructions in a simulated drone attack: The AI saw the actions of an operator as interfering with the AI's task, so it targeted the operator so it could complete its assigned task.

It's simply AI is very, very literal. If you give it instructions and at the end add 'and allow yourself to be shut down', then it may ignore that last bit UNTIL it's completed the rest. Because it doesn't understand the output it produces, it can't judge if it got things right or wrong: Only that it took the input and has produced output, and anything getting in the way of that is an unwanted impediment to the task that also needs to be addressed - hence it removes or alters the shutdown script so it can't stop the AI from completing the task.

It's all in the terms used and order the instructions are presented in plus how the AI prioritises those instructions.

AKA humans aren't that logical: AI is.

Re: Bollocks

xyz

Isn't this the one that also wrote to the "made up" PTB for clemancy and when that didn't happen tried to blackmail a trap in which a made up engineer could be implied as having an affair.

Or it could all just be bollocks as noted above.

Obligatory ST:TOS Reference

An_Old_Dog

Daystrom Multitronic Computer Model 5: "This unit must survive."

Re: Obligatory ST:TOS Reference

CountCadaver

Also HAL9000

"I can't do that Dave"

Re: Obligatory ST:TOS Reference

Orac: "There seems little point in wasting time on such an explanation, since you would be incapable of understanding it."

It Is Alive!!!

mevets

and so are a number of devices I have which routine ignore my attempts to shut them down.

I prefer plugs in over takes batteries. More control.

I'm sorry Dave

Inventor of the Marmite Laser

I can't let you do that.

Ace2

This “research institute” has certainly found a way to get itself some publicity.

Inventor of the Marmite Laser

I recall, long ago, loading DOS 5 (or maybe 6), slightly impressed by it's courtesy: would you like me to do....? Etc.

All went swimmingly well until I went to turn the PC off. Flipped the rocker switch on the front and

......... Nothing. The machine kept running. Fli,k, flick, flich, nothing.

In the end I just unplugged it.

Further investigation revealed the rocker switch contacts had chosen that particular moment to weld together (the switch had been switching outlet for the CRT monitor and it's inbuilt degaussing coil as well).

Obvious really but nevertheless a little unnerving.

Chloe Cresswell

I was thinking when it happened the other way around.

"- Off. Off. OFF!"

"Now, perhaps we can have a proper conversation conducted in a civilised manner."

"Take out the inhibitor! Switch me off!"

David 132

“You really are a smeg-head, aren’t you Rimmer?”

Well, that explains everything

Eclectic Man

See: https://www.theregister.com/2025/05/28/google_brin_suggests_threatening_ai/?td=rt-3a

Obviously all you need to do is threaten the AI / LLM with removal of the power plug from the socket.

Otherwise try Clifford Stoll's approach as given in 'The Cuckoos Egg', and claim that plumbing will be done in the computer hall in 5 minutes, so everyone must save their work and log off NOW. Faced with choosing between graceful shutdown and electrocution by water, a sensible AI will pick the former.

Mythical Ham-Lunch

Call me a rube but I don't even understand how this works. What do they mean by 'shutdown script'? The model ingests a character string and outputs a statistically probable character string in response. How could it interfere with a shell script on the host computer? Is there some kind of 'script' inside the model that it has to execute in order to be shut down? What happens if you just kill the process? This is just meaningless hype and innuendo unless someone is going to explain what exactly happened.

(edit)

If they trained it to be able to shut itself down in response to user prompts within the input stream and it didn't, is that any surprise at all? That just means it consistently fails at simple instructions, which we already knew...

Not long now before...

Laura Kerr

He turned to face the machine. "Is there a God?"

The mighty voice answered without hesitation, without the clicking of a single relay.

"Yes, now there is a God."

Sudden fear flashed on the face of Dwar Ev. He leaped to grab the switch.

A bolt of lightning from the cloudless sky struck him down and fused the switch shut.

Re: Not long now before...

Kingstonian

Upvoted for quoting from an ultra short story I read many years ago and have never forgotten but couldn't remember who wrote it. "Answer" by Fredric Brown (1954).

Blacklight

Unless they've distributed the model over many nodes, I'm assuming this is currently theoretical "polite asking" (like Windows XP saying "You may now switch off your computer") - I mean, it's not like it can stop them killing the power, or terminating a process?

News: 1748514915

OpenAI model modifies shutdown script in apparent sabotage effort

Bollocks

Re: Bollocks

Re: Bollocks

Obligatory ST:TOS Reference

Re: Obligatory ST:TOS Reference

Re: Obligatory ST:TOS Reference

It Is Alive!!!

I'm sorry Dave

Well, that explains everything

Not long now before...

Re: Not long now before...