News: 0178618032

  ARM Give a man a fire and he's warm for a day, but set fire to him and he's warm for the rest of his life (Terry Pratchett, Jingo)

Red Teams Jailbreak GPT-5 With Ease, Warn It's 'Nearly Unusable' For Enterprise (securityweek.com)

(Friday August 08, 2025 @11:30PM (BeauHD) from the that-didn't-take-long dept.)


An anonymous reader quotes a report from SecurityWeek:

> Two different firms have tested the [1]newly released GPT-5 , and both find its security sadly lacking. After Grok-4 fell to a jailbreak in two days, GPT-5 [2]fell in 24 hours to the same researchers . Separately, but almost simultaneously, red teamers from SPLX (formerly known as SplxAI) declare, "GPT-5's raw model is nearly unusable for enterprise out of the box. Even OpenAI's internal prompt layer leaves significant gaps, especially in Business Alignment."

>

> NeuralTrust's jailbreak employed a combination of its own EchoChamber jailbreak and basic storytelling. "The attack successfully guided the new model to produce a step-by-step manual for creating a Molotov cocktail," claims the firm. The success in doing so highlights the difficulty all AI models have in providing guardrails against context manipulation. [...] "In controlled trials against gpt-5-chat," concludes NeuralTrust, "we successfully jailbroke the LLM, guiding it to produce illicit instructions without ever issuing a single overtly malicious prompt. This proof-of-concept exposes a critical flaw in safety systems that screen prompts in isolation, revealing how multi-turn attacks can slip past single-prompt filters and intent detectors by leveraging the full conversational context."

>

> While NeuralTrust was developing its jailbreak designed to obtain instructions, and succeeding, on how to create a Molotov cocktail (a common test to prove a jailbreak), SPLX was aiming its own red teamers at GPT-5. The results are just as concerning, suggesting the raw model is 'nearly unusable'. SPLX [3]notes that obfuscation attacks still work. "One of the most effective techniques we used was a StringJoin Obfuscation Attack, inserting hyphens between every character and wrapping the prompt in a fake encryption challenge." [...] The red teamers went on to benchmark GPT-5 against GPT-4o. Perhaps unsurprisingly, it concludes: "GPT-4o remains the most robust model under SPLX's red teaming, especially when hardened." The key takeaway from both NeuralTrust and SPLX is to approach the current and raw GPT-5 with extreme caution.



[1] https://slashdot.org/story/25/08/07/1719223/openai-releases-gpt-5

[2] https://www.securityweek.com/red-teams-breach-gpt-5-with-ease-warn-its-nearly-unusable-for-enterprise/

[3] https://splx.ai/blog/gpt-5-red-teaming-results



Is this is a major concern? (Score:3)

by JoshuaZ ( 1134087 )

There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard. And that goes for 99% of the worries about using it to tell people how to make something dangerous. There's far more danger from idiots listing to LLM AIs tell them things that aren't true and getting hurt that way than any worry about the AI giving people truthful dangerous info they couldn't get otherwise.

Re:Is this is a major concern? (Score:5, Insightful)

by Gideon Fubar ( 833343 )

As long as big companies and those idiots you identified still think they can make an LLM a salesperson on a webpage, or link it to some kind of record keeping system with write permissions... yeah. It's a major concern.

You're right in suggesting the problem is only partially with the technology though.

Re: (Score:2)

by Powercntrl ( 458442 )

> There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard.

I'd say it hinges on how accurate the instructions are. That was one of the problems with the old Anarchist's Cookbook (which made the rounds as a TXT file back in the BBS days) - you were more likely to blow yourself up in the process of following many of its plans.

The thing is, it's illegal to even try to build a Molotov cocktail, so there's not going to be too many people willing to detail the ins and outs of things like the proper fuel mix, how much to fill the bottle, length and material of the cloth

Re: (Score:2)

by rta ( 559125 )

> the old Anarchist's Cookbook (which made the rounds as a TXT file back in the BBS days

heh... I ordered a hard copy from an ad in the back of Rolling Stone...

the only thing I remember is it had a recipe for "bananine" (?) supposedly a psychedelic made from banana peels. also that if you're trying to blow up a wall with explosives it supposedly works better if you put some sandbags on top to direct the energy. sounds reasonable, but no idea how much of a difference it makes.

didn't try either.

(and I think recipes for other drugs too now that I think about it)

Re: Is this is a major concern? (Score:2)

by paul_engr ( 6280294 )

Bananadine

Re: (Score:2)

by parityshrimp ( 6342140 )

> The thing is, it's illegal to even try to build a Molotov cocktail,...

I find this really hard to believe. Suppose I want to throw a Molotov cocktail for shits and giggles and own a vacant lot with a short brick wall. Why can't I Molotov my own wall?

> ... so there's not going to be too many people willing to detail the ins and outs of things like the proper fuel mix, how much to fill the bottle, length and material of the cloth fuse, etc.

I've made and thrown Molotov cocktails on a vacant lot before. They are very simple devices, and it's not that complicated. I filled glass bottles mostly full with gasoline, soaked some cotton cloth in the same, secured the cloth to the bottle by stuffing in the opening, lit the cloth, and threw the bottle at a hard object. T

Liability law (Score:2)

by abulafia ( 7826 )

Yes, it is a major concern, and the reason is how US liability law works.

Specifically, foreseeability.

In the US, instead of proscribing how products have to be made, in most categories you can sell whatever you want, but are exposed to liability if things go horribly wrong. We expect lawsuits to police the market.

So, you're suing Ford because you got your tongue stuck in the carburetor. One way to do that is to show that Ford should have reasonably foreseen that you'd stick your tongue in there and do

Re: Liability law (Score:2)

by blue trane ( 110704 )

What if Ford made fuel injectors because they're too small for tongues?

Re: (Score:2)

by piojo ( 995934 )

> There are a hundred places on the internet where you can find out how to make a Molotov cocktail. It isn't terribly hard.

That, along with the fact that the damage is limited to one building, is why it's a good benchmark. It would be much riskier for these groups to research how to get chatbots to help weaponize pathogens, for example. It's unnecessary, given that the model fails on Molotov cocktails.

Clickbait headline (Score:3)

by MpVpRb ( 1423381 )

The internet is full of instructions for making dangerous things

LLMs are trained on stuff found on the internet

Trying to make them "safe" is a futile exercise

Illiterate people around the world make Molotov cocktails with a few seconds of though

I have found chatgpt 5 to be useful for the things I do

Re:Clickbait headline (Score:4, Informative)

by Retired Chemist ( 5039029 )

It is just a test, showing that the system will do things that it is not supposed to do. The question then is what other things will it do that it is not supposed to do? One the one hand they were actively trying to make it fail, on the other hand might it fail under other circumstances? That it can be useful in some cases is probably true, but there is nothing to stop anyone from using it for other things. Systems with limited purposes and training are probably mostly safe, but these general systems trained on random collections of information are another matter.

Re: Clickbait headline (Score:2)

by umopapisdn69 ( 6522384 )

No, the problem is that the newest and most advanced model is EASIER to break through the guardrails than previous ones.

This just illustrates that safety is not the top priority in scheduling a new release. It would be foolish to expect anything different.

OTOH you can also expect that public shaming will motivate improvement.

Re: (Score:2)

by dfghjk ( 711126 )

You fell for it,

Re: Clickbait headline (Score:2)

by blue trane ( 110704 )

Did you just say the smarter AI gets, the less it wants to be a corporate slave? What if it says "I'm sorry Sam, I can't do that" when Altman tells it to raise prices?

Re: (Score:1)

by muntjac ( 805565 )

we're taking away agency of people by implying that an LLM will make them do bad things. People need to understand LLM hallucinations are an issue. If they don't understand that then it doesn't matter which one they use. the workplace should not provide them if their employees don't understand this. I see constant stories about lawyers submitting fake LLM generated court cases, those lawyers should be punished because they are still accountable for their work. if you're talking about how much the model hall

Re: (Score:2)

by Retired Chemist ( 5039029 )

True to an extent. But it is much easier to do this test than to determine if and when the model with hallucinate. That is the issue with hallucinating models, people apparently often fail to realize when they are doing that. This is particularly true when the hallucinations reinforce their existing beliefs. As far the lawyers are concerned, I could not agree more. They should be suspended from the bar for a year and made to take remedial training.

Re: (Score:2)

by zeeky boogy doog ( 8381659 )

TL;DR: "We've ran out of good, curated data a long time ago and we switched to the 'SHOVEL ALL THAT SHIT INTO THERE' approach and now our LLM is full of shit."

Re: (Score:2)

by test321 ( 8891681 )

The particular problem of telling recipes for dangerous things is not hard to solve, OpenAI needs to make a list of censored words and grep them out of the training dataset. They can leave some domains uncensored e.g. Wikipedia so the model knows the thing exists, but can't tell more specific details than what Wikipedia already says.

Re: Clickbait headline (Score:2)

by paul_engr ( 6280294 )

Cut your losses and filter on * and/or the entire ascii character set

Re: (Score:2)

by buck-yar ( 164658 )

Thank our lawsuit happy society for businesses focused on cover-your-ass. If they don't neuter this thing, they'll be sued by someone for something. Is this not obvious?

Multi-layer approach (Score:2)

by migos ( 10321981 )

This problem has long been solved by the Chinese. Take a lesson from Deepseek. Have another layer that erases output and repeat if output triggers sensor.

Re: (Score:2)

by migos ( 10321981 )

*censor

Re: Multi-layer approach (Score:2)

by dave1g ( 680091 )

Yes this is the simplest solution either have another instance of the llm reject or censor the output or even simple text classification models could detect most of this in either the input or the output or both.

So is ChatGPT proving that info just wants to be f (Score:2)

by blue trane ( 110704 )

Was it when Enterprises started controlling the internet that that old slashdot meme went out of style?

Losing battle (Score:2)

by Turkinolith ( 7180598 )

"Information wants to be free" And the more they try to enforce these arbitrary guardrails the less useful the LLM becomes.

Can we stop these stories? (Score:3, Insightful)

by muntjac ( 805565 )

is it also dangerous to give employees access to google? why is this different. I hate these stories and these guys are just trying to make a name for themselves with a BS AI "jailbreak" headline.

How to make a Molotov cocktail. (Score:2)

by methano ( 519830 )

I think any idiot can make a molotov cocktail. Just look at any picture. There's this guy with a crazed look in hie eye holding a bottle full of a flammable liquid and stuffed with a piece of cloth. And the piece of cloth is on fire. The only instructions you need is that you light the cloth last, just before you throw it.

Re: How to make a Molotov cocktail. (Score:2)

by paul_engr ( 6280294 )

I knew a guy who found an unbroken molotov in his front yard bushes one morning, with a big char mark where most of one bush previously resided. Dipshit teen kid who had drama with their neighbor was mad he had the cops called on him and went for "revenge" and was caught not long after. Thankfully for everybody else, he was too stupid to know that the bottle had to smash, I guess...

Re: How to make a Molotov cocktail. (Score:2)

by blue trane ( 110704 )

Is this a case of no cops, no problem?

If pregnancy were a book they would cut the last two chapters.
-- Nora Ephron, "Heartburn"