GPT-5.5 Matches Heavily Hyped Mythos Preview In New Cybersecurity Tests (arstechnica.com)
- Reference: 0183094842
- News link: https://it.slashdot.org/story/26/05/01/1658212/gpt-55-matches-heavily-hyped-mythos-preview-in-new-cybersecurity-tests
- Source link: https://arstechnica.com/ai/2026/05/amid-mythos-hyped-cybersecurity-prowess-researchers-find-gpt-5-5-is-just-as-good/
> Last month, Anthropic made a big deal about the supposedly outsize cybersecurity threat represented by its Mythos Preview model, leading the company to [1]restrict the initial release to "critical industry partners." But [2]new research from the UK's AI Security Institute (AISI) suggests that OpenAI's GPT-5.5, which [3]launched publicly last week, [4]reached "a similar level of performance on our cyber evaluations" as Mythos Preview, which the group evaluated last month.
>
> Since 2023, the AISI has run a variety of frontier AI models through 95 different [5]Capture the Flag challenges designed to test capabilities on cybersecurity tasks, such as reverse engineering, web exploitation, and cryptography. On the highest-level "Expert" tasks, GPT-5.5 passed an average of 71.4 percent, slightly higher than the 68.6 percent achieved by Mythos Preview (though within the margin of error). In one particularly difficult task that involved building a disassembler to decode a Rust binary, AISI notes that "GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73" in API calls.
>
> GPT-5.5 also matched Mythos Preview in its progress on "[6]The Last Ones" (TLO), an AISI test range set up to simulate a 32-step data extraction attack on a corporate network. GPT-5.5 succeeded in 3 of 10 attempts on TLO, compared to 2 of 10 for Mythos Preview -- no previous model had ever succeeded at the test even once. But GPT-5.5 still fails at AISI's more difficult "Cooling Tower" simulation of an attempted disruption of the control software for a power plant, as every previously tested AI model also has. The new results for GPT-5.5 suggest that, when it comes to cybersecurity risk, Mythos Preview was likely not "a breakthrough specific to one model" but rather "a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding," AISI writes.
[1] https://it.slashdot.org/story/26/04/07/2115208/anthropic-unveils-claude-mythos-powerful-ai-with-major-cyber-implications
[2] https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities
[3] https://slashdot.org/story/26/04/23/1931220/openai-says-its-new-gpt-55-model-is-more-efficient-and-better-at-coding
[4] https://arstechnica.com/ai/2026/05/amid-mythos-hyped-cybersecurity-prowess-researchers-find-gpt-5-5-is-just-as-good/
[5] https://www.eccouncil.org/cybersecurity-exchange/ethical-hacking/capture-the-flag-ctf-cybersecurity/
[6] https://arxiv.org/abs/2603.11214
Giving a hand grenade to a toddler (Score:4, Interesting)
OH GOOD, that's what we needed Sam Altman's crazy ass to have access to. Not solely because he's a sociopath and I don't trust him, but also because they can actually monetize this thing by selling security analysis to giant software vendors. At least he'd resist giving it to the US government, in theory.
That's nothing... (Score:2)
That's nothing, I can do all of the above with just a teaspoon and a length of string.
Cooling Tower (Score:2)
In the "Cooling Tower" test, is it known that there is a solution?
Claude is bad compared to Codex - period (Score:1)
I am a very heavy user of all the LLMs. CODEX continues to come out on top. CLAUDE CODE is terrible. It really is quite bad. It eats tokens 4x faster than CODEX. It continuously argues with me and makes mistakes that CODEX does not. It loses context, and most annoyingly it will get stuck and run for 10-15 mins on simple problems. My aggravation level with CODEX is one or two moments of frustration per day. With CLAUDE it's every 5 mins. Beyond frustrating. Also, I don't know who continues to pump CLAUDE so hard in the
In a game where billions are at stake... (Score:2)
...Expect hype like this and more.
I remember when some claimed GPT-2 was too dangerous to release.
OpenAI plays the game by dropping hints, using codenames, and trying to build excitement and anticipation.
I don't understand why Anthropic does what they do. Some of their statements are pure doomer nonsense, yet their tech is genuinely useful.
Meanwhile, DeepMind quietly works in their lab.
I like DeepMind.
The Mythos Excuse was False (Score:2)
I think it's pretty clear that Anthropic just wasn't ready to release Mythos.
They'd signaled that they would, so they needed an excuse to get them out of the corner they'd backed into. "It's too dangerous to release" was their excuse, but it was just a smoke screen.
Mythos "Danger" was Hype (Score:1)
Anthropic wanted to give the impression Claude was "so good that it's going to break internet security." But now with GPT 5.5 you get a comparable full model released with zero problems.
Re: (Score:2)
First off, I fully agree that Anthropic tried to spin a negative (they weren't ready to release the new model they'd promised) into a positive ("it's just too damn good to release"). I said as much above.
However, I think "you get a comparable full model released with zero problems" ignores some major differences between the two companies. Despite their chicanery, I still trust Anthropic to behave responsibly FAR, FAR more than I trust anything OpenAI or its C-suite says. Just because OpenAI says "our
$1.73 - is that the price or the actual cost? (Score:4, Interesting)
The summary states that AISI notes "GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73" in API calls. However, there is good evidence that users only pay 5-10% of the actual cost; the rest is subsidized by VC dollars. What happens when those subsidies go away? [1]https://www.wheresyoured.at/th... [wheresyoured.at]
[1] https://www.wheresyoured.at/the-subprime-ai-crisis-is-here/
Re: (Score:3)
> What happens when those subsidies go away?
Who cares. This stuff is still in its infancy, and new algorithms and new hardware are going to collapse all of this to commodity-level value anyhow. A few years from now you'll buy a GPT-5.x/Mythos equivalent in a box for gaming-console money.
Re: (Score:2)
Nothing. Whether it's $103/hr or $10.30/hr, you're still paying for a superhuman employee.
If it's boosting employee efficiency by 50% as claimed in another Slashdot story above, then assuming your Sr. Engineer makes $250,000 a year / 48 weeks / 5 workdays ≈ $1,040/day, +50% for AI means you're getting an extra ~$520/day of work from the employee.
At $103/hr, they could run the AI for roughly five hours a day and still break even. But the average is currently way less than that. Claude claims the average developer consumes $13/day in tokens.
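The break-even arithmetic in this thread can be sanity-checked in a few lines. Every input here is an assumption taken from the comments (the $250k salary, 48 working weeks, the claimed 50% productivity boost, and the hypothetical $103/hr unsubsidized API price), not measured data; this is a back-of-envelope sketch, not a cost model.

```python
# Back-of-envelope check of the break-even claim above.
# All figures are the thread's assumptions, not measured data.

salary = 250_000          # assumed senior engineer salary, $/year
weeks = 48                # assumed working weeks per year
days_per_week = 5         # assumed workdays per week
boost = 0.50              # claimed AI productivity boost

daily_rate = salary / weeks / days_per_week       # engineer's daily cost
extra_value = daily_rate * boost                  # extra output attributed to AI

ai_cost_per_hour = 103.0  # hypothetical price if subsidies vanish (10x current)
breakeven_hours = extra_value / ai_cost_per_hour  # hours/day before AI costs exceed gains

print(f"daily rate: ${daily_rate:,.2f}")          # → daily rate: $1,041.67
print(f"extra value: ${extra_value:,.2f}/day")    # → extra value: $520.83/day
print(f"break-even: {breakeven_hours:.1f} hr/day")# → break-even: 5.1 hr/day
```

Even at a 10x price, the break-even point (~5 hours of continuous AI usage per day) sits well above the ~$13/day in tokens the thread says an average developer currently consumes.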