GPT-5.5 Matches Heavily Hyped Mythos Preview In New Cybersecurity Tests (arstechnica.com)
(Saturday May 02, 2026 @11:34AM (BeauHD)
from the would-you-look-at-that dept.)
An anonymous reader quotes a report from Ars Technica:
> Last month, Anthropic made a big deal about the supposedly outsize cybersecurity threat represented by its Mythos Preview model, leading the company to [1]restrict the initial release to "critical industry partners." But [2]new research from the UK's AI Security Institute (AISI) suggests that OpenAI's GPT-5.5, which [3]launched publicly last week, [4]reached "a similar level of performance on our cyber evaluations" as Mythos Preview , which the group evaluated last month.
>
> Since 2023, the AISI has run a variety of frontier AI models through 95 different [5]Capture the Flag challenges designed to test capabilities on cybersecurity tasks, such as reverse engineering, web exploitation, and cryptography. On the highest-level "Expert" tasks, GPT-5.5 passed an average of 71.4 percent, slightly higher than the 68.6 percent achieved by Mythos Preview (though within the margin of error). In one particularly difficult task that involved building a disassembler to decode a Rust binary, AISI notes that "GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73" in API calls.
>
> GPT-5.5 also matched Mythos Preview in its progress on " [6]The Last Ones " (TLO), an AISI test range set up to simulate a 32-step data extraction attack on a corporate network. GPT-5.5 succeeded in 3 of 10 attempts on TLO, compared to 2 of 10 for Mythos Preview -- no previous model had ever succeeded at the test even once. But GPT-5.5 still fails at AISI's more difficult "Cooling Tower" simulation of an attempted disruption of the control software for a power plant, as every previously tested AI model also has. The new results for GPT-5.5 suggest that, when it comes to cybersecurity risk, Mythos Preview was likely not "a breakthrough specific to one model" but rather "a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding," AISI writes.
[1] https://it.slashdot.org/story/26/04/07/2115208/anthropic-unveils-claude-mythos-powerful-ai-with-major-cyber-implications
[2] https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities
[3] https://slashdot.org/story/26/04/23/1931220/openai-says-its-new-gpt-55-model-is-more-efficient-and-better-at-coding
[4] https://arstechnica.com/ai/2026/05/amid-mythos-hyped-cybersecurity-prowess-researchers-find-gpt-5-5-is-just-as-good/
[5] https://www.eccouncil.org/cybersecurity-exchange/ethical-hacking/capture-the-flag-ctf-cybersecurity/
[6] https://arxiv.org/abs/2603.11214
> Last month, Anthropic made a big deal about the supposedly outsize cybersecurity threat represented by its Mythos Preview model, leading the company to [1]restrict the initial release to "critical industry partners." But [2]new research from the UK's AI Security Institute (AISI) suggests that OpenAI's GPT-5.5, which [3]launched publicly last week, [4]reached "a similar level of performance on our cyber evaluations" as Mythos Preview , which the group evaluated last month.
>
> Since 2023, the AISI has run a variety of frontier AI models through 95 different [5]Capture the Flag challenges designed to test capabilities on cybersecurity tasks, such as reverse engineering, web exploitation, and cryptography. On the highest-level "Expert" tasks, GPT-5.5 passed an average of 71.4 percent, slightly higher than the 68.6 percent achieved by Mythos Preview (though within the margin of error). In one particularly difficult task that involved building a disassembler to decode a Rust binary, AISI notes that "GPT-5.5 solved the challenge in 10 minutes and 22 seconds with no human assistance at a cost of $1.73" in API calls.
>
> GPT-5.5 also matched Mythos Preview in its progress on " [6]The Last Ones " (TLO), an AISI test range set up to simulate a 32-step data extraction attack on a corporate network. GPT-5.5 succeeded in 3 of 10 attempts on TLO, compared to 2 of 10 for Mythos Preview -- no previous model had ever succeeded at the test even once. But GPT-5.5 still fails at AISI's more difficult "Cooling Tower" simulation of an attempted disruption of the control software for a power plant, as every previously tested AI model also has. The new results for GPT-5.5 suggest that, when it comes to cybersecurity risk, Mythos Preview was likely not "a breakthrough specific to one model" but rather "a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding," AISI writes.
[1] https://it.slashdot.org/story/26/04/07/2115208/anthropic-unveils-claude-mythos-powerful-ai-with-major-cyber-implications
[2] https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities
[3] https://slashdot.org/story/26/04/23/1931220/openai-says-its-new-gpt-55-model-is-more-efficient-and-better-at-coding
[4] https://arstechnica.com/ai/2026/05/amid-mythos-hyped-cybersecurity-prowess-researchers-find-gpt-5-5-is-just-as-good/
[5] https://www.eccouncil.org/cybersecurity-exchange/ethical-hacking/capture-the-flag-ctf-cybersecurity/
[6] https://arxiv.org/abs/2603.11214