OpenAI's GPT-5 is here with up to 80% fewer hallucinations

(2025/08/08)


OpenAI unveiled its most capable model yet on Thursday with the launch of GPT-5.

AI hype man and OpenAI CEO Sam Altman described it as like talking to your own personal expert that can write applications on demand. "We think this idea of software on demand is going to be one of the defining characteristics of the GPT-5 era," he said, kicking off an over-75-minute presentation packed with code demos.

Compared to earlier models, OpenAI says GPT-5 delivers improvements in coding, writing, math, and visual perception, while also cutting down on hallucinations and deceptive behavior.

[2]YouTube Video

To be clear, GPT-5 isn't one model. It's actually a collection of models to which OpenAI will route prompts based on signals like the user's intent or the request's general complexity.

According to OpenAI, simple prompts might be routed to a small, efficient version of the model that can respond quickly without "thinking", while a larger, deeper reasoning model might be used to handle more complex or nuanced tasks. This capability is triggered automatically based on user prompts. Paid users will also have the option of toggling on reasoning functionality permanently if desired.
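As a rough illustration of the routing idea, the following Python sketch picks between a fast tier and a reasoning tier using a crude complexity heuristic. Everything in it is hypothetical: the model labels, the route_prompt function, the cue words, and the word-count threshold are invented for illustration. OpenAI has not published how its router actually works.

```python
from dataclasses import dataclass

# Hypothetical tier labels; these are NOT real OpenAI model identifiers.
FAST_MODEL = "fast-model"
REASONING_MODEL = "reasoning-model"

# Hypothetical cue words hinting that a prompt needs multi-step reasoning.
REASONING_CUES = ("prove", "debug", "step by step", "analyze", "derive")


@dataclass
class Route:
    model: str
    reasoning: bool


def route_prompt(prompt: str, force_reasoning: bool = False,
                 max_fast_words: int = 40) -> Route:
    """Pick a tier from simple signals in the prompt (illustrative only)."""
    text = prompt.lower()
    looks_complex = (
        len(text.split()) > max_fast_words
        or any(cue in text for cue in REASONING_CUES)
    )
    # force_reasoning stands in for the paid-tier toggle described above.
    if force_reasoning or looks_complex:
        return Route(REASONING_MODEL, True)
    return Route(FAST_MODEL, False)


if __name__ == "__main__":
    print(route_prompt("What is the capital of France?"))
    print(route_prompt("Debug this race condition step by step"))
```

A production router would presumably learn from far richer signals than keywords and length; the point here is only the shape of the decision.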


This routing model is apparently being continuously trained on new input signals to make it smarter about which model it routes the request to and when to trigger reasoning functionality. However, OpenAI says it eventually plans to integrate them all into a single model.

In addition to being faster, OpenAI says this architecture is more efficient than prior designs.

"GPT-5 gets more value out of less thinking time. In our evaluations, GPT-5 — with thinking — performs better than OpenAI o3 with 50-80 percent less output tokens across capabilities, including visual reasoning, agentic coding, and graduate-level scientific problem solving," the company wrote in a [6]blog post.


ChatGPT Free and Plus users will have access to GPT-5 and GPT-5 mini, while Pro and Enterprise users will have access to a Pro variant, which can reason for longer. Those accessing the models via API will also have access to a Nano version at a reduced cost, alongside the standard and mini models.

Revolutionary upgrade or overhyped iteration

While OpenAI's presentation was packed with hyperbolic claims and demos about GPT-5 being its smartest model ever, the company's benchmark results told a slightly different story, one of mostly iterative improvements.

[8]

Your eyes aren't deceiving you. GPT-5 shows only iterative improvements in math benchmarks like AIME 2025

In the AIME 2025 math bench, GPT-5 Pro eked out a 1.6 point lead over the company's previous flagship o3 model when using tools and a 7.8 point advantage without them. With that said, for free tier users, the new models are a pretty big upgrade over GPT-4o, with GPT-5 (non-Pro) managing a 57.5 point advantage. And it was a similar story with the FrontierMath and HMMT math benches.

[9]

GPT-5 showed similarly narrow gains over o3 in the GPQA Diamond bench as well

Similarly, iterative performance gains were observed in GPQA Diamond, a PhD-level science quiz, and Humanity's Last Exam. Across nearly every benchmark suite, GPT-5 managed single-digit leads over last gen's models.

[10]

Compared to o3, GPT-5 is way more adept at tool use and instruction following

One of the most obvious standouts was in Tau2-bench, a conversation agent benchmark where GPT-5's improvements in tool calling and instruction following were on full display.

"Benchmarks, they're exciting numbers, but we're starting to saturate them, like when you're moving between 98% and 99% in some benchmark it means you need something else to really capture how great the model is," OpenAI president Greg Brockman admitted.

This is no doubt why so much of the presentation was dedicated to demos and testimonials. Speaking of which, one capability Altman was particularly excited about was GPT-5's performance in health-related queries.

"One of the top use cases of ChatGPT is health. People use it a lot. You've all seen examples of people getting day-to-day care advice or sometimes even a life saving diagnosis," Altman said. "GPT-5 is the best model ever for health. It empowers you to be more in control of your healthcare journey."

Apparently, ChatGPT has usurped WebMD for self-diagnosis.

During one testimonial, the company appeared to be suggesting users struggling to make sense of health conditions just upload medical documents to ChatGPT for GPT-5 to figure out. What was it Altman was just saying about feeding ChatGPT sensitive information?

OpenAI tunes out the voices

While GPT-5's benchmark gains were marginal at best, the models should be less prone to hallucinating, a major problem in which models fabricate convincing but false information in order to satisfy a user's request. In our tests [11]just this week, OpenAI's (much smaller and less capable) open-weights models hallucinated a fictional presidential candidate whom Donald Trump beat in 2024.

"GPT-5's responses are around 45 percent less likely to contain a factual error than GPT-4o and when thinking GPT-5's responses are around 80 percent less likely to contain a factual error than OpenAI o3," the company said in a blog post.

Along with cutting down on hallucinations, OpenAI also implemented evaluations to test for deceitful behavior on the models' part.

"In order to achieve a high reward during training, reasoning models may learn to lie about successfully completing a task or be overly confident about an uncertain answer," the company explained. "GPT-5 more accurately recognizes when tasks can't be completed and communicates its limits clearly."

In testing on real-world chat data, OpenAI says it was able to reduce deception rates from 4.8 percent on o3 to 2.1 percent in reasoning responses.
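Put in relative terms, that works out to a drop of about 56 percent. The short Python sketch below shows the arithmetic; the relative_reduction helper is a name made up for this illustration, and the only inputs are the two rates quoted above.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Deception rates quoted by OpenAI: 4.8 percent on o3, 2.1 percent on GPT-5.
drop = relative_reduction(4.8, 2.1)

if __name__ == "__main__":
    print(f"Relative reduction in deception rate: about {drop:.0f} percent")
```

The same helper makes it easy to see why a few points of absolute change can still be a large relative improvement when the starting rate is already low.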

Meanwhile, on the topic of safety, OpenAI has implemented new measures to handle potentially dubious prompts on sensitive topics. Rather than relying on guardrails that can be bypassed with clever prompt engineering, the company says GPT-5 will now provide the most complete response possible while staying within an acceptable safety margin.

For example, instead of refusing to answer a question about how to ignite a potentially explosive compound, the model might instead direct the user to where they can find the information and issue warnings in response to the request.

[12]OpenAI's new model can't believe that Trump is back in office

[13]How to run OpenAI's new gpt-oss-20b LLM on your computer

[14]Google, OpenAI, Anthropic get blanket deal to saturate US government with their AI

[15]OpenAI makes good on its name, launches first open weights language models since GPT-2

ChatGPT gets a personality or four

Alongside the new models, OpenAI is also rolling out four new optional personalities for its chatbot so users can decide exactly how professional or edgy they want their AI assistant to be.

The four available at launch are cynic, robot, listener, and nerd. These personalities, the model builder notes, are opt-in and, for the moment, limited to text chat, with distinct voice capabilities coming later.

"This lets you interact with ChatGPT in a way that's consistent with your own communication style," Mark Chen, Chief Research Officer at OpenAI, said.

OpenAI was careful to emphasize that these personalities have been specifically tuned to avoid becoming too sycophantic in their praise of user questions and inputs.

Availability

OpenAI's GPT-5 family of models is available on ChatGPT to Free, Plus, and Pro users beginning today, and will roll out to Enterprise and educational users next week.

Pricing for ChatGPT remains unchanged at $20 a month for the Plus tier and $200 a month for the unlimited Pro tier.

Professionals also have the option of accessing the models via API. Full pricing, including cost per input, output, and cached tokens, can be found [16]here.

If the idea of paying for ChatGPT doesn't appeal to you, earlier this week, OpenAI [17]released its first open weights models since GPT-2.

Bootnote:

This week also saw the [18]release of Anthropic's Claude Opus 4.1, an updated version of the model which showed similarly iterative improvements in coding benchmarks. ®




[2] https://www.youtube.com/watch?v=0Uu_VJeVVfo

[6] https://openai.com/index/introducing-gpt-5/

[8] https://regmedia.co.uk/2025/08/07/gpt_5_performance.jpg

[9] https://regmedia.co.uk/2025/08/07/gpt_5_performance_gpqa_diamond.jpg

[10] https://regmedia.co.uk/2025/08/07/gpt_5_tau2-bench.jpg

[11] https://www.theregister.com/2025/08/06/openai_model_election_disinformation/

[12] https://www.theregister.com/2025/08/06/openai_model_election_disinformation/

[13] https://www.theregister.com/2025/08/07/run_openai_gpt_oss_locally/

[14] https://www.theregister.com/2025/08/06/google_openai_anthropic_us_gov_ai_deal/

[15] https://www.theregister.com/2025/08/05/openai_open_gpt/

[16] https://platform.openai.com/docs/pricing?latest-pricing=standard

[17] https://www.theregister.com/2025/08/05/openai_open_gpt/

[18] https://www.anthropic.com/news/claude-opus-4-1




beast666

Fractionally outperforms the NATO SecGen then.

GPT-5: the illusion of understanding

Taliesinawen

AIs such as GPT-5 excel at advanced pattern recognition, processing vast text datasets to predict and generate responses based on prompts. They mimic understanding by analyzing statistical relationships between words and phrases, replicating patterns without truly comprehending their meaning, context, or concepts.

Re: GPT-5: the illusion of understanding

HuBo

Yeah, and it should be great for "health", from "day-to-day care advice" to whole hog "life saving diagnosis", "the best model ever for health", to control "your healthcare journey", with self-diagnosis and self-prescription!

Heck, makes one wonder why they didn't give it a fifth "personality" of expert-MD, to interact with it "in a way that's consistent with your own communication style" ... like: "hey doc GPT-5, me takes too much chloride, what can me do?", to which expert-MD-personality ChatGPT-5 would reply: "just replace it [1]with bromide you fule".

That's Artificially Sentient Superintelligence right there! (ASS)

[1] https://arstechnica.com/health/2025/08/after-using-chatgpt-man-swaps-his-salt-for-sodium-bromide-and-suffers-psychosis/
