OpenAI's GPT-5 looks less like AI evolution and more like cost cutting
- News link: https://www.theregister.co.uk/2025/08/13/gpt_5_cost_cutting/
As the flag bearer that kicked off the generative AI era, OpenAI is under considerable pressure not only to demonstrate technological advances, but also to justify its massive, multi-billion-dollar funding rounds by showing its business is growing.
To do that, OpenAI can either increase its user base, raise prices, or cut costs. Much of the industry is already aligning around its $20 and $200 a month pricing tiers. So OpenAI would need to offer something others cannot to justify a premium, or risk losing customers to competitors such as Anthropic or Google.
With the academic year about to kick off, OpenAI is sure to pick up a fresh round of subscriptions as students file back into classrooms following the summer break. While more paying customers will mean more revenues, it also means higher compute costs.
Enter the cost-cutting era.
Perhaps the best evidence of cost-cutting is the fact that GPT-5 isn't actually [4]one model . It's a collection of at least two models: a lightweight LLM that can quickly respond to most requests and a heavier-duty one designed to tackle more complex topics. Which model a prompt lands in is determined by a router model, which acts a bit like an intelligent load balancer for the platform as a whole. Image prompts use a completely different model, Image Gen 4o.
This is a departure from how OpenAI has operated in the past. Previously, Plus and Pro users have been able to choose which model they'd like to use. If you wanted to ask o3 mundane questions that GPT-4o could have easily handled, you could.
In theory, OpenAI's router model should allow the bulk of GPT-5's traffic to be served by its smaller, less resource-intensive models.
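The idea can be sketched as a cheap classifier sitting in front of a pool of models, dispatching each prompt to the smallest model likely to handle it. Everything below is illustrative: OpenAI hasn't published GPT-5's routing logic, and the model names, scorer, and threshold here are hypothetical.

```python
# Hypothetical sketch of a prompt router: a cheap complexity scorer
# decides whether a prompt goes to the light model or the heavy
# reasoning model. Names and thresholds are invented for illustration.

def estimate_complexity(prompt: str) -> float:
    """Stand-in for a learned scorer: longer prompts and ones that
    smell like reasoning tasks get a higher score (0.0 to 1.0)."""
    score = min(len(prompt) / 2000, 0.5)
    if any(kw in prompt.lower() for kw in ("prove", "debug", "step by step")):
        score += 0.4
    return min(score, 1.0)

def route(prompt: str) -> str:
    """Send easy prompts down the cheap path; in theory most traffic
    should land on the smaller model."""
    return "gpt-5-thinking" if estimate_complexity(prompt) > 0.3 else "gpt-5-main"

print(route("What's the capital of France?"))                   # light model
print(route("Prove step by step that sqrt(2) is irrational."))  # heavy model
```

The economics only work if the scorer is far cheaper to run than the models it fronts, and if it rarely sends hard prompts to the small model, which is exactly the failure mode users complained about at launch.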
We can see more evidence of cost-cutting in OpenAI's decision to toggle reasoning on and off automatically by default, depending on the complexity of the prompt. Freeloaders... we mean, free-tier users don't have the ability to toggle this themselves. The less reasoning the models are doing, the fewer tokens they generate and the less expensive they are to operate.
But while this approach may be smarter for OpenAI's bottom line, it doesn't seem to have made the models themselves all that much smarter. As we addressed in our launch day [7]coverage , OpenAI's benchmarks show rather modest gains compared to prior models. The biggest improvements were in tool calling and curbing hallucinations.
[8]Your eyes aren't deceiving you: GPT-5 shows only iterative improvements in math benchmarks like AIME 2025 - Click to enlarge
The new system depends on the routing model to redirect prompts to the right language model, which, based on early feedback, hasn't been going all that well for OpenAI. According to Altman, on launch day, GPT-5's routing functionality was [9]broken , which made the model seem "way dumber" than it actually is.
Presumably this is why GPT-5 thought that "Blueberry" has just one B. Now it appears that OpenAI has fixed that rather embarrassing mistake.
But since GPT-5's router is a separate model, the company can, at least, improve it.
Deprecating models
The router model isn't OpenAI's only cost-cutting measure. During the AI behemoth's launch event last week, execs revealed that they were so confident in GPT-5 that they were deprecating all prior models.
That didn't go over great with users, and CEO Sam Altman later [10]admitted that OpenAI made a mistake when it elected to remove models like GPT-4o, which, despite its lack of reasoning capability and generally poorer performance in benchmarks, is apparently quite popular with end users and enterprises.
"If you have been following the GPT-5 rollout, one thing you might be noticing is how much of an attachment some people have to specific AI models. It feels different and stronger than the kinds of attachment people have had to previous kinds of technology (and so suddenly deprecating old models that users depended on in their workflows was a mistake)," he wrote.
Nonetheless, fewer models to wrangle means more resources to go around.
OpenAI doesn't disclose much technical detail about its internal (non open-source) models, but if GPT-5 is anything like the dev's open-weights models, gpt-oss-20b and gpt-oss-120b, and was quantized to MXFP4, OpenAI has good reason for wanting all those legacy GPTs gone.
As we recently [11]explored , the data type can reduce the memory, bandwidth, and compute required by LLMs by up to 75 percent compared to using BF16.
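The back-of-the-envelope arithmetic behind that figure: BF16 stores each weight in 16 bits, while MXFP4 uses 4-bit values with one shared 8-bit scale per block of 32 weights. The 120B parameter count below is taken from gpt-oss-120b; the rest is just arithmetic.

```python
# Rough weight-memory comparison: BF16 (16 bits per weight) vs MXFP4
# (4-bit values plus one shared 8-bit scale amortized over each
# 32-weight block). Model size is illustrative (gpt-oss-120b).

params = 120e9                            # 120B parameters

bf16_bytes  = params * 2                  # 16 bits = 2 bytes per weight
mxfp4_bytes = params * (4 + 8 / 32) / 8   # ~4.25 bits per weight

print(f"BF16:   {bf16_bytes / 1e9:.0f} GB")    # 240 GB
print(f"MXFP4:  {mxfp4_bytes / 1e9:.1f} GB")   # 63.8 GB
print(f"Saving: {1 - mxfp4_bytes / bf16_bytes:.0%}")
```

That works out to roughly a 73 percent reduction in weight memory alone, before counting the matching drops in memory bandwidth and, on hardware with native FP4 support, compute.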
For now, OpenAI has restored GPT-4o for paying users, but we have no doubt that, once OpenAI figures out what makes the model so endearing, it will fold those qualities into GPT-5 and retire GPT-4o for good.
Lack of context
In addition to architectural changes, OpenAI opted not to increase GPT-5's context window, which you can think of as its working memory. Free users are still limited to an 8,000-token context while Plus and Pro users cap out at 128,000 tokens.
Compare that to Claude's Pro plan, which Anthropic prices similarly to OpenAI's Plus subscription, and which offers a 200,000 token context window. Google's Gemini supports contexts up to 1 million tokens.
Larger contexts are great for searching or summarizing large volumes of text, but they also require vast amounts of memory. By sticking with smaller contexts, OpenAI can get by running its models on fewer GPUs.
If OpenAI's claims about GPT-5 hallucinating up to 80 percent less than prior models are true, then we expect users to want larger context windows for document search.
With that said, if long contexts are important to you, the version of GPT-5 available via OpenAI's API supports context windows up to 400,000 tokens, but you'll be paying a pretty penny if you actually want to take advantage of it.
Filling the context just once on GPT-5 will set you back about 50 cents USD, which can add up quickly if you plan to throw large documents at the model consistently.
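The arithmetic behind that 50-cent figure is straightforward if we take GPT-5's launch input price of $1.25 per million tokens (check OpenAI's pricing page for current rates):

```python
# Cost of filling GPT-5's full API context once, assuming the launch
# input price of $1.25 per million tokens.

price_per_mtok = 1.25        # USD per million input tokens (assumed)
context_tokens = 400_000     # GPT-5's maximum API context

cost = context_tokens / 1e6 * price_per_mtok
print(f"${cost:.2f} per full-context request")   # $0.50
```

Do that once per document across a large corpus and the bill climbs fast, which is presumably the point.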
Altman waves his hands
Altman has been doing a fair bit of damage control in the days since GPT-5's debut.
In addition to bringing GPT-4o back, Altman has given paid users the ability to [12]select and adjust GPT-5's response speed among Auto, Fast, and Thinking. He's also boosted rate limits to 3,000 messages per week.
On Monday, Altman [13]laid out OpenAI's strategy for allocating compute over the next few months, which will unsurprisingly prioritize paying customers.
Once ChatGPT's customers get their resources, Altman says, API use will take precedence at least up to the current allotted capacity. "For a rough sense, we can support about an additional ~30% new API growth from where we are today with this capacity," he wrote in an X post.
Only then will OpenAI look at improving the quality of ChatGPT's free tier or expanding API capacity. But worry not, if Altman is to be believed, OpenAI will have twice the compute to play with by the end of the year.
"We are doubling our compute fleet over the next 5 months (!) so this situation should get better," he wrote. ®
[4] https://www.theregister.com/2025/08/07/openai_gpt_5/
[7] https://www.theregister.com/2025/08/07/openai_gpt_5/
[8] https://regmedia.co.uk/2025/08/07/gpt_5_performance.jpg
[9] https://x.com/sama/status/1953893841381273969
[10] https://www.theregister.com/2025/08/11/openai_tweaks_gpt5_after_user/
[11] https://www.theregister.com/2025/08/10/openai_mxfp4/
[12] https://www.theregister.com/2025/08/13/gpt5_updated_again/
[13] https://x.com/sama/status/1955077002945585333
Re: Fasscinnating Bluebberries
You mean two?
Re: Fasscinnating Bluebberries
There are 5 lights!
Re: Fasscinnating Bluebberries
Oooh !!! ... nothing like a Star Trek meme to make it all clear !!! :)
P.S.
I got it ... very good !!!
You win 1000 strips of gold-pressed latinum ... but you must collect in person from Ferenginar within the next 48 hours.
Good luck !!!
:)
So Sam...
You lying sack of shit....how is this "super intelligence" going to happen if your latest multi-billion dollar model is only fractionally better than the last?
Oh wait, it can't, because you're an insipid snake oil salesman.
Re: So Sam...
I think he knows it too. Every once in a while he comes out and downplays the capabilities of ChatGPT. When ChatGPT 3 came out people accused them of downgrading it. He said it wasn't a downgrade, people had just started using it for uses other than gimmicks and were realising how limited it actually was.
Edit: Here's another one, this time about ChatGPT 4. In this case it's more about shilling ChatGPT 5 but he keeps hyping up and tamping down, hyping up and tamping down.
https://gizmodo.com/sam-altman-also-thinks-chatgpt-kinda-sucks-openai-1851345956
For a moment there the environment was relieved at a few less trees being incinerated to generate TikTok video effects.
Consistency
"For now, OpenAI restored GPT-4o for paying users, but we have no doubt that, once OpenAI figures out what makes the model so endearing and how they can apply it to GPT-5, they'll do just that."
I would imagine it's simply that if people are now using these models to actually do real work (as they'd like people to do), they don't want the models' behaviour to change every 5 fucking minutes.
The biggest improvements were in tool calling and curbing hallucinations.
I'm looking to create custom signing keys for Secure Boot with Arch, so I've been reading through the Arch wiki. I wanted to see how ChatGPT would condense the process, rather pointedly hinting at the Arch wiki. Not only did it not pull any info from the Arch wiki, it created a long series of instructions to use long-deprecated tools that it thinks I should install with apt.
Re: The biggest improvements were in tool calling and curbing hallucinations.
My experience has been similar. When it involves anything complex with software, the AI solution tends, more often than not, to be either wrong or else a very poor and complicated way of doing things.
I've also seen it recommend bypassing apt and using dpkg directly, which is not something that you should do unless you really know what you are doing. People blindly following advice from AI are going to get themselves into a lot of trouble.
I have however found it to be useful in terms of providing examples of how to use various bash commands. The results are not always right, but they tend to be things where either it works or it doesn't and I can read the man pages to see if the AI's suggestion looks reasonable (or if the options it suggests even exist).
I'm pretty sure that I wouldn't pay any money for that however. And that's the problem that LLM AI companies are going to have, which is how to get people to fork over the many billions to pay for the data centres and entire new electric generating plants to power them.
Even if the advance of technology and human experience makes LLM AIs useful someday, investors are likely to lose patience before that day comes and pull the plug on the current companies.
This is just the calm before the AGI storm, shirley!
I mean, all those highly paid analysts and investors can't be wrong, can they?
Let me ask ChatGPT about that...
Fasscinnating Bluebberries
Wasn’t it three Bs that it was adamant there were in the word blueberry, not one?