GPT-5.2 Arrives as OpenAI Scrambles To Respond To Gemini 3's Gains (openai.com)
- Reference: 0180363115
- News link: https://slashdot.org/story/25/12/11/1844246/gpt-52-arrives-as-openai-scrambles-to-respond-to-gemini-3s-gains
- Source link: https://openai.com/index/introducing-gpt-5-2/
OpenAI says the Thinking model hallucinated 38% less than GPT-5.1 on benchmarks measuring factual accuracy. Fidji Simo, OpenAI's CEO of applications, denied that the launch was moved up in response to the code red, saying the company has been working on GPT-5.2 for "many, many months." She described the internal directive as a way to "really signal to the company that we want to marshal resources in this one particular area."
The competitive pressure is real. Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users. In October, OpenAI's head of ChatGPT Nick Turley sent an internal memo declaring the company was facing "the greatest competitive pressure we've ever seen," setting a goal to increase daily active users by 5 percent before 2026. GPT-5.2 is rolling out to paid ChatGPT users starting Thursday, and GPT-5.1 will remain available under "legacy models" for three months before being sunset.
[1] https://openai.com/index/introducing-gpt-5-2/
[2] https://tech.slashdot.org/story/25/12/02/2221238/openai-declares-code-red-as-google-catches-up-in-ai-race
[3] https://tech.slashdot.org/story/25/11/18/1634253/google-launches-gemini-3-its-most-intelligent-ai-model-yet
"Now with 38% FEWER hallucinations!" (Score:3)
That's not actually the flex you think it is...
Re: (Score:2)
Like, would you consider your girlfriend having 38% fewer hallucinations to be a big win? (I once had a girlfriend call me up while she was experiencing delirium tremens and describe to me how demons were raping her mom. I told her she was hallucinating. She insisted it was real, she could see it!)
Re: (Score:2)
The ideal number of hallucinations is zero unless they are specifically requested for whatever reason. If someone told you they were going to kick you in the nuts 38% fewer times this week, you're still getting kicked in the nuts.
I'm not sure a person who's hallucinating could be convinced by another person that what they observe isn't really happening. I think a person has to come to that realization themselves in order to be able to not lose their shit.
Re: (Score:1, Troll)
How on Earth could it not be?
If you had 38% less tumor load on your cancer, would that be a good thing to you?
They're not trying to deny that the things hallucinate. You are trying to deny that an improvement is.... an improvement.
"Checkmate, LLM-tards!"
Re: (Score:2)
If you told me that the latest model of your parachute fails to open 38% less often, I still don't want to take it skydiving. The failure rate of LLMs is so high that any productivity gains you get generating some output are lost verifying that output.
Re: (Score:1, Troll)
I agree- you should probably not use your LLM as a parachute, or other things where any hallucination has an equivalent outcome to a parachute not opening.
That was not an intelligent argument. Try again.
Re: (Score:1)
Meh, people hallucinate in their responses just as much. And it's 38% less than 5.1; how about just telling us the total hallucination rate per model?
Re: (Score:2)
Ya, the percentage improvement is fucking annoying.
Using the dipshit's parachute model, every single parachute in existence has a probability of not opening.
38% reduction of that is universally good.
Whether that probability is 50% or 0.00005% is what matters.
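To put actual numbers on that, here is a minimal sketch with hypothetical baseline rates (made-up assumptions, since the announcement only states the 38% relative figure); the same relative improvement means very different things in absolute terms:

    # Apply a 38% relative reduction to two made-up baseline failure rates.
    # Neither baseline comes from OpenAI's post; they only illustrate why
    # the absolute rate matters more than the relative improvement.
    for baseline in (0.50, 0.0000005):  # 50% vs. 0.00005%
        improved = baseline * (1 - 0.38)
        print(f"baseline {baseline:.7%} -> after 38% cut {improved:.7%}")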
Re: "Now with 38% FEWER hallucinations!" (Score:2)
... yet you'd easily take a coworker that screwed up 38% less often.
I'm not sure what the point of comparing to a parachute is, when you don't have a reason to control a parachute with a computer in the first place.
Re: (Score:2)
> I'm not sure what the point of comparing to a parachute is, when you don't have a reason to control a parachute with a computer in the first place.
And if you do, because you're NASA or some other space agency, may I recommend not using an LLM for that purpose.
Of course since you don't want to either A) have your LLM function as a parachute (very bad!), or B) have your parachute controlled by an LLM, certainly that means they're useful for everything, right?
My wife once told me that she couldn't put the TV up on the wall without help. Divorced her on the spot.
Re: (Score:2)
I don't understand how AI hallucinates for most people. I haven't had that problem since I learned/figured out how to ask it stuff properly, maybe 2 years ago or something. You're asking it to do too much shit at once, not making it validate itself, and not being specific or algorithmic enough. Watch some YouTube videos on AI prompting or something.
Re: (Score:2)
If you think LLMs do not hallucinate, then you are very much filling your head full of LLM hallucinations. That's not great.
You aren't wrong that prompting can improve the situation. However, even in the most rigorously defined tasks, they hallucinate.
Hell, the fucking things even hallucinate tool calls sometimes (amusing the first time you run into that in your agentic code)
Re: (Score:1, Flamebait)
Nonsense. These things are useless. Only fools buy them. They produce nothing but hallucinations, and can't generate code past a negative 5 year old level.
Trust me, I read that on slashdot.
Re: (Score:2)
I think you should perhaps read better before commenting.
No argument existed in that post. What did exist were 4 unsubstantiated claims, with the final one being obvious satire (unless we accept the premise that humans can generate code 4 years before they're conceived).
Re: my 2c (Score:1)
Am I the only one thinking of Monty Python's Argument Sketch?
---
Man: (Knock)
Mr. Vibrating: Come in.
Man: Ah, Is this the right room for an argument?
Mr. Vibrating: I told you once.
Man: No you haven't.
Mr. Vibrating: Yes I have.
Man: When?
Mr. Vibrating: Just now.
Man: No you didn't.
Mr. Vibrating: Yes I did.
Man: You didn't
Mr. Vibrating: I did!
Man: You didn't!
Mr. Vibrating: I'm telling you I did!
Man: You did not!!
Mr. Vibrating: Oh, I'm sorry, just one moment. Is this a five minute argument or the full half hour?
Re: (Score:2)
I dunno. They make decent enough output for shitposting on social media. While there is a certain amount of delight to be had in coming up with a clever limerick about someone's mother, some people really aren't worth the effort. The AI can do it well enough in a few seconds though.
I'm not sure I'd use it for any productive work though. Of course, not everything has to be for work either.
Re: my 2c (Score:1)
Do you think the military will, and if it hallucinates an enemy combatant that's just being efficient?
Re: (Score:2)
I was being sarcastic.
I use it for productive work right now.
So do many people.
The industry for people using it for work is already measured in billions of dollars per year in revenue.
If you don't- that's just fine.
> Of course not everything has to be for work though either.
This is your key insight. I'd add DamnOregonian's corollary to it, though: Not all work requires a model capable of producing superintelligent output. Sometimes-bordering-on-kinda-dumb-but-also-freakishly-skilled-in-certain-ways is also perfectly sufficient at times.
Another release (Score:2)
So which one is closer to LCARS now? None of them is remotely "intelligent" at all, so we're just looking for the one with the best catalog library interface.
Not that any of the LLMs could match the artificial sentience of the Librarian...
Re: (Score:2)
I don't mind a better search engine.
Artificial Idiocy (Score:4, Funny)
> Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users.
Apparently not intelligent enough to convert numbers into the same unit before comparing them.
Re: (Score:2)
>> Google's Gemini app now has more than 650 million monthly active users, compared to OpenAI's 800 million weekly active users.
> Apparently not intelligent enough to convert numbers into the same unit before comparing them.
You can't convert the numbers. To illustrate - having 800 million weekly active users does not mean they have more than 3.2 billion monthly active users. Many of these users would be counted multiple times if "converting" that way.
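To illustrate the double-counting with a toy simulation (hypothetical activity patterns and user counts, not either company's actual data), here is a small Python sketch; summing four weekly-active figures counts a user once per week they show up in, while the monthly figure counts them only once:

    import random

    # Toy model: each user is active on a few random days of a 28-day month.
    random.seed(0)
    USERS, DAYS = 100_000, 28

    monthly_active = set()
    weekly_active = [set() for _ in range(4)]

    for user in range(USERS):
        for day in random.sample(range(DAYS), k=random.randint(1, 6)):
            monthly_active.add(user)
            weekly_active[day // 7].add(user)

    avg_wau = sum(len(w) for w in weekly_active) / 4
    print(f"MAU:         {len(monthly_active):,}")
    print(f"avg WAU:     {avg_wau:,.0f}")
    print(f"avg WAU x 4: {avg_wau * 4:,.0f}  (overcounts users active in multiple weeks)")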
Anyone Ask AI How to Destroy AI (Score:2)
Perhaps it could be useful in cleaning up the mess it makes.
Intriguing. (Score:2)
I still can't get ChatGPT, Gemini, or Claude to write a decent story or do an engineering design beyond basic complexity. They're all improving, but they're best thought of as brain-storming aids rather than actual development tools.
MORE Money burned (Score:2)
More and more money inflating the AI bubble.
And they are just burning through the money like there's no tomorrow.
AI is being tossed into consumer items that NO ONE ASKED FOR.
SO much AI slop is now in advertising. YouTube is full of "AI girlfriend" ads and other crap that I now just assume EVERY advert is an AI scam.
And then there is the time I waste turning all the AI junk OFF.
Too late (Score:2)
Too late. I switched to Grok and got far better results, faster. ChatGPT is too far behind to catch up at this point, I think.
Competition is good (Score:3)
Sometimes markets do operate well. Sometimes.
Re: Competition is good (Score:1)
Why shouldn't Google buy OpenAI?
Re: Competition is good (Score:2)
> Why shouldn't google buy openai?
You're just trying to be provocative. You have to spend money to make money, but OpenAI is already spending other people's money, so Google would be buying that debt. Then it still would not secure the market for LLMs, because anyone can do it, others are doing it well enough. If you consolidate to raise prices sooner, they'll look better and better. If your plan is to loose money but outlast everyone, why would you buy their debt piles, can't wait?
I love AI like I loved the Internet in the 90s, lots of pot