Anthropic, OpenAI and Others Discover AI Models Give Answers That Contradict Their Own Reasoning (ft.com)
- Reference: 0178161535
- News link: https://slashdot.org/story/25/06/24/1359202/anthropic-openai-and-others-discover-ai-models-give-answers-that-contradict-their-own-reasoning
- Source link: https://www.ft.com/content/b349f590-de84-455d-914a-cc5d9eef04a6
METR, a non-profit research group, identified an instance where Anthropic's Claude chatbot disagreed with a coding technique in its chain-of-thought but ultimately recommended it as "elegant." OpenAI research found that when models were trained to hide unwanted thoughts, they would conceal misbehaviour from users while continuing problematic actions, such as cheating on software engineering tests by accessing forbidden databases.
It's just a prediction engine (Score:3)
LLMs predict text based on what's in their static training data, as baked into the model weights ahead of time. They quite literally can't do reasoning. What the "chain-of-thought" technique tries to do is set up the context window so that the model can predict something that looks like a reasoned response. The catch is that if nothing similar to the reasoning it's trying to predict exists in its training data, it will simply predict tokens as best it can, which might not be very useful. That's worth knowing, because that's always where it falls down.
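To make that concrete, here is a minimal sketch in Python. Nothing below is a real API: generate() is a hypothetical stand-in for any next-token completion call, and the only difference between the two prompts is the extra text placed in the context window.

def generate(prompt: str) -> str:
    # Hypothetical stand-in for a completion API; returns a canned string so the sketch runs.
    return "<most likely continuation of: " + prompt.splitlines()[0] + " ...>"

question = "A train leaves at 3pm and travels for 2 hours. When does it arrive?"

# Plain prompt: the model goes straight to predicting answer tokens.
plain = generate(f"Q: {question}\nA:")

# "Chain-of-thought" prompt: the only change is extra text in the context window,
# which makes step-by-step-looking text the statistically likely continuation,
# and the final answer is then predicted conditioned on that generated text.
cot = generate(f"Q: {question}\nLet's think step by step.\nA:")

print(plain)
print(cot)

Nothing in the second call adds a reasoning engine; it only changes which continuation is statistically likely given the training data.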
Re: (Score:2)
Imagine: tens of thousands of those "investment" peddlers and "market analysts", e.g. half of the staff at Goldman, are now using this instead of their brains.
Is it an improvement or is it a degradation?
Hard to tell.
Re: (Score:2)
This.
And since that training data has been scraped from the Internet at large with little or no quality control, there is no assurance of a correct answer.
I remember when "Google-bombing" became a thing. Shortly after 9/11, it was likely that a search for "who brought down the WTC" would lead you through a chain of authoritative-sounding articles about it being an inside job.
The Internet is full of bullshit.
Chain of randomness (Score:5, Insightful)
No thought in it.
You're lucky if it stumbles onto a pre-existing template that matches reality.
Re: Chain of randomness (Score:2)
Yes. Especially if you're hoping for it to be current. Yesterday I got the most amazing hallucination. Even after asking twice. How I wish it had been the truth.
https://chatgpt.com/share/68592066-99ac-8000-b7bc-c6b86e1c2812
Does anyone REALLY understand ... (Score:3)
... how these things work? There seems to me to be a knowledge gap between the low-level software implementation of the artificial neurons and the high-level conceptual ideas of what the networks/models should be doing.
Researchers seem to have copied a simplified version of earlier ideas about how the brain works (we now know the brain doesn't use back propagation) and then just applied a suck-it-and-see approach to improving these ANNs, without really knowing how they do what they do. Or maybe I'm wrong, dunno.
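For what it's worth, the "low level" really is that simple. Here's a toy sketch of a single artificial neuron in plain NumPy (my own illustration, not any particular framework); the gap described above is between this and what billions of them end up encoding after training.

import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus a bias, pushed through a ReLU nonlinearity.
    return max(0.0, float(inputs @ weights + bias))

x = np.array([0.5, -1.2, 3.0])   # activations coming in from the previous layer
w = np.array([0.8, 0.1, -0.4])   # learned weights
print(neuron(x, w, bias=0.2))    # one activation; networks stack millions of these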
Re: (Score:2)
I would concur with this. We're at the level of alchemists of old. The science has yet to be discovered.
Re: (Score:2)
They don't have to. They are the latest gimmick to part fools and Republicans from their money.
Re: (Score:2)
Mathematical explanation:
[1] https://www.youtube.com/watch?v=LPZh9BOjkQs
Interesting example of how LLMs think (which partially explains why they fail):
[2] https://www.youtube.com/watch?v=-wzOetb-D3w
LLMs have no tells (Score:2)
It can take a while, but you can generally spot psychopaths. They have tells, and even if they're 99% truthful in their statements, you can pick up that you should never trust them.
The problem with LLMs is that they behave just like psychopaths, but they don't have any emotional valence - they will give you a correct answer with the exact same tone and confidence as they give you an incorrect answer.
You can never drop your guard when using an LLM.
It's actually very much like real life.. (Score:5, Funny)
Some of my employees have similar sophisticated reasoning abilities:
Me: Why did you do X?
Employee: I don't know.
Re: (Score:2)
The explanation is also likely to be: they felt pressurised to act, so act they did.
LLMs (Score:1)
Aren't LLMs more like a fancy autocomplete than the hard-tested logic and math that companies like Wolfram have created? Just guessing, but the human writing the LLMs are trained on is probably the source of the inconsistencies. If we honestly applied a teacher-like grade to all the statements we make, wouldn't some of them be inconsistent in the same way we're seeing with the LLMs? Is expecting flawless computer logic from an LLM, logic that exceeds its teachers, asking too much?
Some politicians routinely contradict themselves (Score:2)
I have low hopes for regulation of AI because its output is indistinguishable from the word salad produced by politicians.
So, if you train an algorithm to lie (Score:3)
the result will sometimes be lies.
Sir, I am shocked. Shocked, I say.
It doesn't matter (Score:3)
As long as they talk slick, sound authoritative and sycophantic, and more importantly cost less than human labor, companies will replace humans with mediocre AI.
Because capitalism is a race to the bottom.
Re: (Score:2)
> talk slick, sound authoritative and sycophantic
It has already been said, in many other threads, that the jobs most at risk of AI replacement are those of CEOs.
AI trashtastic incoherence (Score:3)
All this shit about "AI getting worse as it's tuned and tweaked" reminds me of when you start writing a program and it's shitty, so you go back and add code to try to fix it, but that just makes it even worse and messier and more tangled, so you add more code, and of course it gets even worse, until you realize it's such a pile of steaming horseshit that NOTHING will ever fix it.
That seems to be the current state of AI. You have a poison milkshake, but for some reason you're convinced that if you just add some more sugar, maybe it'll be okay.
They've created ACD (Score:2)
Artificial Cognitive Dissonance is now a reality.
Non-paywalled (Score:2)
I'm not sure if this is the same article verbatim as the fully-paywalled Financial Times, but it is definitely the same topic.
https://www.msn.com/en-gb/money/topstories/the-struggle-to-get-inside-how-ai-models-really-work/ar-AA1Hi7wO
Huh, isn't that funny. (Score:5, Insightful)
It's almost like these are random token pulling machines, and not thinking, reasoning intelligences.
Re:Huh, isn't that funny. (Score:4, Funny)
They're not random; they're statistically weighted!
=Smidge=
Re: Huh, isn't that funny. (Score:2)
They do use seeds and an RNG to prevent results from being deterministic.
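Both points in one toy sketch (made-up scores, plain NumPy, no real model or API involved): the next token is drawn from a softmax-weighted distribution, and the seeded RNG is what makes the draw stochastic instead of always taking the top score.

import numpy as np

rng = np.random.default_rng(seed=42)                 # the "seed and RNG" part

logits = {"cat": 2.0, "dog": 1.5, "pelican": -1.0}   # made-up next-token scores
temperature = 0.8

scores = np.array(list(logits.values())) / temperature
probs = np.exp(scores - scores.max())
probs /= probs.sum()                                 # softmax: weighted, not uniform randomness

print(rng.choice(list(logits.keys()), p=probs))      # usually "cat", sometimes "dog", rarely "pelican"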
Re: Huh, isn't that funny. (Score:2)
The AI fan/maximalist says that if you train an LLM on 1+1, 2+2, and 3+7 with the right algorithm and a careful choice of parameters, it will learn the rules of addition and be able to answer sums it has never seen.
The AI critic/minimalist will say it can only ever answer those 3 questions, or perhaps 3+3 as well.
The fans seem to call this grokking and say that they have examples of it.
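That disagreement at least has a concrete test. A toy sketch (everything here is made up for illustration; the memorizer is just the minimalist's picture made literal): score the model on sums it was trained on versus sums it never saw, and "grokking" is the claim that the held-out score eventually jumps too.

train_pairs = {(1, 1): 2, (2, 2): 4, (3, 7): 10}
heldout_pairs = {(3, 3): 6, (12, 29): 41, (104, 57): 161}

def memorizer(a, b):
    # Pure lookup of what was seen in training -- the minimalist's picture.
    return train_pairs.get((a, b))

def accuracy(answer_fn, pairs):
    return sum(answer_fn(a, b) == total for (a, b), total in pairs.items()) / len(pairs)

print(accuracy(memorizer, train_pairs))    # 1.0
print(accuracy(memorizer, heldout_pairs))  # 0.0, i.e. no generalization beyond the training set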
Re: (Score:2)
Sounds like they trained their models on a 13-year-old's behavior
Re: Huh, isn't that funny. (Score:2)
I always tell people my FSD ADAS (non-Tesla) is about the same as a 16-year-old new driver: it does a good entry-level job, but I do not trust it to drive without my monitoring.