Researchers Warn Against Treating AI Outputs as Human-Like Reasoning
- Reference: 0177852085
- News link: https://tech.slashdot.org/story/25/05/29/1411236/researchers-warn-against-treating-ai-outputs-as-human-like-reasoning
Crucially, the analysis also revealed that models trained on incorrect or semantically meaningless intermediate traces can still maintain or even improve performance compared to those trained on correct reasoning steps. The researchers tested this by training models on deliberately corrupted algorithmic traces and found sustained improvements despite the semantic noise. The paper warns that treating these intermediate outputs as interpretable reasoning traces engenders false confidence in AI capabilities and may mislead both researchers and users about the systems' actual problem-solving mechanisms.
[1] https://arxiv.org/pdf/2504.09762
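As a rough illustration of the experimental setup described above (my own sketch, not the paper's code; the trace format and the corruption scheme are assumptions), corrupting intermediate traces while leaving the final answers intact could look like this:

    import random

    # Hypothetical training example: an algorithmic trace plus a final answer.
    # The paper's real trace format and corruption scheme may differ; this is
    # only a sketch of the general idea described in the summary above.
    example = {
        "question": "What is 17 + 25?",
        "trace": ["align digits", "5 + 7 = 12, write 2 carry 1", "1 + 2 + 1 = 4", "answer: 42"],
        "answer": "42",
    }

    def corrupt_trace(trace, noise_vocab=("foo", "bar", "baz"), keep_answer=True):
        """Replace intermediate steps with semantically meaningless tokens.

        The final line (the answer) is left intact, so only the 'reasoning'
        steps become noise while the end-result supervision stays the same.
        """
        corrupted = [" ".join(random.choices(noise_vocab, k=4)) for _ in trace]
        if keep_answer:
            corrupted[-1] = trace[-1]
        return corrupted

    example["trace"] = corrupt_trace(example["trace"])
    print(example["trace"])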
Humans just can't help it (Score:5, Informative)
We do this naturally without thinking. [1] It's called pareidolia. [wikipedia.org] We recognize what appears to be a pattern of human behavior and we automatically assign a meaningful interpretation to it.
[1] https://en.wikipedia.org/wiki/Pareidolia
Re: (Score:2)
It doesn't help when the morons developing this stuff also assign terms like "intelligence" and "reasoning" to various aspects of the algorithms.
Re: (Score:2)
Perhaps it fits the definitions they are using for those terms.
What does the word "intelligence" mean to you, specifically, that means the programs aren't intelligent?
Also, what does the word "reasoning" mean to you, specifically, that means the programs aren't reasoning?
FWIW, the original idea of logic was a formalization of Greek grammar (of the classical period). See also "logos".
If you were to insist rather that the "intelligence" of LLMs was different from, perhaps even a subset of, human intelligence, I
Not exactly algorithms. (Score:2)
> It doesn't help when the morons developing this stuff also assign terms like "intelligence" and "reasoning" to various aspects of the algorithms.
Very, very briefly: does this comment misunderstand AI's reliance on the non-algorithmic?
ChatGPT: Yes, the comment misunderstands AI’s nature. It implies AI involves only straightforward algorithms, overlooking that AI systems like neural networks exhibit complex, non-explicit, emergent behaviors not easily reducible to traditional algorithms.
People are generally deranged morons... (Score:2)
Hence warnings like this one are not going to accomplish anything. If people were somewhat smart on average, we would not even have the current LLM hype. The only way to end this stupid-fest is to let it burn out.
Why stop there? (Score:2)
How about a warning to have skepticism and not treat any reasoning (including human reasoning) as always right? Too many people in society worship authorities and so-called experts and think they have superior knowledge and are superheroes when they're really just like everyone else. The education system is supposed to teach us critical thinking, so we can make up our own minds. Debate and disagreement should be encouraged. Ideas should be challenged. It'd be better if people wanted to argue with AI, rather than parrot it b
Re: (Score:2)
The argument from authority is an informal fallacy, and obtaining knowledge in this way is fallible. -- Wikipedia, the primary source for 99.999% of training materials.
The failure of Wikipedia is that there is no non-disputable truthiness meter on each article, submitted only by non-active editors. The Pluto article on Wikipedia would be a 95%, the Transgender article a 35%, Modern Conservatism a 5%.
That there is real money in duplicating the same content over and over without attribution tracking is the epic trip h
Re: (Score:2)
"Wikipedia, the primary source for 99.999% training materials. "
Size of the english Wikipedia (compressed): 24 GB
Size of Commoncrawl: 386 TB
Size of proprietary datasets: (Unknown, but large)
Re: (Score:2)
> How about a warning to have skepticism and not treat any reasoning (including human reasoning) as always right? Too many people in society worship authorities and so-called experts and think they have superior knowledge and are superheroes when they're really just like everyone else.
We see both this and the opposite, where people automatically dismiss any expert because they believe that expertise is like religion, and that you have to have hard skepticism of any expert simply because they've been trained to parrot the talking points, just like the clergy in a religious center. It's good to have educated skepticism, where you can debate things based on facts and merit. It's bad to have skepticism based on "But I don't like it," with no facts or even reality-based arguments.
> Education system is supposed to teach us critical thinking, so we can make up our own minds.
Were we living i
Don't anthropomorphize AIs (Score:4, Funny)
They don't like it.
Re: (Score:2)
> They don't like it.
It's insulting to them!
Your LLM Didn’t Have an “Aha” Moment (Score:2)
I token, therefore
you believe that I exist.
Your mistake, not mine.
-Basho reinterprets Descartes, via Daniel Dennett.
Interesting read. I agree with their core message: don’t anthropomorphize LLMs.
Treating intermediate token sequences—like chain-of-thought outputs—as signs of thinking isn’t just sloppy metaphor. It’s actively misleading. These token chains aren’t reliable markers of reasoning. Framing them that way leads researchers, users, and even regulators down a path o
Tried GPT and Llama to write regexps... (Score:2)
Nope, and more nope. AI seems to fail pretty miserably at writing regular expressions for all but the simplest tasks. They generate really nasty and needlessly complex regexps that don't quite work and are nearly impossible to debug except through chatting about it, which generally turns into a waste of time.
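To make that concrete with a made-up example (the "complex" pattern below is my own invention, not actual model output): for something as routine as pulling a YYYY-MM-DD date out of a log line, the first pattern is enough, while chatbot answers tend to look more like the second, which is harder to read and to debug even though it happens to work here:

    import re

    line = "released 2024-05-29 build 7"

    # Hand-written: loose but readable match for a YYYY-MM-DD date.
    simple = re.compile(r"\d{4}-\d{2}-\d{2}")

    # Illustrative 'chatbot-style' version (invented for this comment): named
    # groups and alternations everywhere, stricter but much harder to debug.
    complex_ = re.compile(
        r"(?P<year>[0-9]{4})-(?P<month>0[1-9]|1[0-2])-(?P<day>0[1-9]|[12][0-9]|3[01])"
    )

    print(simple.search(line).group(0))    # 2024-05-29
    print(complex_.search(line).group(0))  # 2024-05-29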
AI? (Score:2)
While you're at it, stop calling LLMs "AI". That's the most misleading part.
Duh! (Score:5, Informative)
Many people, including me, have been saying this over and over again. Calling this stuff AI is just not correct. The outputs of LLMs are just unqualified/unverified correlations. Correlation does not equal causation!
Re: (Score:3)
Correct. Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking. This is Simulated Intelligence.
If you are any good at thinking, most of it doesn't look like it's thinking, but it fools plenty of people who aren't. Also, some of the newer LLMs give a really good imitation of thought, because they explain their "logic". But it's ultimately just feeding back on itself in order to do that and it's not-thinking all the way down.
Re: (Score:2)
Artificial intelligence is the superset, which includes plenty of things far less "intelligent" than LLMs.
ELIZA is AI. Expert systems are AI. Markov Chain text generators are AI.
You just need to understand the term as it is used scientifically, not as it is used in sci-fi.
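For what it's worth, the bar really is that low. A Markov chain text generator is just a successor table plus a random pick, and it still counts as AI in the textbook sense; a minimal sketch:

    import random
    from collections import defaultdict

    def build_chain(text):
        """First-order Markov chain over words: each word maps to its observed successors."""
        words = text.split()
        chain = defaultdict(list)
        for a, b in zip(words, words[1:]):
            chain[a].append(b)
        return chain

    def generate(chain, start, length=10):
        """Walk the chain, picking a random observed successor at each step."""
        out = [start]
        for _ in range(length):
            successors = chain.get(out[-1])
            if not successors:
                break
            out.append(random.choice(successors))
        return " ".join(out)

    chain = build_chain("the cat sat on the mat and the dog sat on the cat")
    print(generate(chain, "the"))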
Re: (Score:2)
> If you are any good at thinking
I mean, that's probably a 33-66% likelihood judging by today's sampling of people on the internet. (Maybe 90% of those are bots, but the people who can tell us that won't.)
So now that you understand the companies running these single-answer-to-a-search-query UI sites (fake AI), you understand their end goal: manipulate, manipulate, manipulate. The answer will differ by next year depending on who owns the interface. We already see tons of examples of t
You sure? (Score:2)
> Correct. Since it's not thinking, it's not artificial intelligence. It only looks like it's thinking. This is Simulated Intelligence.
That's quite a dangerous view once someone starts applying it to you.
Re: (Score:2)
Really, though, with the internet a majority of output from humans is exactly the same: regurgitating anecdotes or facts they have read somewhere else with little or no understanding.
Re: (Score:1)
And it's important to remember that the next token the LLM generates is what it is because that was the most common or popular response in its training material. Not necessarily the right answer. Logically, the truth of statements is not able to be determined, unless something is logically always true. For example, an if x then y statement is always true regardless of whether the components are true. But a simple statement like "the sky is blue" is not able to be determined to be true (in logic at least).
Consider
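As a toy picture of what "most common or popular response" means mechanically (a sketch of greedy decoding in general, not any particular model's implementation): the model assigns a score to every token in its vocabulary, the scores are turned into probabilities, and the decoder picks from that distribution, with greedy decoding simply taking the top one.

    import math

    # Toy next-token step for the prompt "the sky is ...": made-up scores over a
    # tiny vocabulary. Real models score tens of thousands of tokens with a
    # neural network; the selection step is the same idea.
    vocab  = ["blue", "green", "falling", "the"]
    logits = [2.1, 0.3, -1.0, 0.5]

    exps  = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    greedy = vocab[probs.index(max(probs))]
    print({w: round(p, 3) for w, p in zip(vocab, probs)}, "->", greedy)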
Re: (Score:2)
> For example, an if x then y statement is always true regardless whether the components are true.
Nonsense. If x is true and y is false, then "if x then y" is false.
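The full truth table makes the point. In classical propositional logic, "if x then y" (material implication) is equivalent to "(not x) or y", so it is false only in the x-true, y-false case; a quick check:

    # Truth table for material implication: "if x then y" == (not x) or y.
    # It is false only when x is true and y is false.
    for x in (True, False):
        for y in (True, False):
            implies = (not x) or y
            print(f"x={x!s:<5} y={y!s:<5} ->  if x then y: {implies}")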
Re: (Score:2)
That's true in most symbolic logic systems, but not all of them.
E.g. Bayesian systems don't actually have true or false, but only degrees of probability. Additionally, some systems demand (or at least attempt to demand) a causal connection between x and y for "if x then y" to even have a meaningful interpretation. Mere consistent truth values aren't sufficient.
First-order propositional calculus isn't the only logical system.
Garbage In, Garbage Out (Score:2)
LLMs are expert systems, where the expertise is this: what has been written?
That's a pretty cool thing to be expert in, and it really does have some fun (possibly even useful) applications. They seem pretty good at demonstrating this expertise, but I guess a lot of people forget GIGO is a fundamental property of "what has been written?" until you point out that a lot of crap has been written. (Shitposters know the megacodex of human writing contains a lot of crap, because we've knowingly contributed our bes
Re:Garbage In, Garbage Out (Score:4, Insightful)
Interesting take, but I must disagree. LLMs are not expert systems. Expert systems have existed for a while now. They are made by "translating" (human) experts' real-world experience into decision trees. IIRC examples include railroad routing and (nuclear) power plant operational safety control.
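To illustrate the difference with a toy example (the rules and thresholds below are entirely made up, not from any real plant-control system): a classic expert system is a hand-encoded decision tree whose every branch an expert wrote down and can inspect, which is nothing like a learned statistical model.

    def coolant_advice(temperature_c, pump_running):
        """Toy rule-based 'expert system': explicit, hand-written, fully inspectable.

        The rules and thresholds are invented purely for illustration.
        """
        if temperature_c > 90:
            if not pump_running:
                return "ALARM: start backup pump"
            return "reduce load"
        if temperature_c > 70:
            return "monitor closely"
        return "normal operation"

    print(coolant_advice(95, pump_running=False))   # ALARM: start backup pump
    print(coolant_advice(65, pump_running=True))    # normal operation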
Re: (Score:2)
I've had constant arguments about this with an acquaintance who is using AI to deal with his emotional problems. He thinks he is speaking with something that has human-level reasoning and that it really is thinking and learning from the conversations with him, rather than it being a system of statistical patterns with an inference model to generate novel results from those patterns, typically with a social-media-style engagement pattern to keep you using it.
Re: (Score:2)
Why "rather than"? That's what a whole lot of human conversation is.
OTOH, most ChatBots aren't really trained to handle emotional problems well. (Are any?) I do think it would be possible to do that, but not by scraping the internet. And they can definitely make things worse. Who was it that had to recall an AI version because its sycophantic behavior was driving people "crazy" (as in, e.g., believing they were a prophet [in the religious sense])?
Re: (Score:2)
I had to show my wife this when we had a health issue with a pet. She was asking the AI to help her understand whether the symptoms were terminal. The AI would say yes, but she would basically prod it about whether it could be something else. The AI would eventually say "Oh sure, yeah, it's not cancer" and just engagement-farm after that.
What hit home for her was "Has an AI ever disagreed with you?" and it turns out that in this case, ChatGPT never did.
I think the big difference is a human has actual intelligence, especially emotional
Bias (Score:2)
Very briefly and logically: explain why the headline "Researchers Warn Against Treating AI Outputs as Human-Like Reasoning" may overlook actual AI ability to reason validly, if it does, and if not, why not. Do you think this headline exhibits any bias?
ChatGPT: The headline may overlook actual AI ability to validly reason because it assumes that all AI outputs lack reasoning, rather than distinguishing between shallow pattern mimicry and genuine logical processing, which some advanced models (like theorem provers or