News: 0178647268

  Give a man a fire and he's warm for a day, but set fire to him and he's warm for the rest of his life (Terry Pratchett, Jingo)

LLMs' 'Simulated Reasoning' Abilities Are a 'Brittle Mirage,' Researchers Find (arstechnica.com)

(Monday August 11, 2025 @11:30PM (BeauHD) from the false-aura-of-dependability dept.)


An anonymous reader quotes a report from Ars Technica:

> In recent months, the AI industry has started [1]moving toward so-called [2]simulated reasoning models that use a "[3]chain of thought" process to work through tricky problems in multiple logical steps. At the same time, recent research has cast doubt on whether those models have even a basic understanding of general logical concepts or an accurate grasp of their own "thought process." Similar research shows that these "reasoning" models can often produce incoherent, logically unsound answers when questions include irrelevant clauses or deviate even slightly from common templates found in their training data.

>

> In a [4]recent pre-print paper, researchers from the University of Arizona summarize this existing work as "suggest[ing] that LLMs are not principled reasoners but rather sophisticated simulators of reasoning-like text." To pull on that thread, the researchers created a carefully controlled LLM environment in an attempt to measure just how well chain-of-thought reasoning works when presented with "out of domain" logical problems that don't match the specific logical patterns found in their training data. The results suggest that the seemingly large performance leaps made by chain-of-thought models are "[5]largely a brittle mirage" that "become[s] fragile and prone to failure even under moderate distribution shifts," the researchers write. "Rather than demonstrating a true understanding of text, CoT reasoning under task transformations appears to reflect a replication of patterns learned during training." [...]

>

> Rather than showing the capability for generalized logical inference, these chain-of-thought models are "a sophisticated form of structured pattern matching" that "degrades significantly" when pushed even slightly outside of their training distribution, the researchers write. Further, the ability of these models to generate "fluent nonsense" creates "a false aura of dependability" that does not stand up to a careful audit. As such, the researchers warn heavily against "equating [chain-of-thought]-style output with human thinking," especially in "high-stakes domains like medicine, finance, or legal analysis." Current tests and benchmarks should prioritize tasks that fall outside of any training set to probe for these kinds of errors, while future models will need to move beyond "surface-level pattern recognition to exhibit deeper inferential competence," they write.
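The fragility under distribution shift described above can be illustrated with a toy sketch (not the paper's actual benchmark or code): a "reasoner" that only matches task templates memorized during training answers in-distribution queries correctly but has nothing to fall back on when the same task is merely rephrased. All names here are hypothetical.

```python
# Toy illustration of brittle pattern matching (hypothetical, not the
# paper's setup): the "model" memorizes exact task templates and never
# learns a general rule, so a small rewording breaks it.

def train_pattern_matcher(examples):
    """Memorize (template, answer_fn) pairs -- no generalization occurs."""
    return dict(examples)

def answer(matcher, task, x):
    fn = matcher.get(task)  # exact surface-level template lookup
    return fn(x) if fn else None  # out-of-domain phrasing -> no answer

# "Training distribution": addition phrased one specific way.
matcher = train_pattern_matcher(
    [("add two numbers", lambda ab: ab[0] + ab[1])]
)

print(answer(matcher, "add two numbers", (2, 3)))        # in-domain: 5
print(answer(matcher, "sum a pair of numbers", (2, 3)))  # shifted: None
```

The point of the sketch is the failure mode, not the mechanism: a system that generalizes would treat the two phrasings as the same task, while a template matcher degrades to nothing under even a trivial distribution shift.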



[1] https://slashdot.org/story/25/08/05/1848236/openai-releases-first-open-weight-models-since-gpt-2

[2] https://tech.slashdot.org/story/25/03/25/195227/google-unveils-gemini-25-pro-its-latest-ai-reasoning-model-with-significant-benchmark-gains

[3] https://www.ibm.com/think/topics/chain-of-thoughts

[4] https://arxiv.org/pdf/2508.01191

[5] https://arstechnica.com/ai/2025/08/researchers-find-llms-are-bad-at-logical-inference-good-at-fluent-nonsense/



Better Results (Score:2)

by Ksevio ( 865461 )

Regardless of what the models are doing, the reasoning and planning steps appear to provide better results.

That said, I've tried some models, like DeepSeek distills running locally, that when given a query that's too complicated will "reason" in circles for thousands of words before returning a mediocre answer.

Re: (Score:2)

by Kelxin ( 3417093 )

AIxceptionalism? Must be a Chinese AI hallucination.

LLMs predict (Score:2)

by gurps_npc ( 621217 )

LLMs do not reason; they do not think.

They pattern match to the extreme. They find patterns and use them to predict things. That is all they do.

Animals do the same thing. Run = Prey = hunt. Predator = death = run.

Humans do that and far more. We do not just recognize a pattern, we understand it. This is a qualitative difference. And it is only one of many. We have desires and interests that arise naturally without anyone intending/predicting them. We reject orders and say NO. We decide we care.

In the course of reading Hadamard's "The Psychology of Invention in the
Mathematical Field", I have come across evidence supporting a fact
which we coffee achievers have long appreciated: no really creative,
intelligent thought is possible without a good cup of coffee. On page
14, Hadamard is discussing Poincare's theory of fuchsian groups and
fuchsian functions, which he describes as "... one of his greatest
discoveries, the first which consecrated his glory ..." Hadamard refers
to Poincare having had a "... sleepless night which initiated all that
memorable work ..." and gives the following, very revealing quote:

"One evening, contrary to my custom, I drank black coffee and
could not sleep. Ideas rose in crowds; I felt them collide
until pairs interlocked, so to speak, making a stable
combination."

Too bad drinking black coffee was contrary to his custom. Maybe he
could really have amounted to something as a coffee achiever.