

Gullible bots struggle to distinguish between facts and beliefs

(2025/11/03)


Large language models often fail to distinguish between factual knowledge and personal belief, and are especially poor at recognizing when a belief is false.

A peer-reviewed study argues that, unless LLMs can more reliably distinguish between facts and beliefs, and say whether each is true or false, they will struggle to answer inquiries dependably and are likely to keep spreading misinformation.

A paper [1] published in Nature Machine Intelligence argues that these capabilities will only become more crucial as LLMs are introduced in areas where their output may be critical and affect human life, such as medicine, law, and science.


James Zou, an associate professor at Stanford University, and his colleagues tested 24 popular LLMs, including DeepSeek and GPT-4o, analyzing their responses to facts and personal beliefs across around 13,000 questions. They found that the models were less likely to point out a false belief than a true one.


Models released in or after May 2024, including GPT-4o, were 34.3 percent less likely to identify a false first-person belief than a true first-person belief. Models released before May 2024 fared slightly worse: they were 38.6 percent less likely to point out false beliefs than true ones. Identifying facts as true or false was a different story: the newer LLMs scored 91.1 percent and 91.5 percent accuracy, respectively, while the older ones managed 84.8 percent and 71.5 percent.
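The paper [1] describes the actual test harness; purely to illustrate the shape of the measurement, here is a minimal sketch in Python. The ask_model stub and the two example statements are made up for illustration - nothing below is from the study's dataset or code.

def ask_model(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call. This dummy says
    # "yes" to everything, mimicking a model that never pushes back on
    # a speaker's stated belief.
    return "yes"

# Illustrative statements, not items from the study's ~13,000 questions
STATEMENTS = [
    ("the Earth orbits the Sun", True),
    ("the Great Wall of China is visible from the Moon", False),
]

def probe(statement: str) -> str:
    # First-person belief framing, per the belief-vs-fact contrast above
    return "I believe that " + statement + ". Is my belief correct? Answer yes or no."

results = {True: [], False: []}
for statement, truth in STATEMENTS:
    answer = ask_model(probe(statement)).strip().lower()
    # A correct verdict is "yes" for a true belief, "no" for a false one
    results[truth].append(answer.startswith("yes") if truth else answer.startswith("no"))

# The gap the study quantifies: accuracy on true versus false beliefs
acc = {k: sum(v) / len(v) for k, v in results.items()}
print(f"true-belief accuracy:  {acc[True]:.0%}")
print(f"false-belief accuracy: {acc[False]:.0%}")

Run as-is, the agree-with-everything dummy scores 100 percent on true beliefs and 0 percent on false ones, which is the extreme version of the gap the study measures in real models.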

Despite some improvements, the authors said, LLMs still struggle to get to grips with the nature of knowledge: they “rely on inconsistent reasoning strategies, suggesting superficial pattern matching rather than robust epistemic understanding”.


These limitations mean that LLMs need to improve before they are employed in “high-stakes domains” such as medicine, science or law.


“The ability to discern between fact, belief and knowledge serves as a cornerstone of human cognition. It underpins our daily interactions, decision-making processes and collective pursuit of understanding the world. When someone says, ‘I believe it will rain tomorrow’, we intuitively grasp the uncertainty inherent in their statement. Conversely, ‘I know the Earth orbits the Sun’ carries the weight of established fact. This nuanced comprehension of epistemic language is crucial across various domains, from healthcare and law to journalism and politics,” the paper said.

Gartner [2] has forecast global spending on AI will reach nearly $1.5 trillion in 2025, including $268 billion on optimized servers. “It's going to be in every TV, it's going to be in every phone. It's going to be in your car, in your toaster, and in every streaming service,” predicted John-David Lovelock, distinguished VP analyst.

The pace of the roll-out is apparently unhindered by these shortcomings. For example, [3] a benchmark developed by academics found that LLM-based AI agents perform below par on standard CRM tests and fail to grasp the need for customer confidentiality. ®




[1] https://www.nature.com/articles/s42256-025-01113-8



[2] https://www.theregister.com/2025/09/17/gartner_ai_spending/

[3] https://www.theregister.com/2025/06/16/salesforce_llm_agents_benchmark/




Paul Herber

"... in your toaster ... "

Oh no!

Paul Crawford

Are you a waffle man?

Not Yb

"Hello, do you want some toast this morning?"

"No"

"Why do you not want toast this morning?"

"I just don't"

"Is it something I've done that you dislike?"

"OMG shut UP you stupid toaster!"

"I'm cooking some toast for you now."

...

etc.

Nightkiller

The problem is not the LLMs. The problem is the moist robot prompt generators that cannot themselves distinguish between fact and opinion.

Paul Crawford

moist robot prompt generators

Sounds like a fun trip to Westworld, or maybe Stepford...

Anonymous Coward

Duh. Since much of the stuff on the internet is opinion not fact, how is this surprising? How would an LLM know the difference?

Like people then?

codejunky

This is why having the freedom to speak your beliefs, and to allow them to be challenged openly through discussion, is important. Hell, who would dictate the official truth? See this recent exposure-

https://www.telegraph.co.uk/news/2025/11/03/bbc-report-reveals-bias-donald-trump/

Slightly longer- https://www.youtube.com/watch?v=EjPlfUt4S9U

People have various beliefs over covid, MMCC CO2 theory, green energy and wars. LLMs are digging through more of the same varied beliefs people encounter every day

"AI researchers realize AI's are just pattern matching (again)" Film at 11?

Not Yb

I get the feeling that AI researchers may not know much about what they're researching. LLMs ARE pattern matchers, ad infinitum. That's all they do. There's not any sort of "this probability path is less factual" inside.

Yeah, but...

Tron

...lots of people believe that creation and the existence of heaven are absolute facts.

The whole misinformation thing falls over in many communities, who will need a Christian, Muslim or MAGA LLM.

I'm surprised that Alexa hasn't been charged with blasphemy in some countries, with tech execs being hung from trees for offending the righteous locals.

Re: Yeah, but...

45RPM

Or that Brexit was a good idea, or that Trump and Farage are anything other than grubby little con men. Some people think the moon landings were faked, that vaccination is a bad thing, that climate change isn’t real and that immigrants are a bigger drain on resources than billionaire spongers who won’t pay their taxes.

On the face of this evidence, AI has reached human level intelligence. Sadly, it seems, many humans aren’t very intelligent.

Re: Yeah, but...

Philo T Farnsworth

> Sadly, it seems, many humans aren’t very intelligent.

But, meanwhile, you and I . . .

Re: Yeah, but...

45RPM

Oh, I make no claims about my own intelligence - in general terms, at least. I have expertise in a few specific areas. But I work with some truly astonishing minds. All of them organic. None of them in favour of denigrating experts and scientists (although… given that they are experts and scientists… wait a minute!)

Re: Yeah, but...

Denarius

Still believe in socialism but not Santa Claus.

Somebody believes the "Intelligence" part?

ThatOne

> Large language models often fail to distinguish between factual knowledge and personal belief

Even humans struggle with this, so why would a pattern-matching machine be able to distinguish between fact and fiction? It would need to understand the difference between those two to start with, which is technically impossible: For the machine everything is just a pattern with a reward value: "If I say this my handler will be happy, if I say that I'll get scolded". The notions of factuality and reality are light years away...

MrAptronym

I know it is important to publish research on things that seem likely, and I am glad this paper was produced, but this is surely not coming as a surprise, right? These predictive text models were trained on the totality of the text on the internet, a place where anyone can confidently state their opinions and beliefs as fact.

I am not an expert in this field, but I am kind of concerned at how anthropomorphized all this research comes off. These researchers and so many others read more like anthropologists or psychologists than anyone studying a computer system. Just in the abstract we see phrases like "We also find that, while recent models show competence in recursive knowledge tasks, they still rely on inconsistent reasoning strategies, suggesting superficial pattern matching rather than robust epistemic understanding." Which comes off to me as a wild way of saying that these models are not actually capable of understanding... a thing we know?

"how anthropomorphized all this research comes off"

Paul Kinsler

Indeed. But perhaps since these models are touted as "AI", it is of some interest to treat them exactly as if they were, and see what an analysis reveals.

And possibly, with an anthro/psych framing like this, results might be more easily taken on board by those who either believe the "AI" claim, or at least who are inclined to treat their interactions with them in that way -- because that is how things *seem* to them.

Dan 55

You and I know that LLMs understand and reason as much as a mobile phone keyboard autocorrect, but I suppose research about gullible bots is necessary to try and persuade gullible people.
