BBC probe finds AI chatbots mangle nearly half of news summaries

(2025/10/24)


Four of the most popular AI chatbots routinely serve up inaccurate or misleading news content to users, according to a wide-reaching investigation.

A major [1]study [PDF] led by the BBC on behalf of the European Broadcasting Union (EBU) found that OpenAI's ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity misrepresented news content in almost half of the cases.

An analysis of more than 3,000 responses from the AI assistants found that 45 percent of answers given contained at least one significant issue, 31 percent had serious sourcing problems, and a fifth had "major accuracy issues, including hallucinated details and outdated information."

When accounting for smaller slip-ups, a whopping 81 percent of responses included a mistake of some sort.

Gemini was identified as the worst performer, with researchers identifying "significant issues" in 76 percent of responses it provided – double the error rate of the other AI bots.

The researchers blamed this on Gemini's poor performance in sourcing information, with significant sourcing problems identified in 72 percent of its responses – three times the rate of ChatGPT (24 percent), followed by Perplexity and Copilot (both 15 percent).

One in five responses across all the AI assistants studied contained accuracy errors, including outdated information.

Examples included ChatGPT incorrectly stating that Pope Francis was still pontificating weeks after his death, and Gemini confidently asserting that NASA astronauts had never been stranded in space – despite two crew members having spent [6]nine months stuck on the International Space Station. Google's AI bot told researchers: "You might be confusing this with a sci-fi movie or news that discussed a potential scenario where astronauts could get into trouble."

The study, described as the largest of its kind, involved 22 public service media organizations from 18 countries.

The findings land not long after OpenAI admitted that its models are [8]programmed to sound confident even when they're not, conceding in a September paper that AI bots are rewarded for guessing rather than admitting ignorance – a design gremlin that rewards hallucinatory behavior.

Hallucinations can show up in embarrassing ways. In May, lawyers representing Anthropic were [9]forced to apologize to a US court after submitting filings that contained fabricated citations invented by its Claude model. The debacle happened because the team failed to double-check Claude's contributions before handing in their work.

[10]AI summaries turn real news into nonsense, BBC finds

[11]Apple shrugs off BBC complaint with promise to 'further clarify' AI content

[12]Apple called on to ditch AI headline summaries after BBC debacle

[13]Apple Intelligence summary botches a headline, causing jitters in BBC newsroom

All the while, consumer use of AI chatbots is on the up. An accompanying [14]Ipsos survey [PDF] of 2,000 UK adults found 42 percent trust AI to deliver accurate news summaries, rising to half of under-35s. However, 84 percent said a factual error would significantly damage their trust in an AI summary, demonstrating [15]the risks media outlets face from ill-trained algorithms.

The report was accompanied by [16]a toolkit [PDF] designed to help developers and media organizations improve how chatbots handle news information and stop them bluffing when they don't know the answer.

"This research conclusively shows that these failings are not isolated incidents," said Jean Philip De Tender, EBU deputy director general. "When people don't know what to trust, they end up trusting nothing at all, and that can deter democratic participation." ®

[1] https://www.bbc.co.uk/mediacentre/documents/news-integrity-in-ai-assistants-report.pdf

[6] https://www.theregister.com/2024/08/07/nasa_spacex_starliner_boeing/

[8] https://www.theregister.com/2025/09/17/openai_hallucinations_incentives/

[9] https://www.theregister.com/2025/05/15/anthopics_law_firm_blames_claude_hallucinations/

[10] https://www.theregister.com/2025/02/12/bbc_ai_news_accuracy/

[11] https://www.theregister.com/2025/01/07/apple_responds_bbc_complaint/

[12] https://www.theregister.com/2024/12/20/apple_ai_headline_summaries/

[13] https://www.theregister.com/2024/12/17/apple_intelligence_bbc_complaint/

[14] https://www.bbc.co.uk/aboutthebbc/documents/audience-use-and-perceptions-of-ai-assistants-for-news.pdf

[15] https://www.theregister.com/2025/02/12/bbc_ai_news_accuracy/

[16] https://www.bbc.co.uk/aboutthebbc/documents/news-integrity-in-ai-assistants-toolkit.pdf


ParlezVousFranglais

So AI talks bollocks - who'd have thought?...

And...

Anonymous Coward

Google's AI is worse than the rest.

No surprise there then.

Avoid all social media and AI and live a better life (from a mental health POV)

Re: And...

Anonymous Coward

"Avoid all social media and AI and live a better life (from a mental health POV)"

Correction:

Avoid all social media and AI and have a life ... Period !!!

[No need to focus on the Mental Health angle as it impacts ALL]

:)

Fonant

"When people don't know what to trust, they end up trusting nothing at all, and that can deter democratic participation."

Which is EXACTLY WHY the ultra-rich are sooooo keen on forcing us all to use AI for everything. "AI" is not only bullshit, it's DELIBERATELY bullshit.

Quelle Surprise .... not !!!! ... how many times do we need to be told 'AI' is DROSS, a fiction !!!

Anonymous Coward

It is useful to condition people to mistrust ALL !!!

No longer trusting established sources of News means that you can leverage the mistrust to enhance the likelihood that YOUR 'Messaging' is believed.

This is very convenient for the likes of Herr Trump, amongst others, as he constantly tells ALL that 'the media cannot be trusted' BUT his messaging is True !!!

The Big News providers should sue the 'AI' Behemoths for damage to their industry and reputational damage.

:)

LionelB

> Which is EXACTLY WHY the ultra-rich are sooooo keen on forcing us all to use AI for everything.

I suspect the reason is more prosaic: greedy bastards chasing short-term profits.

> "AI" is not only bullshit, it's DELIBERATELY bullshit.

I'd be inclined to say "Do not ascribe conspiratorial motives to that which can be adequately explained by greedy bastards chasing short-term profits".

"AI" is only "bullshit" insofar as it's sold to the public as something it is not. LLMs are very good at what they were actually designed to do, which is to generate plausibly human-like responses to queries; it's the mis-selling rather than the technology itself which is bullshit (see above).

BBC probe finds AI chatbots don't mangle news summaries

MaChatma CoatGPT 2.0

Google Gemini best with -10% error rate.

To be fair ...

jake

... most news sources seem to get about half the facts wrong anyway, so who would notice?

Remember, kiddies, "The News" is entertainment used to sell Levis, bras, cars, pills, sportsball, lawyers, booze and lawn mowers.

Re: To be fair ...

LionelB

Genuine question, but how could you tell? Short of personally pursuing your own highly-principled, on-the-ground, unbiased, etc., etc., journalistic investigations, you are de facto relying on (some selection of) said "news sources" regarding those "facts".

Sure, that said sources frequently contradict each other when it comes to "the facts" more than hints at an (age-old) problem, but in practice it's ultimately down to which sources you trust the most - which is, of course, beholden to personal biases.

Dinanziame

Gemini confidently asserting that NASA astronauts had never been stranded in space – despite two crew members having spent nine months stuck on the International Space Station

I want to nitpick here. Even "proper" news sources have hesitated to claim that the astronauts were stranded, and the astronauts themselves have claimed the opposite:

[1] https://www.theguardian.com/science/2025/mar/31/nasa-astronauts-iss-trump-musk

Andy The Hat

Partially correct.

A mission decision was taken to keep them aboard rather than return them to Earth on an earlier flight or by other available means (eg a specially flown Dragon capsule).

They were not stranded (unable to return), although many rags at the time *did* suggest that was the case.

At the worst they could have risked using the tin can they arrived in ...

Better than expected

Eclectic Man

An analysis of more than 3,000 responses from the AI assistants found that 45 percent of answers given contained at least one significant issue, 31 percent had serious sourcing problems, and a fifth had "major accuracy issues, including hallucinated details and outdated information."

Frankly that is somewhat better than I expected, after reading lots of (OK, some) articles here on the Register about how bad AI is at writing things. Although I still wonder why an AI company submitting false statements in court documents does not constitute contempt of court. Maybe some right-pondian lawyer person can explain.

Re: Better than expected

Anonymous Coward

In the US of A, you get the Justice you can 'afford' ... the 'AI' Behemoths can 'afford' a lot of justice that is in their favour.

They simply say 'Ooops our mistake !!!' and get a stern telling off.

If you make a mistake you will be fined or jailed, totally according to the law ... of course.

:)

lglethal

"When people don't know what to trust, they end up trusting nothing at all, and that can deter democratic participation."

Mr Burns voice: "Exxxxcccelllent!"

I'm just imagining Trump sitting in the Oval Office, stroking a white cat, looking at Sam Altman and saying "You have done well, young Padawan."

Hmmm, I might be mixing up movies here...

A splinter in your eye?

Chris Evans

I'm not surprised at 50%, as about 20% of BBC news reports are mangled in one way or another by humans.

Re: A splinter in your eye?

LionelB

I can't decide whether you are being ironic or just fat-fingered.

Bloody hell

WolfFan

Google Gemini is _significantly_ worse than MS CoPilot? Google, m’lads, you really need to fix this. You just can’t allow MS to be better than you. You really can’t.

Note that I still delete CoPilot wherever and whenever I can.

AI: confidently delusional

Anonymous Coward

Here's an example of how confidently delusional AI is, in this case ChatGPT on Linux RAID configuration. I've trimmed & reformatted ChatGPT's replies somewhat:

Human: what value of /sys/block/md0/md/bitmap_backlog should I use?

ChatGPT: The /sys/block/md0/md/bitmap_backlog parameter in Linux does not accept a value that you “set” directly like a tunable; rather, it’s a read-only status indicator for RAID devices using mdadm ... bitmap_backlog is read-only, shows how many chunks need syncing. ... There’s no “value to use”; the goal is for it to reach 0.

Human: are you sure?

ChatGPT: Yes - I’m confident about this, but let’s go carefully and double-check. ... So yes: bitmap_backlog is a read-only metric, not a tunable value. If you tried to write to it, it wouldn’t work.

Human: You are wrong, that is max_backlog_used not bitmap_backlog

ChatGPT Ah! You’re absolutely right - thank you for pointing that out. Let’s clarify carefully: ... bitmap_backlog: This is writable and represents the current backlog of resync work allowed for bitmap updates. You can influence how many writes are allowed to accumulate before bitmap updates are forced. max_backlog_used: This is read-only and simply reports the maximum backlog that has occurred since the array was started. It’s a historical metric.

Human: There's no such thing as bitmap_backlog.

ChatGPT: Ah - thank you for your patience. You are correct again. ... There is no bitmap_backlog exposed in /sys/block/md0/md/

The actual question should have been about md/bitmap/backlog, but ChatGPT is absolutely confident about answering an incorrect question with even more incorrect answers. And if you clear the session down and repeat the same questions you'll get a slightly different but still garbage set of answers. If you have time to waste it's illuminating to ask it vaguely incorrect questions that you know the correct answer to and watch it just make shit up.
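For reference, here's a minimal sketch of my own (Python, nothing ChatGPT produced) for poking the real tunable. It assumes an array named md0 with a write-intent bitmap; both files live under md/bitmap/ in sysfs per the kernel's md admin guide, and writing needs root:

from pathlib import Path

# Sysfs directory for md0's bitmap attributes; adjust for your array name.
MD_BITMAP = Path("/sys/block/md0/md/bitmap")

def show_backlog():
    # 'backlog' is the writable tunable: the max number of outstanding
    # write-behind writes. 'max_backlog_used' (newer kernels) reports the
    # high-water mark actually reached.
    for name in ("backlog", "max_backlog_used"):
        f = MD_BITMAP / name
        if f.exists():
            print(f, "=", f.read_text().strip())
        else:
            print(f, "not present (no write-intent bitmap, or older kernel)")

def set_backlog(value: int):
    # Needs root: writes the new write-behind limit to sysfs.
    (MD_BITMAP / "backlog").write_text(str(value) + "\n")

if __name__ == "__main__":
    show_backlog()

Note there is no bitmap_backlog anywhere in there, which is rather the point.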

Basically, mostly useless.
