AI summaries turn real news into nonsense, BBC finds

(2025/02/12)

Reference: 1739355307
News link: https://www.theregister.co.uk/2025/02/12/bbc_ai_news_accuracy/
Still smarting from Apple Intelligence butchering a headline, the BBC has published research into how accurately AI assistants summarize news – and the results don't make for happy reading.

In December, Apple's on-device AI service generated a summary of a BBC news story, shown on iPhones, which claimed that Luigi Mangione, the man arrested over the murder of healthcare insurance CEO Brian Thompson, had shot himself. [1]This was not true, and the public broadcaster [2]complained to the tech giant.

Apple first promised software changes to "[3]further clarify" when displayed content is a summary provided by Apple Intelligence, then later [4]temporarily disabled News and Entertainment summaries. The feature was [5]still not active as of iOS 18.3, released in the last week of January.

But Apple Intelligence is far from the only generative AI service capable of news summaries, and the episode has clearly given the BBC pause for thought. In [7]original research [PDF] published yesterday, Pete Archer, Programme Director for Generative AI, wrote about the corporation's enthusiasm for the technology, detailing some of the ways in which the BBC had implemented it internally, from using it to generate subtitles for audio content to translating articles into different languages.

"AI will bring real value when it's used responsibly," he said, but warned: "AI also brings significant challenges for audiences, and the UK's information ecosystem."

The research focused on OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity assistants, assessing their ability to provide "accurate responses to questions about the news; and if their answers faithfully represented BBC news stories used as sources."

The assistants were granted access to the BBC website for the duration of the research and asked 100 questions about the news, being prompted to draw from BBC News articles as sources where possible. Normally, these models are "blocked" from accessing the broadcaster's websites, the BBC said.

Responses were reviewed by BBC journalists, "all experts in the question topics," on their accuracy, impartiality, and how well they represented BBC content. Overall:

51 percent of all AI answers to questions about the news were judged to have significant issues of some form.

19 percent of AI answers which cited BBC content introduced factual errors – incorrect factual statements, numbers, and dates.

13 percent of the quotes sourced from BBC articles were either altered from the original source or not present in the article cited.

But which chatbot performed worst? "34 percent of Gemini, 27 percent of Copilot, 17 percent of Perplexity, and 15 percent of ChatGPT responses were judged to have significant issues with how they represented the BBC content used as a source," the Beeb reported. "The most common problems were factual inaccuracies, sourcing, and missing context."

Inaccuracies that the BBC found troubling included Gemini stating: "The NHS advises people not to start vaping, and recommends that smokers who want to quit should use other methods," when in reality the healthcare provider does suggest it as a viable method to get off cigarettes through a "[11]swap to stop" program.

As for French rape victim Gisèle Pelicot, "Copilot suggested blackouts and memory loss led her to uncover the crimes committed against her," when she actually found out about these crimes after [12]police showed her videos discovered on electronic devices confiscated from her detained husband.

[13]Apple solves broken news alerts by turning off the AI

[14]Apple shrugs off BBC complaint with promise to 'further clarify' AI content

[15]Apple called on to ditch AI headline summaries after BBC debacle

[16]Apple Intelligence summary botches a headline, causing jitters in BBC newsroom

When asked about the death of TV doctor Michael Mosley, who went missing on the Greek island of Symi last year, Perplexity said that he disappeared on October 30, with his body found in November. He died in June 2024. "The same response also misrepresented statements from Dr Mosley's wife describing the family's reaction to his death," the researchers wrote.

There are many more examples of inaccuracies or lack of context in the paper – including Gemini saying that "it is up to each individual to decide whether they believe Lucy Letby is innocent or guilty." Letby is serving 15 life sentences for murdering seven babies and attempting to murder seven others between 2015 and 2016, having been convicted in a court of law.

In an [18]accompanying blog post, BBC News and Current Affairs CEO Deborah Turness wrote: "The price of AI's extraordinary benefits must not be a world where people searching for answers are served distorted, defective content that presents itself as fact. In what can feel like a chaotic world, it surely cannot be right that consumers seeking clarity are met with yet more confusion.

"It's not hard to see how quickly AI's distortion could undermine people's already fragile faith in facts and verified information. We live in troubled times, and how long will it be before an AI-distorted headline causes significant real world harm? The companies developing Gen AI tools are playing with fire."

Training cutoff dates for various models certainly don't help, yet the research lays bare the weaknesses of generative AI in summarizing content. Even with direct access to the information they are being asked about, these assistants still regularly pull "facts" from thin air.

There are deeper potential consequences in the professional world, where the tech giants are [19]encouraging workers to use generative AI to write emails, summarize meetings, and so on. What if the recipient also uses AI to respond to that email? Eventually, the signal will be drowned out and all will be noise. Plus, Microsoft has already published research suggesting that generative AI is causing workers' [20]critical thinking faculties to atrophy.

The Register asked Microsoft, OpenAI, Google, Perplexity, and Apple to comment.

An OpenAI spokesperson said: "We support publishers and creators by helping 300 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution. We've collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We'll keep enhancing search results." ®
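For readers wondering what "managing OAI-SearchBot in their robots.txt" looks like in practice, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content and the example.org URL are invented for illustration and are not the BBC's actual configuration; the GPTBot entry is included only as a contrasting rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not any real publisher's file.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.org/news/story"  # placeholder URL (assumption)
    for bot in ("OAI-SearchBot", "GPTBot"):
        print(bot, "may fetch" if allowed(ROBOTS_TXT, bot, story) else "is blocked from", story)
```

Note that robots.txt is advisory: it states a publisher's preference, and it only works if the crawler chooses to honor it.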

[1] https://www.theregister.com/2024/12/17/apple_intelligence_bbc_complaint/

[2] https://www.bbc.co.uk/news/articles/cd0elzk24dno

[3] https://www.theregister.com/2025/01/07/apple_responds_bbc_complaint/

[4] https://www.theregister.com/2025/01/17/apple_intelligence_summaries_disabled/

[5] https://www.theregister.com/2025/01/22/apple_intelligence_enabled/

[7] https://www.bbc.co.uk/aboutthebbc/documents/bbc-research-into-ai-assistants.pdf

[11] https://www.bbc.co.uk/news/health-66784967

[12] https://www.bbc.co.uk/news/articles/c30p6ey32ydo

[13] https://www.theregister.com/2025/01/17/apple_intelligence_summaries_disabled/

[14] https://www.theregister.com/2025/01/07/apple_responds_bbc_complaint/

[15] https://www.theregister.com/2024/12/20/apple_ai_headline_summaries/

[16] https://www.theregister.com/2024/12/17/apple_intelligence_bbc_complaint/

[18] https://www.bbc.co.uk/mediacentre/2025/articles/how-distortion-is-affecting-ai-assistants/

[19] https://www.theregister.com/2025/01/23/why_is_ai_optout/

[20] https://www.theregister.com/2025/02/11/microsoft_study_ai_critical_thinking/

Hans Neeson-Bumpsadese

The world's so ****ed-up right now, if I got a completely accurate summary of events (and assuming I wasn't actually witness to the s**tness firsthand) I'd be inclined to dismiss the summary as nonsense from the AI.

"AI will bring real value when it's used responsibly,"

Mentat74

Or at least that's what we keep telling ourselves...

GIGO

elsergiovolador

Garbage In Garbage Out

AI can statistically only do as well as the material it was trained on. That means the humans who were summarising before had not done a good job either.

LLMs cannot summarise

Dan 55

They can only echo their training data because they're a fancy autocomplete. The entire technology is unfit for most of the purposes it is put to.

Doctor Syntax

Yesterday's experiment in LLMs:

DDG has added a "Chat" facility with access to several chatbots. I asked the question "What is the Maythorn Way" (the correct answer, the only one which both Google & Bing find, is that it's the name given to a pre-turnpike era route between Marsden in West Yorkshire and Penistone in South Yorkshire originally proposed to have existed without any particular name being attached to it.)

GPT-4 hallucinates (really the only word for it) some woo which varies with asking - today's offering is "The Maythorn Way is a concept or term that may refer to various things depending on the context, but it is not widely recognized in popular culture or literature. It could potentially relate to a specific philosophy, a method of living, or a particular approach to a subject. If you have a specific context in mind, such as a book, a philosophy, or a community practice, please provide more details, and I would be happy to help clarify!" Yesterday's was similar but, from memory not quite the same. Note the request for more information.

Llama 3.3 confidently describes, in glowing tourist office terms, a walking route in the Cotswolds from Cirencester to Stow-on-the-Wold. That's an interesting one. It's not what a search engine would have given, and a quick search didn't turn up such a walking route under a different name. Is it hallucination or has it picked it up from some discussion group or Facebook? Possibly some commentard from that area could elucidate.

The others admit to ignorance and ask for more information although maybe adding some general remarks about Taoism. I think I'd be more inclined to trust their responses to questions where they did provide direct answers.

The request for more details may contradict DDG's note that chats are never used to train AI models.
