AI chatbots waffle on GOV.UK queries, then get facts wrong when told to zip it

(2026/02/19)


Artificial intelligence chatbots can be too chatty when answering questions about government services, burying accurate information in waffle, and they make mistakes when told to be more concise, according to [1]research.

The Open Data Institute (ODI) tested 11 large language models (LLMs) on more than 22,000 questions, comparing their responses with answers based on material from the official GOV.UK website. Its researchers judged each model's output on verbosity, accuracy, and how often it refused to answer.

They found that models often waffled, burying the facts or going beyond authoritative government information, while telling them to be more concise reduced their accuracy.

"Verbosity is known behavior of LLMs – they are prone to 'word salad' responses that make them harder to use and decrease their reliability," the researchers wrote in a [3]summary [PDF].

Some models, including Anthropic's Claude Haiku 4.5, were more verbose than others.

The researchers added that LLMs are good at combining material from multiple sources, which is useful in some situations, but makes mistakes more likely in this one. They recommended that users be told about risks and where to find authoritative information.

The ODI research found that while models often answered correctly, they made mistakes inconsistently and unpredictably. GPT-OSS-20B, for example, said someone would only be eligible for Guardian's Allowance – a benefit paid to people caring for a child whose parents have died – if the child themselves had died.

Llama 3.1 8B advised that a court order was required to add an ex-partner's name to a child's birth certificate, when it actually just requires re-registration of the birth, and Qwen3-32B wrongly said the £500 Sure Start Maternity Grant is available in Scotland.

The researchers saw models attempting to answer almost every question asked, regardless of whether or not they were capable of doing so accurately. They described this failure to refuse to answer as "a dangerous trait" as it could lead people to act on misinformation.

Smaller, cheaper-to-run LLMs can deliver results comparable to large closed-source ones such as OpenAI's GPT-4.1, the ODI said. This shows the need for flexibility in adopting AI and for avoiding long-term contracts that lock organizations into specific suppliers.

[8]Irony alert: Anthropic helps UK.gov to build chatbot for job seekers

[9]UK names Barnsley as first Tech Town to see whether AI can fix... well, anything

[10]GOV.UK to unleash AI chatbot on confused citizens

[11]Britain's Ministry of Justice just signed up to ChatGPT Enterprise

"If language models are to be used safely in citizen-facing services, we need to understand where the technology can be trusted and where it cannot," said ODI director of research Professor Elena Simperl. "That means being open about uncertainty, keeping answers tightly focused on authoritative sources such as GOV.UK, and addressing the high levels of inconsistency seen in current systems."

The research used CitizenQuery-UK, a set of 22,066 synthetically generated questions citizens might ask and corresponding answers based on GOV.UK material, which the ODI has released on the Hugging Face platform.
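
The benchmark is essentially a question-and-answer table, so readers who want to probe it themselves can load it with the Hugging Face datasets library. The snippet below is a minimal sketch in Python: the repository ID and column names are placeholders, and the ODI's actual Hugging Face listing should be checked for the real identifier and schema.

    # Minimal sketch: pull the CitizenQuery-UK benchmark from Hugging Face.
    # The repo ID and column names are placeholders -- consult the ODI's
    # Hugging Face listing for the real identifier and schema.
    from datasets import load_dataset

    citizen_query = load_dataset("odi/citizenquery-uk", split="train")  # hypothetical repo ID

    for row in citizen_query.select(range(3)):
        # Assumed columns: a citizen-style question plus a GOV.UK-grounded reference answer
        print("Q:", row["question"])
        print("A:", row["reference_answer"])
        print("-" * 60)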

In December, the Government Digital Service said it planned to [12]add a chatbot to its GOV.UK app early in 2026, followed by its website. Since then, the government has said it will work with supplier Anthropic to build such [13]a service for job seekers, and the Department for Work and Pensions is experimenting with one for [14]Universal Credit claimants. ®



[1] https://theodi.org/news-and-events/news/new-research-questions-the-trustworthiness-of-ai-chatbots-for-government-information/

[3] https://theodi.hacdn.io/media/documents/CitizenQuery-UK.pdf

[8] https://www.theregister.com/2026/01/29/irony_alert_anthropic_helps_ukgov/

[9] https://www.theregister.com/2026/02/03/barnsley_ai_town/

[10] https://www.theregister.com/2025/12/19/govuk_chatbot/

[11] https://www.theregister.com/2025/10/24/ministry_of_justice_chatgpt/

[12] https://www.theregister.com/2025/12/19/govuk_chatbot/

[13] https://www.theregister.com/2026/01/29/irony_alert_anthropic_helps_ukgov/

[14] https://www.theregister.com/2026/02/06/dwp_chatbot_testing/



If you can use "AI" right

Anonymous Coward

It can really turbocharge dealing with benefit applications.

I am guessing this is almost exactly what was not intended.

Is there a sweepstake yet on the year (or month, at this rate) when there will have to be "state-approved AI"? With the rush to recreate religious factions, I can see holy wars between the various sects of AI.
