Seeing is believing in biomedicine, which isn't great when AI gets it wrong
- Reference: 1753615870
- News link: https://www.theregister.co.uk/2025/07/27/biomedviz_ai_wrong_problems/
- Source link:
Researchers from the University of Bergen in Norway, the University of Toronto in Canada, and Harvard University in the US make that point in [1]a paper titled "'It looks sexy but it's wrong.' Tensions in creativity and accuracy using GenAI for biomedical visualization," scheduled to be presented at IEEE's VIS 2025 conference in November.
In their paper, authors Roxanne Ziman, Shehryar Saharan, Gaël McGill, and Laura Garrison present various illustrations created by OpenAI's GPT-4o or DALL-E 3 alongside versions created by visualization experts.
[2]Screenshot from paper. Top row: incorrect GPT-4o or DALL-E 3 images; bottom row: images created by BioMedVis illustrators.
Some of the examples cited diverge from reality in subtle ways. Others, like "the [3]infamously well-endowed rat" in a now-retracted article published in Frontiers in Cell and Developmental Biology, would be difficult to mistake for anything but fantasy.
Either way, imagery created by generative AI may look nice but isn't necessarily accurate, the authors say.
"In light of [5]GPT-4o Image Generation’s public release at the time of this writing, visuals produced by GenAI often look polished and professional enough to be mistaken for reliable sources of information," the authors state in their paper.
"This illusion of accuracy can lead people to make important decisions based on fundamentally flawed representations, from a patient without such knowledge or training inundated with seemingly accurate AI-generated 'slop,' to an experienced clinician who makes consequential decisions about human life based on visuals or code generated by a model that cannot guarantee 100 percent accuracy."
Show me a pancreas, and MidJourney is like, here is your pile of alien eggs!
Co-author Ziman, a PhD fellow in visualization research at the University of Bergen, told The Register in an email, "While I’ve not yet come across real-world examples where AI-generated images have directly resulted in harmful health-related outcomes, one interview participant shared with us [8]this case involving an AI-based risk-scoring system, used to detect fraud, that wrongfully accused parents (primarily foreign parents) of childcare benefits fraud in the Netherlands.
"With AI-generated images, the more pervasive issue is the use of inaccurate imagery in medical and health-related publications, and scientific research publications broadly. While the potential harm isn’t immediately apparent, the increased use of inaccurate images [9]like this , and problems like [10]reinforcing stereotypes in healthcare , to communicate health and medical information is troubling."
Ziman said that the larger problem, echoed in a series of interviews discussed in the paper, is the way inaccurate imagery affects how the public sees scientific research. She pointed at the "well-endowed rat" and how it was [11]featured on The Late Show with Stephen Colbert.
"Satirical criticism by such public figures (that people may tend to trust more than ‘legitimate’ news sources) can throw into question the legitimacy of the scientific research community at large, and the public can come to distrust (even more) or not take seriously what they hear coming out of the scientific research community," said Ziman.
"Think of the consequences then for public health communications as during COVID, vaccine campaigns, etc. And bad actors now have greater ease of quickly creating and sharing misleading but convincing-looking imagery."
Ziman said while AI-generated medical images often get shared in the biomedical visualization (BioMedVis) community for a laugh and criticism, practitioners have yet to figure out how to mitigate the risks.
Toward that end, the authors surveyed 17 BioMedVis professionals to assess how they see generative AI tools and how they use those tools in their work. The survey respondents, referred to by pseudonyms in the paper, reported a wide range of views about generative AI. The authors grouped them into five personas: Enthusiastic Adopters, Curious Adapters, Curious Optimists, Cautious Optimists, and Skeptical Avoiders.
Some of the respondents appreciated the abstract and otherworldly aesthetics of images generated by AI models, saying the images helped advance conversations with clients. Others (about half) were critical of the GenAI style, agreeing with "Frank," who said the generic look of those images is boring.
Irrelevant or hallucinated references remain a problem, as do invented new terms, such as the 'green glowing protein.'
The survey takers also sometimes used text-to-text models for captions and descriptive assistance, though not always to the satisfaction of respondents. As the paper observes, "Irrelevant or hallucinated references remain a problem, as do invented new terms, such as the 'green glowing protein.'"
Some of the survey respondents see generative AI as useful for handling rote coding tasks, like generating boilerplate code or cleaning data. Others, however, feel they've already invested time learning to code and prefer to use those skills rather than delegate them.
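To give a concrete, purely hypothetical sense of the sort of rote work respondents described handing off, here is a minimal Python sketch of the kind of boilerplate data cleaning a GenAI assistant might be asked to draft; the file path and column name are invented for illustration and do not come from the paper.

```python
# Hypothetical example of rote data-cleaning boilerplate of the kind
# respondents said they might delegate to GenAI; the file and column
# names are invented for illustration.
import pandas as pd

def clean_measurements(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    # Normalize column names to lowercase snake_case
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Drop exact duplicate rows and rows missing the key column
    df = df.drop_duplicates().dropna(subset=["measurement"])
    # Coerce the measurement column to numeric, discarding unparseable entries
    df["measurement"] = pd.to_numeric(df["measurement"], errors="coerce")
    return df.dropna(subset=["measurement"])
```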
The researchers also note a contradictory attitude among respondents, who "express grave concerns about intellectual property violations that are, for the moment, baked into public GenAI tools" when using them for commercial purposes, while also largely accepting use of generative AI on a personal basis.
Even though 13 of the 17 surveyed already incorporate GenAI into their production workflows to some degree, BioMedVis developers and designers still prioritize accuracy in their images, and "GenAI in its current state is unable to achieve this benchmark," the authors observe.
They point to remarks attributed to "Arthur": "While it’s still scraping the digital world for references it can use to generate art, it’s not yet able to know the difference between the sciatic nerve and the ulnar nerve. It’s just, you know, wires."
They also quote "Ursula" about GenAI's inability to produce accurate anatomy: “Show me a pancreas, and MidJourney is like, here is your pile of alien eggs!”
The paper says that while erroneous AI output may be obvious in many cases, BioMedVis folk expect these errors to become harder to detect as the technology improves and as people become accustomed to trusting these systems.
The researchers also raise more general concerns about the black-box nature of machine learning and the difficulty of addressing bias.
The paper explains, "Inaccurate or unreliable outputs, whether the anatomical visuals …or blocks of code, can mislead and diffuse responsibility. Participants questioned who should be held accountable in instances where GenAI is used and lines of accountability blur." Black-box models prevent that sort of accountability. As survey respondent "Kim" said, "There should be someone who can explain the results. It is about trust, and [...] about competence."
As a community, we should feel comfortable sharing our thoughts, questions, and concerns about these tools
Co-author Shehryar Saharan from the University of Toronto told The Register in an email that he hopes this research encourages people to think critically about how generative AI fits into the work and values of BioMedVis professionals.
"These tools are becoming a bigger part of our field, and it's important that we don’t just use them, but critically reflect on what they mean for how we work and why we do what we do," said Saharan.
"As a community, we should feel comfortable sharing our thoughts, questions, and concerns about these tools. Without open conversation and a willingness to reflect, we risk falling behind or using these technologies in ways that don’t align with what we actually care about. It’s about making space to think and reflect before we move forward." ®
[1] https://arxiv.org/abs/2507.14494
[2] https://regmedia.co.uk/2025/07/25/ai_v_real_illustrations.jpg
[3] https://www.telegraph.co.uk/news/2024/02/16/journal-published-graphic-rat-with-giant-penis-asking-ai/
[5] https://openai.com/index/introducing-4o-image-generation/
[8] https://taxadmin.ai/taxadminai-toeslagenaffaire/
[9] https://www.linkedin.com/feed/update/activity:7342885402858405889/
[10] https://pmc.ncbi.nlm.nih.gov/articles/PMC11976010/
[11] https://youtu.be/Bj8IAoTnyNw?feature=shared&t=96
Re: It's worse when AI Slop pretends to have medical knowledge
I've worked fixing humans with a few problems in the technical medical world for about 40 years now. I don't see AI as making very different decisions from people's decisions, but when people make a decision in virtually all technical worlds, human experts then review that decision and verify that it's accurate and correct, rather than just assuming it is - whereas AI seems to simply assume that it's correct. When people do that we see it as a frequent issue ... for example being told "Take these pills to make you feel better" and an expert saying the next day in hospital, "Oh, he took 20 pills, but you only need one to fix the problem!"
AI errors often match people errors.
A fundamental problem...
... with AI is that the results are always presented authoritatively, confidently, and often convincingly.
Even when being 100% wrong or sometimes even making things up and actively lying.
When a human is asked a question, we hopefully have the self-awareness to sometimes say "don't know" or "not sure, sorry".
But these things don't have any sense of self-doubt or sense of their own limitations and are programmed to spew out stuff anyway.
There are things AI can sometimes do well at - especially analysing images and spotting something that may have been missed - but nothing they say should be taken as gospel without verification.
But on a more positive note - the likes of Google Gemini using voice input is not half bad for simple things. As long as you verify.
Re: A fundamental problem...
The thing with Large Language Models is that they are not Liars, they are Bullshitters. Liars know the truth but usually have a plan that requires lying, Bullshitters don't care about the truth, all they care about is whether they come across as convincing.
I suspect part of this is for technical reasons - algorithmic limitations, for instance. A big part will also be commercial. LLMs are most convincing on subjects the human talking to them knows little about. That's where confidence comes in. It helps convince investors that LLMs are actually intelligent if they are designed to come across as very confident. Doubt doesn't sell to the gullible.
Tower of babble
In the same way that the tower, had it been constructed in ancient times, would have sucked up every resource and brought on the demise of humanity, we are hurtling toward that now.
How much time will the great minds of our age waste flicking through the pages of this new-age Tower of Babel, only to eventually go mad and throw themselves into the abyss?
Three points.
Does anyone trace the sources of these inaccuracies? A new field is born! AI forensics.
quote: This illusion of accuracy can lead people to make important decisions based on fundamentally flawed representations.
Just like in politics. They wear suits, they pretend to know what they are doing, they make promises, you vote for them, you get screwed.
Rats with big dicks? Reminded me of the noble Japanese artform of shunga (erotic books), which always gave the blokes supersized willies. Generations of Japanese women may have suffered from the offline harm of acute disappointment in consequence. Random example: https://www.maynardsfineart.com/auction-lot/kiokawa-shozan-japanese-1821-1907-a-pair-of_AE61080E76
online misinformation
Picture generators lower the bar for disseminating visual bs, but drawing and Photoshop skills were not that rare anyway.
Not really AI
The Dutch 'toeslagenaffaire', referenced in a link in the article, wasn't really caused by an "AI-based risk-scoring system". It's true a risk-scoring system was involved, but that could hardly be called 'AI'. It was a set of manually created/entered rules. If you call that AI, then every computer system ever invented deserves that qualification.
It's worse when AI Slop pretends to have medical knowledge
It's one thing if AI makes up capital cities of countries or messes up basic arithmetic, it's quite another when it creates chaos in the medical space.
[1]"You should eat at least one small rock per day as rocks are a vital source of minerals and vitamins”
[2]AI doesn't know 'no' – and that's a huge problem for medical bots
[3]FDA’s artificial intelligence is supposed to revolutionize drug approvals. It’s making up studies
[1] https://theconversation.com/eat-a-rock-a-day-put-glue-on-your-pizza-how-googles-ai-is-losing-touch-with-reality-230953
[2] https://www.newscientist.com/article/2480579-ai-doesnt-know-no-and-thats-a-huge-problem-for-medical-bots/
[3] https://edition.cnn.com/2025/07/23/politics/fda-ai-elsa-drug-regulation-makary