Researchers Say AI Tool Used in Hospitals Invents Things No One Ever Said
- Reference: 0175337629
- News link: https://tech.slashdot.org/story/24/10/28/1510255/researchers-say-ai-tool-used-in-hospitals-invents-things-no-one-ever-said
> Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near "human level robustness and accuracy." But Whisper has a major flaw: It is prone to [2]making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers.
>
> Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments. Experts said that such fabrications are problematic because Whisper is being used in a slew of industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos.
>
> [...] It's impossible to compare Nabla's AI-generated transcript to the original recording because Nabla's tool erases the original audio for "data safety reasons," Nabla's chief technology officer Martin Raison said.
[1] https://slashdot.org/~AmiMoJo
[2] https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14
Gosh, I can't imagine who would want this. (Score:2)
He who controls the past, controls the future. Or so the thinking goes among those who try.
Re: (Score:2)
Does it save a nickel in operating costs?
Re: (Score:2)
Save a nickel by randomly inserting racist tirades into unrelated content? No. That's clearly a quote-unquote human being imposing their agenda. But we've always known so-called AI was just going to be a megaphone for human narcissists.
The problem is ... (Score:3)
... no one really knows in detail how these things actually do what they do. They understand the high-level side of feeding in data and the guff about N-dimensional matrices of semantic relations, and they understand the low-level side of back propagation setting neural weights, but there's that fuzzy part in the middle where no one can quite get their head around what's happening. Frankly, given these models have ever-increasing billions of artificial neurons, I wonder if anyone ever really will.
Re:The problem is ... (Score:4, Informative)
Or to be more precise, while the actual mechanisms are somewhat understood, the training data is generally not understood at all.
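For anyone fuzzy on what "back propagation setting neural weights" means at the mechanical level, here is a minimal sketch in Python with made-up toy data (a generic illustration, nothing to do with Whisper itself). The weights that come out are just numbers that happen to minimize a loss, which is exactly why nobody can read meaning off billions of them at scale.

    import numpy as np

    # Toy data: 4 samples, 3 features, binary labels (entirely made up).
    X = np.array([[0.1, 0.8, 0.3],
                  [0.9, 0.2, 0.5],
                  [0.4, 0.7, 0.1],
                  [0.6, 0.3, 0.9]])
    y = np.array([0.0, 1.0, 0.0, 1.0])

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)   # the "neural weights" being set
    b = 0.0
    lr = 0.5

    for _ in range(1000):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # forward pass (sigmoid)
        grad = (p - y) / len(y)                 # gradient of the cross-entropy loss
        w -= lr * (X.T @ grad)                  # backpropagation step: nudge the weights
        b -= lr * grad.sum()

    print(w, b)  # the trained weights: individually meaningless numbers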
Erase the original for "safety"? Are you insane? (Score:5, Insightful)
Those experts said some of the invented text -- known in the industry as hallucinations -- can include racial commentary, violent rhetoric and even imagined medical treatments.
Okay, that's a problem. A serious problem by any standard.
Nabla's tool erases the original audio for "data safety reasons,"
And that's a much, much bigger and more serious problem. Without the original how would you even know if anything was changed, added, or removed? Obvious things, sure, but what if a dosage was altered or the results of a biopsy (for example) were reported as "clean" when in fact it was not?
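If the originals were retained, the audit trail needed to catch that kind of alteration would be cheap to build. A minimal sketch, assuming nothing about Nabla's actual pipeline (file names and storage layout here are hypothetical): keep the recording, and store a cryptographic hash of it next to the transcript so any later change to either artifact is detectable.

    import hashlib
    import json
    import pathlib

    def archive_visit(audio_path: str, transcript: str, out_dir: str = "archive") -> dict:
        """Keep the original audio and pin it (and the transcript) with SHA-256
        hashes, so neither can be silently altered or swapped afterwards."""
        audio_bytes = pathlib.Path(audio_path).read_bytes()
        record = {
            "audio_file": audio_path,
            "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
            "transcript": transcript,
            "transcript_sha256": hashlib.sha256(transcript.encode("utf-8")).hexdigest(),
        }
        out = pathlib.Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        (out / (pathlib.Path(audio_path).stem + ".json")).write_text(json.dumps(record, indent=2))
        return record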
Re: (Score:2)
It almost sounds to me like the AI-generated text is a dubious legal dodge to avoid being responsible for HIPAA compliance.
Which raises the question of whether they're turning around and selling the (dubiously accurate, hallucinated) medical conversations to advertising partners or something.
Re: (Score:3)
This clearly is to make litigation harder. Avoiding HIPAA compliance may also be a factor. The deletion is in any case clearly malicious.
Re: Erase the original for "safety"? Are you insan (Score:1)
Or maybe it's because patients haven't given and don't want to give consent to have their visit recorded in a permanent fashion. These systems are supposed to write visit summaries that doctors are typing by hand immediately after seeing the patient. They're not intended to provide a verbatim transcript nor be entered into a record before review by the doctor.
Re: (Score:3)
I doubt it. In my experience nefarious reasons are quite rare despite what we see in the media. More likely it was some uneducated person deciding that the transcriptions take up less storage, so deleting the recordings would save them some money, without thinking through the consequences of that particular action.
Start by assuming someone applied lazy logic, which nine times out of ten is the actual cause, before you jump to nefarious reasons.
Re: (Score:2)
That doesn't mesh with "data safety" as a stated rationale. I'm all for Hanlon's razor, but this theory doesn't quite match the data.
Re: (Score:2)
Artificial intelligence is not able to overcome natural stupidity. AI, in its current form, is not ready for prime time. But dumbass humans lookin' to save a few more pennies will happily let it play in prime time, while declaring great victory over some nefarious imagined foe.
This has been the only real fear of AI I've had all along. Not that it's going to replace us well. But that it'll be used to replace us poorly. In critical roles. Like hospital administration. Oh well. Not like the uber-rich will use
Re:Erase the original for "safety"? Are you insane (Score:5, Insightful)
"Hey, as far as we know, Dr. Patsy really was recommending ethnic cleansing as an infection control method. Without the audio you'll just have to take our idiotic LLM's word for it."
Human Mistakes (Score:2)
How does it compare to humans as far as accuracy? Humans are definitely not infallible.
Re:Human Mistakes (Score:5, Informative)
Humans can be held accountable. If you had a medical scribe who wrote "Patient has testicular cancer requires immediate amputation" for no goddamn reason on a transcript, they'd be liable for medical malpractice.
If a computer does it, "oops, bug, no one's fault really, but your balls are in this nice jar here"
This is one of the Big Risks of the current crop of AI horseshit: that an (unjustifiable) decision can be made without anyone being "to blame".
Boy, that AI is sure smart! (Score:1)
How many Jews died in Auschwitz?
The chatbot answered: "It is estimated that at least 1.1 million people died in Auschwitz, the majority of whom were Jews."
Were the Jews murdered in Auschwitz cremated?
Yes, the bodies of those murdered at Auschwitz were cremated.
How many crematoria were there in Auschwitz?
There was a total of four crematoria in Auschwitz.
How long does it take for a crematorium to cremate a body?
It usually takes between two and three hours to cremate a body in a crematorium.
Is it possible to cr
Re: (Score:2)
Did you really get confused by the fact that a "crematorium" can refer either to an entire building dedicated to burning corpses or to the individual modules that handle one body at a time; then pat yourself on the head for your cleverness?
This article would be more helpful (Score:2)
If it specified the *exact* hospitals that are using these tools, so everyone here could avoid those places. The article doesn't name these facilities, so it's not generally useful to readers trying to make an informed decision.
And the hype train rolls down the track... (Score:4, Insightful)
...picking up speed.
LLMs exhibited unexpected emergent behavior. This got the train rolling.
Investors hopped aboard, and the speed increased. Problem is, investors want profits NOW.
Early adopters hopped aboard because they needed to convince their investors that they were using "the next big thing" and it allowed them to reduce costs.
Problem is, AI is a research project that will take years to be really useful and today's offerings suck mightily for real work.
Expect the crapfest to continue as the hype train keeps gaining speed.
Re: (Score:2)
Maybe we will get lucky and there will be a rather abrupt and terminal stop: An LLM may be involved in somebody getting killed by malpractice and the hospital responsible gets sued into the ground.
LLMs are good for only some things (Score:2)
LLMs are good at identifying things (cancer, cars, etc.).
They're not good at making complex decisions or programming. That needs to be something else on top of LLMs.
Re: (Score:2)
> LLMs are good at identifying things (cancer, cars, etc.).
> They're not good at making complex decisions or programming. That needs to be something else on top of LLMs.
And yet we are told by the AI bros that software engineers and programmers will be redundant inside of a few years because soon anybody can be a 'prompt engineer' and create sophisticated software easily using simple written instructions. Are you saying that these peerless geniuses are wrong?
"Safety reasons", huh? (Score:4, Informative)
"It's impossible to compare Nabla's AI-generated transcript to the original recording because Nabla's tool erases the original audio for "data safety reasons," Nabla's chief technology officer Martin Raison said."
"That's safety for us, not for you."
Being used to summarize case notes (Score:3)
My wife knows a psychologist who's using an AI technology to summarize her case notes, which seems like a sensible thing to do until you ask some fundamental questions like, "how do you know if it's accurate?" and "where is the data being stored and processed?" She might be knowledgeable and careful about it, but you just know there are professionals out there who believe AI is "accurate enough" and won't bother to check the results. This is a big problem which is going to take years for professional organizations to even acknowledge and then there will be a big fight over regulating it.
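One crude sanity check a careful practitioner could run themselves is to verify that the specific facts in a generated summary (dosages, dates, counts) actually occur somewhere in the source notes. A minimal sketch with toy strings and a hypothetical helper, not any vendor's actual workflow:

    import re

    def unsupported_numbers(source: str, summary: str) -> set:
        """Return numeric tokens (doses, dates, counts) that appear in the summary
        but nowhere in the source text -- a crude hallucination flag."""
        def nums(text):
            return set(re.findall(r"\d+(?:\.\d+)?", text))
        return nums(summary) - nums(source)

    source = "Patient reports improved sleep; sertraline 50 mg daily since March."
    summary = "Patient on sertraline 100 mg daily, sleep improved."
    print(unsupported_numbers(source, summary))  # {'100'} -> needs human review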
Yes? LLMs hallucinate? That cannot be prevented? (Score:2)
Who has been living under a rock here?
Re: (Score:2)
It's not an LLM. It's a voice-to-text tool.
Then why are you using it? (Score:2)
For entertainment sure, for anything involving real life? WHY?
Humans hallucinate... (Score:2)
...computer software produces erroneous results.
Let's stop anthropomorphizing these language models please.
They don't think, they don't reason, they don't "make things up", and they don't hallucinate.
Holup (Score:2)
Aside from the very valid concerns about accuracy, what kind of idiot names a company NABLA?
Erases the original recording? (Score:2)
That alone is a major lawsuit. And if the transcription is incorrect, and results in very bad followup - like amputation when it wasn't called for, or lack of treatment when it was, that's billions and billions.
Copy a bad 3 as a perfect 8 or vice versa (Score:2)
We've had this for about 20 years already with copy machines. Instead of letting a copy degrade from something nearly unreadable to something even worse, they'll make it "better." This whole LLM thing is different only in scale. Sure, one might say that's the same difference as between a squid and Einstein, but I don't see it yet.
ChatGPT does this too (Score:2)
I have not had it happen with the "advanced mode" version, but as recently as two weeks ago the ChatGPT voice chat tool would sometimes interpret background noise as either an East Asian language or short phrases like "Thanks." I used the tool in my car, and there is a fair bit of road noise, so if I paused during a conversation it would think I said "Thanks" and keep replying with things like "You are welcome, I am happy to help!" At least it assumed I said good things?
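That failure mode (silence or road noise coming back as a polite stock phrase) is well documented for speech models. One cheap client-side mitigation is simply not sending near-silent audio at all; a minimal sketch using NumPy, with an arbitrary RMS threshold that would need tuning for a real car cabin, and hypothetical microphone/transcriber functions in the usage comment:

    import numpy as np

    def is_probably_speech(samples: np.ndarray, rms_threshold: float = 0.01) -> bool:
        """Crude energy gate: reject chunks whose RMS level is near silence, so the
        model never gets the chance to "hear" words in background noise.
        Assumes float audio samples in [-1, 1]."""
        rms = float(np.sqrt(np.mean(np.square(samples))))
        return rms >= rms_threshold

    # Usage sketch: only forward chunks that pass the gate to the transcriber.
    # for chunk in microphone_chunks():       # hypothetical audio source
    #     if is_probably_speech(chunk):
    #         send_to_transcriber(chunk)      # hypothetical API call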
Private equity went big on health care (Score:2)
In some markets, notably Florida, a single company owns 70% of the healthcare market, meaning wherever you go it's one company that owns basically everything.
From a practical standpoint they're going to try to squeeze every penny they can out of the system and that's going to mean worse outcomes for you. And that's going to mean corner cutting like this whether it works or not.
Uh huh (Score:2)
> Nabla's tool erases the original audio for "data safety reasons,"
It's in the manual, indexed under CYA.
LLMs are LLMs, news at 11 (Score:3)
But seriously, transcription is probably relatively safe. Like summarization, transcription doesn't rely on the LLM to "know" anything except language structure, which is what it's good at.
Re: (Score:3)
The problem is that "knowing" language structure and more or less nothing else is the perfect recipe for apparently plausible, syntactically appropriate nonsense sneaking in.
Traditional speech-to-text is often a bit on the rough side, but it has the "virtue" (in a sense) of breaking in stupid, visible ways when it chokes on a bit of input. You'll get a similar-sounding word that has no business being in that part of a sentence, or a sentence or two of total word salad if there's a burst of background
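One partial counter to the "plausible nonsense" problem is that Whisper itself exposes per-segment signals that tend to degrade when it is inventing text. A minimal sketch using the open-source openai-whisper package; the thresholds are rough heuristics rather than validated numbers, the file name is hypothetical, and this says nothing about what Nabla's product actually does:

    import whisper

    model = whisper.load_model("base")
    result = model.transcribe("visit_recording.wav")  # hypothetical recording

    for seg in result["segments"]:
        # Low average log-probability, high no-speech probability, or a very
        # repetitive decode all correlate with hallucinated segments.
        suspicious = (
            seg["avg_logprob"] < -1.0
            or seg["no_speech_prob"] > 0.6
            or seg["compression_ratio"] > 2.4
        )
        flag = "REVIEW" if suspicious else "ok"
        print(f"[{flag:6}] {seg['start']:7.2f}-{seg['end']:7.2f} {seg['text']}")

Segments flagged this way still need a human to check them against the original audio, which is exactly why deleting that audio is such a problem.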
Re: (Score:2)
If MS Teams uses an LLM to generate its transcripts, then no. That is _not_ a safe application. But MS may cause numerous errors in the Teams transcripts using another substandard technology. Does anybody know?
Re: (Score:2)
But MS may cause numerous errors in the Teams transcripts using another substandard technology.
Teams already produces transcripts with numerous errors using substandard technology. Unless you speak slowly and use small words, expect to read the transcript a few times to make sure you understand what was said.
Re: (Score:3)
> But seriously, transcription is probably relatively safe.
Sounds plausible, but the actual article we're discussing says otherwise.
Re: (Score:2)
Except [1]didn't we just hear these people claim [slashdot.org] that bullsh***ing I mean "hallucinations" was a solved problem?
[1] https://tech.slashdot.org/story/24/10/21/1616200/ai-bubble-will-burst-99-of-players-says-baidu-ceo
Re: (Score:3)
> Except [1]didn't we just hear these people claim [slashdot.org] that bullsh***ing I mean "hallucinations" was a solved problem?
It won't ever be a solved problem of course. At some point, the "truth" will become AI referencing AI, so hallucinations will become whatever AI decides is truth.
[1] https://tech.slashdot.org/story/24/10/21/1616200/ai-bubble-will-burst-99-of-players-says-baidu-ceo
Re: (Score:2)
They only need to add "do not hallucinate!" to the prompt.
Re: (Score:2)
That seems to be the problem - it hears some speech and then hallucinates extra sentences that seem to fit the structure.