

ChatGPT-4 Beat Doctors at Diagnosing Illness, Study Finds (nytimes.com)

(Monday November 18, 2024 @04:04AM (EditorDavid) from the medical-machines dept.)


Dr. Adam Rodman, a Boston-based internal medicine expert, helped design [1]a study testing 50 licensed physicians to see whether ChatGPT improved their diagnoses, [2]reports the New York Times . The results? "Doctors who were given ChatGPT-4 along with conventional resources did only slightly better than doctors who did not have access to the bot.

"And, to the researchers' surprise, ChatGPT alone outperformed the doctors."

> [ChatGPT-4] scored an average of 90 percent when diagnosing a medical condition from a case report and explaining its reasoning. Doctors randomly assigned to use the chatbot got an average score of 76 percent. Those randomly assigned not to use it had an average score of 74 percent.

>

> The study showed more than just the chatbot's superior performance. It unveiled doctors' sometimes unwavering belief in a diagnosis they made, even when a chatbot potentially suggests a better one.

>

> And the study illustrated that while doctors are being exposed to the tools of artificial intelligence for their work, few know how to exploit the abilities of chatbots. As a result, they failed to take advantage of A.I. systems' ability to solve complex diagnostic problems and offer explanations for their diagnoses. A.I. systems should be "doctor extenders," Dr. Rodman said, offering valuable second opinions on diagnoses.

"The results were similar across subgroups of different training levels and experience with the chatbot," the study concludes. "These results suggest that access alone to LLMs will not improve overall physician diagnostic reasoning in practice.

"These findings are particularly relevant now that many health systems offer Health Insurance Portability and Accountability Act-compliant chatbots that physicians can use in clinical settings, often with no to minimal training on how to use these tools."



[1] https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2825395

[2] https://www.nytimes.com/2024/11/17/health/chatgpt-ai-doctors-diagnosis.html



Dunning-Kruger effect (Score:2)

by backslashdot ( 95548 )

So the AI was 90% accurate, but most of the time doctors didn't trust it and went ahead with their own incorrect diagnosis? One thing I want to know is how bad the 10% that the AI missed were .. like major blunders or what? Also, what about the 26% that the doctors missed .. how severe was the error? Anyone read the actual study? (Yes I know it's linked, but I'm a slashdotter.)

Re:Dunning-Kruger effect (Score:4, Insightful)

by martin-boundary ( 547041 )

Accuracy alone means nothing, as usual.

In a binary classification task, there are two numbers that should be reported, false positives and false negatives, or alternatively recall and precision, or alternatively the confusion matrix, etc.

The point is that comparisons of classifiers (human doctors or AI) are impossible on a linear scale, and anyone who reports results on a linear scale is biased. The math says so.
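The parent's point can be illustrated with a small sketch (hypothetical numbers, not from the study): two classifiers with identical 90% accuracy can have completely different error profiles, which is exactly what a single "accuracy" figure hides.

```python
# Two classifiers on the same 100 cases (20 positives, 80 negatives).
# Both score 90% accuracy, but their errors land very differently.

def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) counts for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1] * 20 + [0] * 80

# Classifier A misses half the positives (10 false negatives).
pred_a = [1] * 10 + [0] * 10 + [0] * 80
# Classifier B catches every positive but raises 10 false alarms.
pred_b = [1] * 20 + [1] * 10 + [0] * 70

for name, pred in (("A", pred_a), ("B", pred_b)):
    tp, fp, fn, tn = confusion(y_true, pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    print(f"{name}: acc={accuracy:.0%} recall={recall:.0%} precision={precision:.0%}")
```

Classifier A never misdiagnoses a healthy patient but misses half the sick ones; B catches everyone sick at the cost of false alarms. Same accuracy, very different medicine.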

Re: (Score:2)

by 93 Escort Wagon ( 326346 )

Also, what percentage of the 10% were blazingly wrong bull**** answers, AKA "hallucinations"?

As in "70 YO male, lifetime smoker, presents with a persistent cough and severe shortness of breath" = "gangrenous foot, immediate amputation required to save patient"

Re: (Score:2)

by ShanghaiBill ( 739463 )

> Also, what percentage of the 10% were blazingly wrong bull**** answers, AKA "hallucinations"?

If they were, then that makes the human doctors look even worse.

If the incorrect ChatGPT diagnoses were reasonable, the doctors likely made the same errors, and got an additional 14% wrong.

But if the incorrect ChatGPT diagnoses were blazing wrong bull****, the doctors should've easily corrected them, and got an additional 24% wrong.

Re: (Score:3)

by Errol backfiring ( 1280012 )

Apart from this, doctors can be "intelligently wrong", by giving a diagnosis which is not chiseled in stone and starting a treatment that would also help related illnesses. How often has your doctor said "call me when things get worse" as he has sent you home with a prescription?

Doctors do not want 100% accuracy, as the amount of work to get the last percents right is huge and they have other patients to treat. They want accuracy that is good enough.

Re: (Score:1)

by buck-yar ( 164658 )

Hanan Polansky says doctors are wrong about the underlying causes of most diseases. If you believe his theory, foreign DNA (viruses) are the root cause. Doctors are simply treating symptoms, which is why progress is not made. Blinded by profit, the medical industry has no desire to acknowledge viruses as the cause, as they have no cure. It would be a foundational shift in how medicine is practiced, throwing out too many of their cash cows (that hardly work).

Study on ChatGPT-4 ... (Score:5, Insightful)

by JasterBobaMereel ( 1102861 )

The doctors were fed information about the patients that was already suitable for giving to ChatGPT ... not required to gather the information themselves

So the largest part of a doctor's job was omitted, and replaced with data tailored for machines

The researchers gave the doctors little or no instruction in how to use ChatGPT, but then compared their results against ChatGPT used alone, with all the researchers' own prompting skills ...

Study finds that people who know how to get the best out of ChatGPT use it well ... and Doctors when taken out of their normal environment do not do as well ...

AI isn't the relevant problem here. (Score:2, Interesting)

by jd ( 1658 )

The problem is that doctors are making elementary errors, failing to verify, and putting ego and large numbers of consultations a day over and above the wellbeing of patients.

That, to me, is gross malpractice.

The correct answer is not necessarily more AI, but that might well be the end result. The correct answer is to require doctors to recertify through such test cases, and to withdraw their license to practice if the success rate is under 90%.

AI is, ultimately, just using differential diagnosis, because that's

Re: (Score:3)

by martin-boundary ( 547041 )

I too enjoyed watching House. But just because it's always sarcoidosis doesn't mean an AI is doing anything resembling differential diagnosis.

Re: (Score:2)

by ShanghaiBill ( 739463 )

> The problem is that doctors are making elementary errors, failing to verify, and putting ego and large numbers of consultations a day over and above the wellbeing of patients.

TFA does not contain enough information to draw that conclusion.

> The correct answer is not necessarily more AI

AI will be part of the solution.

TFA says that ChatGPT reduced misdiagnoses from 26% to 24%. Two percent might not seem like much, but in a $5 trillion industry, it's a lot.

Doctors will do much better if they're trained to use AI technology. It should be incorporated into medical school curriculum.

What was actually evaluated. (Score:4, Insightful)

by KAdamM ( 2996395 )

In short: 50 patients were studied by real doctors in real hospitals and clinics, and they got a proper diagnosis. Whatever was written in the papers - short history of present illness, past medical history, and symptoms (e.g. temperature, pulse, skin description) - was given to other doctors and the LLM. What it shows is that people, to get proper treatment, need direct contact between patient and doctor. This is what doctors are taught, and expected to do. An LLM or online consultation will not replace that.

failure to understand procedure (Score:2)

by TimothyHollins ( 4720957 )

I don't trust these conclusions *at all*.

AI, and machine learning, as performed by computer scientists, completely miss the meaning of data and protocol.

In machine learning/AI, a computer scientist will try to achieve the highest possible AUC. This is frequently seen when a dataset of 1,000,000 tests (99% controls, 1% cases) yields the best results when predicted as ANYTHING -> CONTROL. For a doctor, the 1% cases are the difficult part, not the 99% of controls.

A doctor should operate by a hierarchy of di
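The class-imbalance trap the parent describes can be sketched in a few lines (hypothetical 1% prevalence, not from the study): a degenerate "ANYTHING -> CONTROL" classifier posts near-perfect accuracy while detecting zero cases.

```python
# Sketch of the imbalance trap: on a dataset that is 99% controls,
# predicting "control" for everyone scores 99% accuracy while
# missing every single case.

n_cases, n_controls = 10, 990        # hypothetical 1% prevalence
y_true = [1] * n_cases + [0] * n_controls
y_pred = [0] * len(y_true)           # the "ANYTHING -> CONTROL" classifier

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = tp / n_cases                # sensitivity on the cases that matter

print(f"accuracy={accuracy:.0%}, recall on cases={recall:.0%}")
```

Optimizing raw accuracy (or even AUC on a skewed test set) rewards exactly this behavior, which is why the 1% of cases, not the 99% of controls, is where a diagnostic metric has to focus.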
