Katie Palmer covers telehealth, clinical artificial intelligence, and the health data economy, with an emphasis on the impact of digital health care on patients, providers, and businesses. You can reach Katie on Signal at palmer.01.

Getting a paper published in Science is a highlight of many researchers’ careers. But for internist and clinical artificial intelligence researcher Adam Rodman, it’s also been a source of some agita. 

On Thursday, Rodman and his colleagues published a compilation of experiments, including one using real-world data from a Boston emergency department, that show a large language model from OpenAI can outperform physicians in case-based diagnostic and clinical reasoning evaluations. To Rodman, the paper’s co-senior author, it’s a response to a gauntlet thrown down in Science in 1959. That paper “described how you would know that a clinical decision support system was capable of doing diagnosis better than humans,” he said. “And they can do it.”

But with generative AI tools like chatbots being heavily marketed to both patients and clinicians, Rodman worries that the experiments, all based on simulated and historical cases, will be misconstrued as proof of AI's safety and efficacy when used to treat real patients.
