{"id":23551,"date":"2026-04-30T20:24:11","date_gmt":"2026-04-30T20:24:11","guid":{"rendered":"https:\/\/www.europesays.com\/ai\/23551\/"},"modified":"2026-04-30T20:24:11","modified_gmt":"2026-04-30T20:24:11","slug":"a-new-study-found-ais-medical-diagnoses-were-better-than-human-doctors-but-theres-a-catch","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ai\/23551\/","title":{"rendered":"A new study found AI\u2019s medical diagnoses were better than human doctors \u2014 but there\u2019s a catch"},"content":{"rendered":"<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">When I think of heroic doctors, I think of the physician in the hospital who\u2019s presented with a patient suffering bizarre or vague symptoms and pulls out the right diagnosis just in time. It\u2019s the basis of almost every medical procedural TV show, from House, MD to The Pitt. It\u2019s the mystique that has made doctors among the most revered professionals in society.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">But what if a machine could make that call just as well or even better? What should we do about it here in the real world?<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">That question is becoming more urgent. According to <a href=\"http:\/\/www.science.org\/doi\/10.1126\/science.adz4433\" rel=\"nofollow noopener\" target=\"_blank\">a major new study published in Science<\/a>, advanced artificial intelligence programs often outperform human doctors when diagnosing people seeking emergency medical care.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">AI has already, for better or worse, become a part of modern medicine. 
Different programs are being used to do everything from <a href=\"https:\/\/www.nytimes.com\/2026\/04\/28\/well\/doctors-using-ai.html\" rel=\"nofollow noopener\" target=\"_blank\">collating physician notes<\/a> to <a href=\"https:\/\/www.reuters.com\/business\/healthcare-pharmaceuticals\/jj-sees-ai-halving-time-generate-drug-development-leads-2026-04-27\/\" rel=\"nofollow noopener\" target=\"_blank\">identifying promising new candidates for drug development<\/a>. The authors of the Science study portrayed their findings as strong evidence that AI could be valuable in the emergency room as well \u2014 as long as it is fully vetted in clinical trials for specific uses.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Lest the hype outpace the science, the authors made a point to say that they feared their research would be cited to justify replacing human doctors with software programs: \u201cI get a little bit queasy about how some of these results might be used,\u201d said co-author Dr. Adam Rodman, a general internist and medical educator at Beth Israel Deaconess Medical Center. They warned against taking such a simplistic view of their findings.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201cNo one should look at this and say we do not need doctors,\u201d Rodman said in a call with reporters.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">At the same time, the researchers did argue that AI had reached the point where it could be a genuine asset for doctors in certain situations \u2014 especially in the ER, where physicians are frequently dealing with imperfect information. 
They called for clinical trials that would properly assess the safety and efficacy of using AI for those tasks: the model could serve as a second pair of virtual eyes, acting as a gut check for human physicians or helping them when they encounter a case that is outside their experience or expertise.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">AI can clearly be a force for good in health care, they said \u2014 so long as we recognize its limitations and use it in conjunction with, rather than as a replacement for, our human doctors.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201cWe\u2019re witnessing a really profound change in technology that will reshape medicine,\u201d said Arjun Manrai, who studies machine learning and statistical modeling for medical decision-making at Harvard Medical School.<\/p>\n<p>AI outperformed human doctors in making emergency diagnoses<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">The researchers evaluated OpenAI\u2019s o1 reasoning model, which is a more specialized AI program than, say, ChatGPT. It works more deliberately and with an emphasis on internal logic. They ran the program through several experiments, evaluating its accuracy in both simulated and historical cases that have been used in medical training to test physicians\u2019 critical thinking, as well as real-world emergency cases from the Beth Israel hospital. 
The study then compared how the o1 model performed against human doctors, ChatGPT, and human doctors using ChatGPT.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Assessing the training cases allowed the researchers to compare o1\u2019s performance to a very large sample of existing data from human doctors who took the same tests. And across those different scenarios, the AI consistently outperformed those physicians and offered the correct diagnosis or a helpful plan for patient management in the vast majority of the cases studied.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">But its accuracy when evaluating raw electronic health record data from real-world ER cases was especially impressive. This is closest to the messy reality that emergency doctors must often perform in: they are dealing with a person who is in serious need of speedy treatment, and have incomplete and unfiltered information, if they have much information at all. 
In reviewing those cases, the o1 model identified the exact or a very close diagnosis 67 percent of the time during the patient\u2019s initial presentation at triage (versus 50 and 55 percent respectively for two expert doctors that the AI was measured against) and 81 percent of the time once the patient was ready to be admitted to the hospital (versus 70 and 79 percent for the human doctors).<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201cWe can definitively say\u2026reasoning models can meet that criteria for making diagnostic reasoning at the highest levels of human performance,\u201d Rodman told reporters.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Two experts I consulted who were unaffiliated with the study \u2014 Dr. Sanjay Basu at UC-San Francisco and Nigam Shah at Stanford \u2014 praised its rigor, but they also noted its limitations. The preexisting training cases studied have been curated specifically for evaluating physicians\u2019 accuracy, so they may overstate how well the model would perform in the real world. 
In one of the case study experiments that included a set of \u201ccannot-miss\u201d diagnoses, in which the patient is at risk of serious harm or death, the AI model did not perform any better than ChatGPT or human doctors.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Even the ER findings, which come closest to assessing the o1 model\u2019s performance under true-to-life conditions, were retrospective reviews of existing cases; the model was not actually asked to diagnose or manage patients in real time.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">That is why, as even the Science study\u2019s authors argued, the next step should not be immediately putting OpenAI\u2019s model in charge of emergency triage at hospitals across the country. Instead, they called for clinical trials that could assess the model\u2019s performance \u2014 in both accuracy and safety \u2014 under real-world conditions.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201cMedicine is high stakes\u2026 and we have ways to mitigate these risks. They\u2019re called clinical trials,\u201d Rodman told reporters. \u201cWhat these results support is a robust and ambitious research agenda.\u201d<\/p>\n<p>AI could be valuable for doctors \u2014 but patients should be cautious<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">AI hype, especially in medicine, is <a href=\"https:\/\/nam.edu\/publications\/artificial-intelligence-in-health-care-the-hope-the-hype-the-promise-the-peril\/\" rel=\"nofollow noopener\" target=\"_blank\">high<\/a> right now. 
Listening to the authors discuss their findings, I was struck by their own awareness that their research could be used as a justification for cutting the human medical workforce \u2014 and the risks that could end up creating for patients.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201cThere\u2019s a lot of these so-called AI doctor companies out there that are trying to either cut doctors out of the loop or have minimal clinical supervision,\u201d Rodman said. \u201cAs one of the senior authors on the study, I do not think that these results support that.\u201d<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">The authors emphasized that based on their results, they would envision AI models in the ER being overseen by an actual doctor. Making a diagnosis is only part of treating a patient; it also includes figuring out a treatment plan and monitoring for developments \u2014 as well as the human element. \u201cHumans want humans to guide them through life-or-death decisions,\u201d Manrai said.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Basu and Shah said they supported narrowly defined uses for AI in the ER based on the collective research so far. It could offer second opinions when a patient is being handed off to another clinician or weigh in on specific high-risk situations (such as a patient presenting with a sepsis infection or stroke symptoms) where time is of the essence. It could also reduce paperwork for doctors, an application featured in the most recent season of <a href=\"https:\/\/www.vox.com\/good-medicine-newsletter\/485861\/the-pitt-season-2-finale-langdon-santos-addiction\" rel=\"nofollow noopener\" target=\"_blank\">The Pitt<\/a>. 
Shah pointed to prior authorization, documentation, and scheduling as obvious areas where AI could help.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">At the same time, AI models should absolutely not be deployed to autonomously diagnose and manage treatment, Basu said.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Individuals should also be cautious about using AI to make medical decisions. Other studies of AI diagnosis have found worrying results, especially for consumer-facing models like ChatGPT. <a href=\"https:\/\/www.nature.com\/articles\/s41591-026-04297-7\" rel=\"nofollow noopener\" target=\"_blank\">A paper<\/a> published in Nature Medicine earlier this year evaluated how ChatGPT did when presented with scenarios that ranged from non-urgent to emergent and found the model underestimated the seriousness of the patient\u2019s condition in 52 percent of cases; patients who were on the verge of diabetic shock or respiratory failure were instead referred to 24- or 48-hour monitoring. The model repeatedly failed to identify clear signs of suicidal ideation.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">As Shah put it to me, the Science paper represents a \u201cceiling\u201d for using AI for diagnosis, while the Nature Medicine paper represents a floor. 
The two studies show how precise we need to be when considering AI\u2019s use for making clinical decisions: while the more sophisticated o1 model did well in the Science study reviewing curated cases, the consumer-facing ChatGPT \u2014 developed by the very same company, OpenAI \u2014 underperformed in the other paper.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201cBoth can be true,\u201d Basu told me. \u201cBoth are.\u201d<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">In the call with reporters, Manrai described both \u201cgreen\u201d (low-risk) scenarios where an AI might genuinely be helpful even to a lay person and \u201cred\u201d (high-risk) cases where you should always involve a medical professional. A green use would be, for example, asking a model about a diet that could help manage your hypertension or stretches that could alleviate a recent back injury. Think of it more as lifestyle advice than hard clinical guidance.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">A red use, on the other hand, would involve serious medical situations with life-or-death consequences: chest pain, to give one of many possible examples, is cause to go straight to a doctor or the hospital, not to consult ChatGPT.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">We are getting closer to unlocking the awesome potential of these powerful programs to improve medical care, to make what was once science fiction a reality. 
But even these researchers at the cutting edge agree that we need to move cautiously \u2014 and keep the real experts, the doctors, in the loop.<\/p>\n","protected":false},"excerpt":{"rendered":"When I think of heroic doctors, I think of the physician in the hospital who\u2019s presented with a&hellip;\n","protected":false},"author":2,"featured_media":23552,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[24,25,16198,1657,1668,2819,134],"class_list":{"0":"post-23551","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ai","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-good-medicine","11":"tag-health","12":"tag-health-care","13":"tag-policy","14":"tag-technology"},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/23551","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/comments?post=23551"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/23551\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media\/23552"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media?parent=23551"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/categories?post=23551"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/tags?post=23551"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}