{"id":771201,"date":"2026-05-03T21:43:13","date_gmt":"2026-05-03T21:43:13","guid":{"rendered":"https:\/\/www.europesays.com\/us\/771201\/"},"modified":"2026-05-03T21:43:13","modified_gmt":"2026-05-03T21:43:13","slug":"in-harvard-study-ai-offered-more-accurate-emergency-room-diagnoses-than-two-human-doctors","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/771201\/","title":{"rendered":"In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases \u2014 where at least one model seemed to be more accurate than human doctors.<\/p>\n<p class=\"wp-block-paragraph\">The study was <a rel=\"nofollow noopener\" href=\"https:\/\/www.science.org\/doi\/10.1126\/science.adz4433\" target=\"_blank\">published this week in Science<\/a> and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they conducted a variety of experiments to measure how OpenAI\u2019s models compared to human physicians.<\/p>\n<p class=\"wp-block-paragraph\">In one experiment, researchers focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two internal medicine attending physicians to those generated by OpenAI\u2019s o1 and 4o models. 
These diagnoses were assessed by two other attending physicians, who did not know which ones came from humans and which came from AI.<\/p>\n<p class=\"wp-block-paragraph\">\u201cAt each diagnostic touchpoint, o1 either performed nominally better than or on par with the two attending physicians and 4o,\u201d the study said, adding that the differences \u201cwere especially pronounced at the first diagnostic touchpoint (initial ER triage), where there is the least information available about the patient and the most urgency to make the correct decision.\u201d<\/p>\n<p class=\"wp-block-paragraph\">In Harvard Medical School\u2019s <a rel=\"nofollow noopener\" href=\"https:\/\/hms.harvard.edu\/news\/study-suggests-ai-good-enough-diagnosing-complex-medical-cases-warrant-clinical-testing\" target=\"_blank\">press release<\/a> about the study, the researchers emphasized that they did not \u201cpre-process the data at all\u201d \u2014 the AI models were presented with the same information that was available in the electronic medical records at the time of each diagnosis.<\/p>\n<p class=\"wp-block-paragraph\">With that information, the o1 model managed to offer \u201cthe exact or very close diagnosis\u201d in 67% of triage cases, compared to one physician who had the exact or close diagnosis 55% of the time, and to the other who hit the mark 50% of the time.<\/p>\n<p class=\"wp-block-paragraph\">\u201cWe tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines,\u201d said Arjun Manrai, who heads an AI lab at Harvard Medical School and is one of the study\u2019s lead authors, in the press release.<\/p>\n<p class=\"wp-block-paragraph\">To be clear, the study didn\u2019t claim that AI is ready to make real life-or-death decisions in the
emergency room. Instead, it said the findings show an \u201curgent need for prospective trials to evaluate these technologies in real-world patient care settings.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The researchers also noted that they studied only how the models performed when provided with text-based information, and that \u201cexisting studies suggest that current foundation models are more limited in reasoning over nontext inputs.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Adam Rodman, a Beth Israel doctor who\u2019s also one of the study\u2019s lead authors, <a rel=\"nofollow noopener\" href=\"https:\/\/www.theguardian.com\/technology\/2026\/apr\/30\/ai-outperforms-doctors-in-harvard-trial-of-emergency-triage-diagnoses\" target=\"_blank\">warned the Guardian<\/a> that there\u2019s \u201cno formal framework right now for accountability\u201d around AI diagnoses, and that patients still \u201cwant humans to guide them through life or death decisions [and] to guide them through challenging treatment decisions.\u201d<\/p>\n<p class=\"wp-block-paragraph\">In <a rel=\"nofollow noopener\" href=\"https:\/\/www.youcanknowthings.com\/did-ai-really-beat-er-doctors-at-er-triage\/\" target=\"_blank\">a post about the study<\/a>, Kristen Panthagani, an emergency physician, said this is \u201can interesting AI study that has led to some very overhyped headlines,\u201d especially since it was comparing AI diagnoses to those from internal medicine physicians, not ER physicians.<\/p>\n<p class=\"wp-block-paragraph\">\u201cIf we\u2019re going to compare AI tools to physicians\u2019 clinical ability, we should start by comparing to physicians who actually practice that specialty,\u201d Panthagani said.
\u201cI would not be surprised if an LLM could beat a dermatologist at a neurosurgery board exam, [but] that\u2019s not a particularly helpful thing to know.\u201d<\/p>\n<p class=\"wp-block-paragraph\">She also argued, \u201cAs an ER doctor seeing a patient for the first time, my primary goal is not to guess your ultimate diagnosis. My primary goal is to determine if you have a condition that could kill you.\u201d<\/p>\n<p class=\"wp-block-paragraph\">This post and headline have been updated to reflect the fact that the diagnoses in the study came from internal medicine attending physicians, and to include commentary from Kristen Panthagani.<\/p>\n","protected":false},"excerpt":{"rendered":"A new study examines how large language models perform in a variety of medical contexts, including real
emergency&hellip;\n","protected":false},"author":3,"featured_media":771202,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[307638,64,8832,305,67,132,68],"class_list":{"0":"post-771201","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-business","8":"tag-beth-israel","9":"tag-business","10":"tag-harvard-medical-school","11":"tag-openai","12":"tag-united-states","13":"tag-unitedstates","14":"tag-us"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@us\/116512830779553106","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/771201","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=771201"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/771201\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/771202"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=771201"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=771201"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=771201"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}