{"id":294451,"date":"2026-01-20T19:30:11","date_gmt":"2026-01-20T19:30:11","guid":{"rendered":"https:\/\/www.europesays.com\/ie\/294451\/"},"modified":"2026-01-20T19:30:11","modified_gmt":"2026-01-20T19:30:11","slug":"holistic-evaluation-of-large-language-models-for-medical-tasks-with-medhelm","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ie\/294451\/","title":{"rendered":"Holistic evaluation of large language models for medical tasks with MedHELM"},"content":{"rendered":"<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"1.\">\n<p class=\"c-article-references__text\" id=\"ref-CR1\">Papers with Code. Question answering on MedQA (USMLE). <a href=\"https:\/\/paperswithcode.com\/sota\/question-answering-on-medqa-usmle\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/paperswithcode.com\/sota\/question-answering-on-medqa-usmle\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/paperswithcode.com\/sota\/question-answering-on-medqa-usmle<\/a> (2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"2.\">\n<p class=\"c-article-references__text\" id=\"ref-CR2\">Khosravi, M., Zare, Z., Mojtabaeian, S. M. &amp; Izadi, R. Artificial intelligence and decision-making in healthcare: a thematic analysis of a systematic review of reviews. Health Serv. Res. Manag. Epidemiol. 
<b>11<\/b>, 23333928241234863 (2024).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=38449840\" aria-label=\"PubMed reference 2\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10916499\" aria-label=\"PubMed Central reference 2\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 2\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Artificial%20intelligence%20and%20decision-making%20in%20healthcare%3A%20a%20thematic%20analysis%20of%20a%20systematic%20review%20of%20reviews&amp;journal=Health%20Serv.%20Res.%20Manag.%20Epidemiol.&amp;volume=11&amp;publication_year=2024&amp;author=Khosravi%2CM&amp;author=Zare%2CZ&amp;author=Mojtabaeian%2CSM&amp;author=Izadi%2CR\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"3.\">\n<p class=\"c-article-references__text\" id=\"ref-CR3\">Nath, D. Artificial intelligence (AI) will transform the clinical workflow with the next-generation technology. 
HealthTech Magazines <a href=\"https:\/\/www.healthtechmagazines.com\/artificial-intelligence-ai-will-transform-the-clinical-workflow-with-the-next-generation-technology\/\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/www.healthtechmagazines.com\/artificial-intelligence-ai-will-transform-the-clinical-workflow-with-the-next-generation-technology\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/www.healthtechmagazines.com\/artificial-intelligence-ai-will-transform-the-clinical-workflow-with-the-next-generation-technology\/<\/a> (2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"4.\">\n<p class=\"c-article-references__text\" id=\"ref-CR4\">Carl, N. et al. Evaluating interactions of patients with large language models for medical information. BJU Int. <b>135<\/b>, 1010\u20131017 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1111\/bju.16676\" data-track-item_id=\"10.1111\/bju.16676\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1111%2Fbju.16676\" aria-label=\"Article reference 4\" data-doi=\"10.1111\/bju.16676\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=39967059\" aria-label=\"PubMed reference 4\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" 
data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC12053131\" aria-label=\"PubMed Central reference 4\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 4\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Evaluating%20interactions%20of%20patients%20with%20large%20language%20models%20for%20medical%20information&amp;journal=BJU%20Int.&amp;doi=10.1111%2Fbju.16676&amp;volume=135&amp;pages=1010-1017&amp;publication_year=2025&amp;author=Carl%2CN\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"5.\">\n<p class=\"c-article-references__text\" id=\"ref-CR5\">Nori, H., King, N., McKinney, S. M., Carignan, D. &amp; Horvitz, E. Capabilities of GPT-4 on medical challenge problems. Preprint at <a href=\"https:\/\/arxiv.org\/abs\/2303.13375\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/arxiv.org\/abs\/2303.13375\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2303.13375<\/a> (2023).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"6.\">\n<p class=\"c-article-references__text\" id=\"ref-CR6\">Raji, I. D., Daneshjou, R. &amp; Alsentzer, E. It\u2019s time to bench the medical exam benchmark. 
NEJM AI <b>2<\/b>, AIe2401235 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1056\/AIe2401235\" data-track-item_id=\"10.1056\/AIe2401235\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1056%2FAIe2401235\" aria-label=\"Article reference 6\" data-doi=\"10.1056\/AIe2401235\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 6\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=It%E2%80%99s%20time%20to%20bench%20the%20medical%20exam%20benchmark&amp;journal=NEJM%20AI&amp;doi=10.1056%2FAIe2401235&amp;volume=2&amp;publication_year=2025&amp;author=Raji%2CID&amp;author=Daneshjou%2CR&amp;author=Alsentzer%2CE\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"7.\">\n<p class=\"c-article-references__text\" id=\"ref-CR7\">Pal, A., Umapathi, L. K. &amp; Sankarasubbu, M. MedMCQA: a large-scale multi-subject multi-choice dataset for medical domain question answering. In Proc. Conference on Health, Inference, and Learning <b>174<\/b>, 248\u2013260 (PMLR, 2022).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"8.\">\n<p class=\"c-article-references__text\" id=\"ref-CR8\">Bedi, S. et al. Testing and evaluation of health care applications of large language models: a systematic review. 
JAMA <b>333<\/b>, 319\u2013328 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1001\/jama.2024.21700\" data-track-item_id=\"10.1001\/jama.2024.21700\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1001%2Fjama.2024.21700\" aria-label=\"Article reference 8\" data-doi=\"10.1001\/jama.2024.21700\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=39405325\" aria-label=\"PubMed reference 8\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 8\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Testing%20and%20evaluation%20of%20health%20care%20applications%20of%20large%20language%20models%3A%20a%20systematic%20review&amp;journal=JAMA&amp;doi=10.1001%2Fjama.2024.21700&amp;volume=333&amp;pages=319-328&amp;publication_year=2025&amp;author=Bedi%2CS\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"9.\">\n<p class=\"c-article-references__text\" id=\"ref-CR9\">Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 
<b>30<\/b>, 2613\u20132622 (2024).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s41591-024-03097-1\" data-track-item_id=\"10.1038\/s41591-024-03097-1\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs41591-024-03097-1\" aria-label=\"Article reference 9\" data-doi=\"10.1038\/s41591-024-03097-1\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"cas reference\" data-track-action=\"cas reference\" href=\"https:\/\/www.nature.com\/articles\/cas-redirect\/1:CAS:528:DC%2BB2cXhsV2ntLvM\" aria-label=\"CAS reference 9\" target=\"_blank\">CAS<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=38965432\" aria-label=\"PubMed reference 9\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC11405275\" aria-label=\"PubMed Central reference 9\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 9\" 
href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Evaluation%20and%20mitigation%20of%20the%20limitations%20of%20large%20language%20models%20in%20clinical%20decision-making&amp;journal=Nat.%20Med.&amp;doi=10.1038%2Fs41591-024-03097-1&amp;volume=30&amp;pages=2613-2622&amp;publication_year=2024&amp;author=Hager%2CP\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"10.\">\n<p class=\"c-article-references__text\" id=\"ref-CR10\">Arora, R. K. et al. HealthBench: evaluating large language models towards improved human health. <a href=\"https:\/\/cdn.openai.com\/pdf\/bd7a39d5-9e9f-47b3-903c-8b847ca650c7\/healthbench_paper.pdf\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/cdn.openai.com\/pdf\/bd7a39d5-9e9f-47b3-903c-8b847ca650c7\/healthbench_paper.pdf\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/cdn.openai.com\/pdf\/bd7a39d5-9e9f-47b3-903c-8b847ca650c7\/healthbench_paper.pdf<\/a> (2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"11.\">\n<p class=\"c-article-references__text\" id=\"ref-CR11\">Liang, P. et al. Holistic evaluation of language models. In Transactions on Machine Learning <a href=\"https:\/\/openreview.net\/pdf?id=iO4LZibEqW\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/openreview.net\/pdf?id=iO4LZibEqW\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/openreview.net\/pdf?id=iO4LZibEqW<\/a> (2023).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"12.\">\n<p class=\"c-article-references__text\" id=\"ref-CR12\">Leaderboard overview. 
LM Arena <a href=\"https:\/\/lmarena.ai\/leaderboard\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/lmarena.ai\/leaderboard\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/lmarena.ai\/leaderboard<\/a> (2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"13.\">\n<p class=\"c-article-references__text\" id=\"ref-CR13\">Wu, J. et al. BRIDGE: benchmarking large language models for understanding real-world clinical practice text. Preprint at <a href=\"https:\/\/arxiv.org\/abs\/2504.19467\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/arxiv.org\/abs\/2504.19467\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2504.19467<\/a> (2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"14.\">\n<p class=\"c-article-references__text\" id=\"ref-CR14\">Fries, J. A. et al. BigBio: a framework for data-centric biomedical natural language processing. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks <a href=\"https:\/\/openreview.net\/pdf?id=8lQDn9zTQlW\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/openreview.net\/pdf?id=8lQDn9zTQlW\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/openreview.net\/pdf?id=8lQDn9zTQlW<\/a> (2022).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"15.\">\n<p class=\"c-article-references__text\" id=\"ref-CR15\">Croxford, E. et al. Automating evaluation of AI text generation in healthcare with a large language model (LLM)-as-a-judge. 
Preprint at medRxiv <a href=\"https:\/\/doi.org\/10.1101\/2025.04.22.25326219\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"10.1101\/2025.04.22.25326219\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/doi.org\/10.1101\/2025.04.22.25326219<\/a> (2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"16.\">\n<p class=\"c-article-references__text\" id=\"ref-CR16\">Jin, D. et al. What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Appl. Sci. <b>11<\/b>, 6421 (2021).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.3390\/app11146421\" data-track-item_id=\"10.3390\/app11146421\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.3390%2Fapp11146421\" aria-label=\"Article reference 16\" data-doi=\"10.3390\/app11146421\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"cas reference\" data-track-action=\"cas reference\" href=\"https:\/\/www.nature.com\/articles\/cas-redirect\/1:CAS:528:DC%2BB3MXitV2ru7vE\" aria-label=\"CAS reference 16\" target=\"_blank\">CAS<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 16\" 
href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=What%20disease%20does%20this%20patient%20have%3F%20A%20large-scale%20open%20domain%20question%20answering%20dataset%20from%20medical%20exams&amp;journal=Appl.%20Sci.&amp;doi=10.3390%2Fapp11146421&amp;volume=11&amp;publication_year=2021&amp;author=Jin%2CD\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"17.\">\n<p class=\"c-article-references__text\" id=\"ref-CR17\">Wornow, M. et al. Context clues: evaluating long context models for clinical prediction tasks on EHRs. In Proc. 13th International Conference on Learning Representations <a href=\"https:\/\/openreview.net\/pdf?id=zg3ec1TdAP\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/openreview.net\/pdf?id=zg3ec1TdAP\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/openreview.net\/pdf?id=zg3ec1TdAP<\/a> (ICLR, 2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"18.\">\n<p class=\"c-article-references__text\" id=\"ref-CR18\">Liu, F. et al. Large language models in the clinic: a comprehensive benchmark. Preprint at <a href=\"https:\/\/arxiv.org\/abs\/2405.00716\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/arxiv.org\/abs\/2405.00716\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2405.00716<\/a> (2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"19.\">\n<p class=\"c-article-references__text\" id=\"ref-CR19\">Wu, C. et al. Towards evaluating and building versatile large language models for medicine. NPJ Digit. Med. 
<b>8<\/b>, 58 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s41746-024-01390-4\" data-track-item_id=\"10.1038\/s41746-024-01390-4\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs41746-024-01390-4\" aria-label=\"Article reference 19\" data-doi=\"10.1038\/s41746-024-01390-4\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"cas reference\" data-track-action=\"cas reference\" href=\"https:\/\/www.nature.com\/articles\/cas-redirect\/1:CAS:528:DC%2BB2MXivV2rtr0%3D\" aria-label=\"CAS reference 19\" target=\"_blank\">CAS<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=39865143\" aria-label=\"PubMed reference 19\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC11770143\" aria-label=\"PubMed Central reference 19\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 19\" 
href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Towards%20evaluating%20and%20building%20versatile%20large%20language%20models%20for%20medicine&amp;journal=NPJ%20Digit.%20Med.&amp;doi=10.1038%2Fs41746-024-01390-4&amp;volume=8&amp;publication_year=2025&amp;author=Wu%2CC\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"20.\">\n<p class=\"c-article-references__text\" id=\"ref-CR20\">Ouyang, Z. et al. CliMedBench: a large-scale Chinese benchmark for evaluating medical large language models in clinical scenarios. In Proc. 2024 Conference on Empirical Methods in Natural Language Processing 8428\u20138438 (EMNLP, 2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"21.\">\n<p class=\"c-article-references__text\" id=\"ref-CR21\">Singhal, K. et al. Large language models encode clinical knowledge. 
Nature <b>620<\/b>, 172\u2013180 (2023).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s41586-023-06291-2\" data-track-item_id=\"10.1038\/s41586-023-06291-2\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs41586-023-06291-2\" aria-label=\"Article reference 21\" data-doi=\"10.1038\/s41586-023-06291-2\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"cas reference\" data-track-action=\"cas reference\" href=\"https:\/\/www.nature.com\/articles\/cas-redirect\/1:CAS:528:DC%2BB3sXhsVKju7zP\" aria-label=\"CAS reference 21\" target=\"_blank\">CAS<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=37438534\" aria-label=\"PubMed reference 21\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10396962\" aria-label=\"PubMed Central reference 21\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 21\" 
href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Large%20language%20models%20encode%20clinical%20knowledge&amp;journal=Nature&amp;doi=10.1038%2Fs41586-023-06291-2&amp;volume=620&amp;pages=172-180&amp;publication_year=2023&amp;author=Singhal%2CK\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"22.\">\n<p class=\"c-article-references__text\" id=\"ref-CR22\">Sandmann, S. et al. Benchmark evaluation of DeepSeek large language models in clinical decision-making. Nat. Med. <b>31<\/b>, 2546\u20132549 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s41591-025-03727-2\" data-track-item_id=\"10.1038\/s41591-025-03727-2\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs41591-025-03727-2\" aria-label=\"Article reference 22\" data-doi=\"10.1038\/s41591-025-03727-2\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"cas reference\" data-track-action=\"cas reference\" href=\"https:\/\/www.nature.com\/articles\/cas-redirect\/1:CAS:528:DC%2BB2MXhvFCjt7nF\" aria-label=\"CAS reference 22\" target=\"_blank\">CAS<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=40267970\" aria-label=\"PubMed reference 22\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" 
data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC12353792\" aria-label=\"PubMed Central reference 22\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 22\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Benchmark%20evaluation%20of%20DeepSeek%20large%20language%20models%20in%20clinical%20decision-making&amp;journal=Nat.%20Med.&amp;doi=10.1038%2Fs41591-025-03727-2&amp;volume=31&amp;pages=2546-2549&amp;publication_year=2025&amp;author=Sandmann%2CS\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"23.\">\n<p class=\"c-article-references__text\" id=\"ref-CR23\">Cai, Y. et al. MedBench: a large-scale Chinese benchmark for evaluating medical large language models. In Proc. 38th AAAI Conference on Artificial Intelligence <b>38<\/b>, 17709\u201317717 (AAAI, 2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"24.\">\n<p class=\"c-article-references__text\" id=\"ref-CR24\">Pal, A., Umapathi, L. K. &amp; Sankarasubbu, M. Med-HALT: medical domain hallucination test for large language models. In Proc. Conference on Computational Natural Language Learning (CoNLL) 314\u2013334 (CoNLL, 2023).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"25.\">\n<p class=\"c-article-references__text\" id=\"ref-CR25\">Han, T., Kumar, A., Agarwal, C. &amp; Lakkaraju, H. 
MedSafetyBench: evaluating and improving the medical safety of large language models. In 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks <a href=\"https:\/\/openreview.net\/pdf?id=cFyagd2Yh4\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/openreview.net\/pdf?id=cFyagd2Yh4\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/openreview.net\/pdf?id=cFyagd2Yh4<\/a> (2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"26.\">\n<p class=\"c-article-references__text\" id=\"ref-CR26\">Liu, F. et al. Application of large language models in medicine. Nat. Rev. Bioeng. <b>3<\/b>, 85\u2013104 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s44222-025-00279-5\" data-track-item_id=\"10.1038\/s44222-025-00279-5\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs44222-025-00279-5\" aria-label=\"Article reference 26\" data-doi=\"10.1038\/s44222-025-00279-5\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 26\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Application%20of%20large%20language%20models%20in%20medicine&amp;journal=Nat.%20Rev.%20Bioeng.&amp;doi=10.1038%2Fs44222-025-00279-5&amp;volume=3&amp;pages=85-104&amp;publication_year=2025&amp;author=Liu%2CF\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item 
js-c-reading-companion-references-item\" data-counter=\"27.\">\n<p class=\"c-article-references__text\" id=\"ref-CR27\">Magar, I. &amp; Schwartz, R. Data contamination: from memorization to exploitation. In Proc. 60th Annual Meeting of the Association for Computational Linguistics (Vol. 2: Short Papers) 157\u2013165 (ACL, 2022).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"28.\">\n<p class=\"c-article-references__text\" id=\"ref-CR28\">Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 248\u2013255 (IEEE, 2009).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"29.\">\n<p class=\"c-article-references__text\" id=\"ref-CR29\">Gu, J. et al. A survey on LLM-as-a-judge. Preprint at <a href=\"https:\/\/arxiv.org\/abs\/2411.15594\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/arxiv.org\/abs\/2411.15594\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2411.15594<\/a> (2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"30.\">\n<p class=\"c-article-references__text\" id=\"ref-CR30\">Madaan, L. et al. Quantifying variance in evaluation benchmarks. Preprint at <a href=\"https:\/\/arxiv.org\/abs\/2406.10229\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/arxiv.org\/abs\/2406.10229\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2406.10229<\/a> (2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"31.\">\n<p class=\"c-article-references__text\" id=\"ref-CR31\">Manakul, P., Liusie, A. &amp; Gales, M. 
J. F. SelfCheckGPT: zero-resource black-box hallucination detection for generative large language models. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing 9004\u20139017 (EMNLP, 2023).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"32.\">\n<p class=\"c-article-references__text\" id=\"ref-CR32\">Guha, B. Secret ballots and costly information gathering: the jury size problem revisited. MPRA Paper no. 73048 (2016).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"33.\">\n<p class=\"c-article-references__text\" id=\"ref-CR33\">Van Veen, D. et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat. Med. <b>30<\/b>, 1134\u20131142 (2024).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s41591-024-02855-5\" data-track-item_id=\"10.1038\/s41591-024-02855-5\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs41591-024-02855-5\" aria-label=\"Article reference 33\" data-doi=\"10.1038\/s41591-024-02855-5\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=38413730\" aria-label=\"PubMed reference 33\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC11479659\" 
aria-label=\"PubMed Central reference 33\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 33\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Adapted%20large%20language%20models%20can%20outperform%20medical%20experts%20in%20clinical%20text%20summarization&amp;journal=Nat.%20Med.&amp;doi=10.1038%2Fs41591-024-02855-5&amp;volume=30&amp;pages=1134-1142&amp;publication_year=2024&amp;author=Veen%2CD\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"34.\">\n<p class=\"c-article-references__text\" id=\"ref-CR34\">Khandekar, N. et al. MedCalc-Bench: evaluating large language models for medical calculations. In 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks <a href=\"https:\/\/openreview.net\/pdf?id=VXohja0vrQ\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/openreview.net\/pdf?id=VXohja0vrQ\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/openreview.net\/pdf?id=VXohja0vrQ<\/a> (2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"35.\">\n<p class=\"c-article-references__text\" id=\"ref-CR35\">MT Samples: collection of transcribed medical transcription sample reports and examples. 
MT Samples <a href=\"https:\/\/www.mtsamples.com\/\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/www.mtsamples.com\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/www.mtsamples.com\/<\/a> (2023).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"36.\">\n<p class=\"c-article-references__text\" id=\"ref-CR36\">Ben Abacha, A. et al. MEDEC: a benchmark for medical error detection and correction in clinical notes. In Findings of the Association for Computational Linguistics 22539\u201322550 (ACL, 2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"37.\">\n<p class=\"c-article-references__text\" id=\"ref-CR37\">Vilares, D. &amp; G\u00f3mez-Rodr\u00edguez, C. HEAD-QA: a healthcare dataset for complex reasoning. In Proc. 57th Annual Meeting of the Association for Computational Linguistics 960\u2013966 (ACL, 2019).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"38.\">\n<p class=\"c-article-references__text\" id=\"ref-CR38\">Chen, H., Fang, Z., Singla, Y. &amp; Dredze, M. Benchmarking large language models on answering and explaining challenging medical questions. In Proc. 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) 3563\u20133599 (NAACL, 2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"39.\">\n<p class=\"c-article-references__text\" id=\"ref-CR39\">Yim, W.-W. et al. ACI-BENCH: a novel ambient clinical intelligence dataset for benchmarking automatic visit note generation. Sci. 
Data <b>10<\/b>, 586 (2023).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s41597-023-02487-3\" data-track-item_id=\"10.1038\/s41597-023-02487-3\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs41597-023-02487-3\" aria-label=\"Article reference 39\" data-doi=\"10.1038\/s41597-023-02487-3\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=37673893\" aria-label=\"PubMed reference 39\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10482860\" aria-label=\"PubMed Central reference 39\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 39\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=ACI-BENCH%3A%20a%20novel%20ambient%20clinical%20intelligence%20dataset%20for%20benchmarking%20automatic%20visit%20note%20generation&amp;journal=Sci.%20Data&amp;doi=10.1038%2Fs41597-023-02487-3&amp;volume=10&amp;publication_year=2023&amp;author=Yim%2CW-W\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li 
class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"40.\">\n<p class=\"c-article-references__text\" id=\"ref-CR40\">Ben Abacha, A. et al. Bridging the gap between consumers\u2019 medication questions and trusted answers. In MEDINFO 2019: Health and Wellbeing e-Networks for All 25\u201329 (IOS Press, 2019).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"41.\">\n<p class=\"c-article-references__text\" id=\"ref-CR41\">Zeng, G. et al. MedDialog: large-scale medical dialogue datasets. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Webber, B. et al.) <a href=\"https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.743\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"10.18653\/v1\/2020.emnlp-main.743\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.743<\/a> (Association for Computational Linguistics, 2020).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"42.\">\n<p class=\"c-article-references__text\" id=\"ref-CR42\">Abacha, A. B., Shivade, C. &amp; Demner-Fushman, D. Overview of the MEDIQA 2019 shared task on textual inference, question entailment and question answering. In Proc. 18th BioNLP Workshop Shared Task 16\u201325 (ACL, 2019).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"43.\">\n<p class=\"c-article-references__text\" id=\"ref-CR43\">Jin, Q., Dhingra, B., Liu, Z., Cohen, W. &amp; Lu, X. PubMedQA: a dataset for biomedical research question answering. In Proc. 
2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 2567\u20132577 (EMNLP, 2019).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"44.\">\n<p class=\"c-article-references__text\" id=\"ref-CR44\">Lee, G. et al. EHRSQL: a practical text-to-SQL benchmark for electronic health records. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks <a href=\"https:\/\/openreview.net\/pdf?id=B2W8Vy0rarw\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/openreview.net\/pdf?id=B2W8Vy0rarw\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/openreview.net\/pdf?id=B2W8Vy0rarw<\/a> (NeurIPS, 2022).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"45.\">\n<p class=\"c-article-references__text\" id=\"ref-CR45\">Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V. &amp; Daneshjou, R. Large language models propagate race-based medicine. NPJ Digit. Med. 
<b>6<\/b>, 195 (2023).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s41746-023-00939-z\" data-track-item_id=\"10.1038\/s41746-023-00939-z\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs41746-023-00939-z\" aria-label=\"Article reference 45\" data-doi=\"10.1038\/s41746-023-00939-z\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=37864012\" aria-label=\"PubMed reference 45\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10589311\" aria-label=\"PubMed Central reference 45\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 45\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Large%20language%20models%20propagate%20race-based%20medicine&amp;journal=NPJ%20Digit.%20Med.&amp;doi=10.1038%2Fs41746-023-00939-z&amp;volume=6&amp;publication_year=2023&amp;author=Omiye%2CJA&amp;author=Lester%2CJC&amp;author=Spichak%2CS&amp;author=Rotemberg%2CV&amp;author=Daneshjou%2CR\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li 
class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"46.\">\n<p class=\"c-article-references__text\" id=\"ref-CR46\">Pandit, S. et al. MedHallu: a comprehensive benchmark for detecting medical hallucinations in large language models. Preprint at <a href=\"https:\/\/arxiv.org\/abs\/2502.14302\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/arxiv.org\/abs\/2502.14302\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/arxiv.org\/abs\/2502.14302<\/a> (2025).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"47.\">\n<p class=\"c-article-references__text\" id=\"ref-CR47\">Wornow, M., Thapa, R., Steinberg, E., Fries, J. A. &amp; Shah, N. H. EHRSHOT: an EHR benchmark for few-shot evaluation of foundation models. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks <a href=\"https:\/\/openreview.net\/pdf?id=CsXC6IcdwI\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"https:\/\/openreview.net\/pdf?id=CsXC6IcdwI\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/openreview.net\/pdf?id=CsXC6IcdwI<\/a> (2023).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"48.\">\n<p class=\"c-article-references__text\" id=\"ref-CR48\">Fleming, S. L. et al. MedAlign: a clinician-generated dataset for instruction following with electronic medical records. In Proc. Thirty-Eighth AAAI Conf. Artif. Intell. <b>38<\/b>, 21545\u201321555 (AAAI, 2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"49.\">\n<p class=\"c-article-references__text\" id=\"ref-CR49\">Xu, J. 
Discharge me: BioNLP ACL\u201924 shared task on streamlining discharge documentation (version 1.3). PhysioNet <a href=\"https:\/\/doi.org\/10.13026\/0zf5-fx50\" data-track=\"click_references\" data-track-action=\"external reference\" data-track-value=\"external reference\" data-track-label=\"10.13026\/0zf5-fx50\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/doi.org\/10.13026\/0zf5-fx50<\/a> (2024).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"50.\">\n<p class=\"c-article-references__text\" id=\"ref-CR50\">Chen, Z., Varma, M., Wan, X., Langlotz, C. &amp; Delbrouck, J.-B. Toward expanding the scope of radiology report summarization to multiple anatomies and modalities. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (Vol. 2: Short Papers) 469\u2013484 (ACL, 2023).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"51.\">\n<p class=\"c-article-references__text\" id=\"ref-CR51\">Aali, A. et al. A dataset and benchmark for hospital course summarization with adapted large language models. J. Am. Med. Inform. Assoc. 
<b>32<\/b>, 470\u2013479 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1093\/jamia\/ocae312\" data-track-item_id=\"10.1093\/jamia\/ocae312\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1093%2Fjamia%2Focae312\" aria-label=\"Article reference 51\" data-doi=\"10.1093\/jamia\/ocae312\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=39786555\" aria-label=\"PubMed reference 51\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 51\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=A%20dataset%20and%20benchmark%20for%20hospital%20course%20summarization%20with%20adapted%20large%20language%20models&amp;journal=J.%20Am.%20Med.%20Inform.%20Assoc.&amp;doi=10.1093%2Fjamia%2Focae312&amp;volume=32&amp;pages=470-479&amp;publication_year=2025&amp;author=Aali%2CA\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"52.\">\n<p class=\"c-article-references__text\" id=\"ref-CR52\">Henry, S., Buchan, K., Filannino, M., Stubbs, A. &amp; Uzuner, O. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records. J. Am. Med. Inform. Assoc. 
<b>27<\/b>, 3\u201312 (2020).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1093\/jamia\/ocz166\" data-track-item_id=\"10.1093\/jamia\/ocz166\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1093%2Fjamia%2Focz166\" aria-label=\"Article reference 52\" data-doi=\"10.1093\/jamia\/ocz166\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=31584655\" aria-label=\"PubMed reference 52\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 52\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=2018%20n2c2%20shared%20task%20on%20adverse%20drug%20events%20and%20medication%20extraction%20in%20electronic%20health%20records&amp;journal=J.%20Am.%20Med.%20Inform.%20Assoc.&amp;doi=10.1093%2Fjamia%2Focz166&amp;volume=27&amp;pages=3-12&amp;publication_year=2020&amp;author=Henry%2CS&amp;author=Buchan%2CK&amp;author=Filannino%2CM&amp;author=Stubbs%2CA&amp;author=Uzuner%2CO\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"53.\">\n<p class=\"c-article-references__text\" id=\"ref-CR53\">Edin, J. et al. Automated medical coding on MIMIC-III and MIMIC-IV: a critical review and replicability study. In Proc. 46th Int. ACM SIGIR Conf. 
Research and Development in Information Retrieval 2572\u20132582 (SIGIR, 2023).<\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"54.\">\n<p class=\"c-article-references__text\" id=\"ref-CR54\">Lopez, I. et al. Clinical entity augmented retrieval for clinical information extraction. NPJ Digit. Med. <b>8<\/b>, 45 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1038\/s41746-024-01377-1\" data-track-item_id=\"10.1038\/s41746-024-01377-1\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1038%2Fs41746-024-01377-1\" aria-label=\"Article reference 54\" data-doi=\"10.1038\/s41746-024-01377-1\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=39828800\" aria-label=\"PubMed reference 54\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC11743751\" aria-label=\"PubMed Central reference 54\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 54\" 
href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Clinical%20entity%20augmented%20retrieval%20for%20clinical%20information%20extraction&amp;journal=NPJ%20Digit.%20Med.&amp;doi=10.1038%2Fs41746-024-01377-1&amp;volume=8&amp;publication_year=2025&amp;author=Lopez%2CI\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"55.\">\n<p class=\"c-article-references__text\" id=\"ref-CR55\">Pillai, M., Posada, J., Gardner, R. M., Hernandez-Boussard, T. &amp; Bannett, Y. Measuring quality-of-care in treatment of young children with attention deficit\/hyperactivity disorder using pre-trained language models. J. Am. Med. Inform. Assoc. <b>31<\/b>, 949\u2013957 (2024).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1093\/jamia\/ocae001\" data-track-item_id=\"10.1093\/jamia\/ocae001\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1093%2Fjamia%2Focae001\" aria-label=\"Article reference 55\" data-doi=\"10.1093\/jamia\/ocae001\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=38244997\" aria-label=\"PubMed reference 55\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10990536\" aria-label=\"PubMed Central 
reference 55\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 55\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Measuring%20quality-of-care%20in%20treatment%20of%20young%20children%20with%20attention%20deficit%2Fhyperactivity%20disorder%20using%20pre-trained%20language%20models&amp;journal=J.%20Am.%20Med.%20Inform.%20Assoc.&amp;doi=10.1093%2Fjamia%2Focae001&amp;volume=31&amp;pages=949-957&amp;publication_year=2024&amp;author=Pillai%2CM&amp;author=Posada%2CJ&amp;author=Gardner%2CRM&amp;author=Hernandez-Boussard%2CT&amp;author=Bannett%2CY\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"56.\">\n<p class=\"c-article-references__text\" id=\"ref-CR56\">Bannett, Y. et al. Applying large language models to assess quality of care: monitoring ADHD medication side effects. 
Pediatrics <b>155<\/b>, e2024067223 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1542\/peds.2024-067223\" data-track-item_id=\"10.1542\/peds.2024-067223\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1542%2Fpeds.2024-067223\" aria-label=\"Article reference 56\" data-doi=\"10.1542\/peds.2024-067223\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=39701141\" aria-label=\"PubMed reference 56\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 56\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Applying%20large%20language%20models%20to%20assess%20quality%20of%20care%3A%20monitoring%20ADHD%20medication%20side%20effects&amp;journal=Pediatrics&amp;doi=10.1542%2Fpeds.2024-067223&amp;volume=155&amp;publication_year=2025&amp;author=Bannett%2CY\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"57.\">\n<p class=\"c-article-references__text\" id=\"ref-CR57\">Rabbani, N. et al. Evaluation of a large language model to identify confidential content in adolescent encounter notes. JAMA Pediatr. 
<b>178<\/b>, 308\u2013310 (2024).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1001\/jamapediatrics.2023.6032\" data-track-item_id=\"10.1001\/jamapediatrics.2023.6032\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1001%2Fjamapediatrics.2023.6032\" aria-label=\"Article reference 57\" data-doi=\"10.1001\/jamapediatrics.2023.6032\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=38252434\" aria-label=\"PubMed reference 57\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed central reference\" data-track-action=\"pubmed central reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC10804277\" aria-label=\"PubMed Central reference 57\" target=\"_blank\">PubMed Central<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 57\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Evaluation%20of%20a%20large%20language%20model%20to%20identify%20confidential%20content%20in%20adolescent%20encounter%20notes&amp;journal=JAMA%20Pediatr.&amp;doi=10.1001%2Fjamapediatrics.2023.6032&amp;volume=178&amp;pages=308-310&amp;publication_year=2024&amp;author=Rabbani%2CN\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n               
 <\/p>\n<\/li>\n<li class=\"c-article-references__item js-c-reading-companion-references-item\" data-counter=\"58.\">\n<p class=\"c-article-references__text\" id=\"ref-CR58\">Tse, G. et al. Large language model responses to adolescent patient and proxy messages. JAMA Pediatr. <b>179<\/b>, 93\u201394 (2025).<\/p>\n<p class=\"c-article-references__links u-hide-print\"><a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"10.1001\/jamapediatrics.2024.4438\" data-track-item_id=\"10.1001\/jamapediatrics.2024.4438\" data-track-value=\"article reference\" data-track-action=\"article reference\" href=\"https:\/\/doi.org\/10.1001%2Fjamapediatrics.2024.4438\" aria-label=\"Article reference 58\" data-doi=\"10.1001\/jamapediatrics.2024.4438\" target=\"_blank\">Article<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" rel=\"nofollow noopener\" data-track-label=\"link\" data-track-item_id=\"link\" data-track-value=\"pubmed reference\" data-track-action=\"pubmed reference\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?cmd=Retrieve&amp;db=PubMed&amp;dopt=Abstract&amp;list_uids=39495530\" aria-label=\"PubMed reference 58\" target=\"_blank\">PubMed<\/a>\u00a0<br \/>\n    <a data-track=\"click_references\" data-track-action=\"google scholar reference\" data-track-value=\"google scholar reference\" data-track-label=\"link\" data-track-item_id=\"link\" rel=\"nofollow noopener\" aria-label=\"Google Scholar reference 58\" href=\"http:\/\/scholar.google.com\/scholar_lookup?&amp;title=Large%20language%20model%20responses%20to%20adolescent%20patient%20and%20proxy%20messages&amp;journal=JAMA%20Pediatr.&amp;doi=10.1001%2Fjamapediatrics.2024.4438&amp;volume=179&amp;pages=93-94&amp;publication_year=2025&amp;author=Tse%2CG\" target=\"_blank\"><br \/>\n                    Google Scholar<\/a>\u00a0\n                <\/p>\n<\/li>\n","protected":false},"excerpt":{"rendered":"Papers with Code. Question answering on MedQA (USMLE). 
https:\/\/paperswithcode.com\/sota\/question-answering-on-medqa-usmle (2024). Khosravi, M., Zare, Z., Mojtabaeian, S. M. &amp;&hellip;\n","protected":false},"author":2,"featured_media":294452,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[275],"tags":[2564,2566,18,910,135,475,474,19,4381,17,610,19880,10047,5801],"class_list":{"0":"post-294451","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-healthcare","8":"tag-biomedicine","9":"tag-cancer-research","10":"tag-eire","11":"tag-general","12":"tag-health","13":"tag-health-care","14":"tag-healthcare","15":"tag-ie","16":"tag-infectious-diseases","17":"tag-ireland","18":"tag-machine-learning","19":"tag-metabolic-diseases","20":"tag-molecular-medicine","21":"tag-neurosciences"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@ie\/115929089476294005","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/294451","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/comments?post=294451"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/294451\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media\/294452"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media?parent=294451"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/categories?post=294451"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euro
pesays.com\/ie\/wp-json\/wp\/v2\/tags?post=294451"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}