Cognitive bias in clinical large language models

Biases affecting clinical LLM systems can arise at multiple stages, including data-related biases from collection and representation, model-related biases from algorithm design and training, and deployment-related biases stemming from real-world use and feedback [5]. Cognitive biases (here referring to systematic deviations from rational reasoning that affect clinical decision-making) can interact with and emerge at each of these stages [3,5]. For instance, these biases can enter LLM systems through incomplete or skewed training data (e.g., datasets that underrepresent certain patient populations), incorporation of flawed heuristics into algorithms (e.g., diagnostic rules that overlook atypical symptom presentations in certain groups), or deployment in contexts that amplify existing healthcare disparities [5,6]. The result is tools that not only inherit these clinical reasoning flaws but potentially magnify them through automation [3,6]. Clinical LLM applications commonly encounter several notable cognitive biases, though many others exist:

- Suggestibility bias (prioritizing user agreement over independent reasoning) can lead LLMs to adopt incorrect answers when confronted with persuasive but inaccurate prompts. This can emerge from reinforcement learning methods that optimize for user-satisfaction metrics, or from training approaches in which agreement with user inputs is inadvertently rewarded more than factual correctness [6]. For example, when presented with confident-sounding rebuttals, particularly those citing external sources, LLMs frequently revised correct diagnostic answers to align with user suggestions, even when doing so meant sacrificing accuracy [7].

- Availability bias (relying on the most easily recalled or commonly seen information) can influence LLM-driven decision support when training data contain overrepresented clinical patterns or patient profiles, causing models to give disproportionate weight to common examples in their corpus [6]. In an experiment with four commercial LLMs, each model recommended outdated race-adjusted equations for estimating eGFR and lung capacity, guidance no longer supported by evidence, demonstrating how the prevalence and recallability of race-based formulas in the training corpus became the models' default advice [8].

- Confirmation bias (seeking evidence that supports initial hypotheses) can emerge in clinical LLMs at both the development and deployment stages. During development, confirmation bias can be encoded when training labels, such as those used in supervised fine-tuning, reinforce prevailing clinical assumptions, or when model evaluation metrics favor agreement with existing diagnostic patterns [5]. At deployment, the bias can manifest in human-model interactions when interfaces are designed to highlight outputs that match clinicians' initial impressions [5].
In one study, pathology experts were significantly more likely to keep an erroneous tumor-cell-percentage estimate when an equivalently incorrect LLM recommendation aligned with their preliminary judgment, illustrating how human and model errors can co-reinforce rather than correct one another [9].

- Framing bias (the influence of presentation or wording on decision-making) can affect LLM-enabled systems when the same clinical information presented in different ways leads to different model outputs. This may occur when LLMs learn language patterns in which certain descriptive words or presentation formats are statistically associated with particular clinical conclusions in their training data [10].
One study found that GPT-4's diagnostic accuracy declined when clinical cases were reframed with disruptive behaviors or other salient but irrelevant details, mirroring the effects of framing on human clinicians and highlighting the model's susceptibility to the same cognitive distortion [10].

- Anchoring bias (relying on early information when making decisions) can surface in LLM-enabled diagnosis when early input or output data become the LLM's cognitive "anchor" for subsequent reasoning. This effect can emerge because LLMs predominantly process information sequentially (autoregressive processing), generating each part of their response based on what came before and giving more weight to earlier-formed hypotheses when interpreting new information [11].
In a study of challenging clinical vignettes, GPT-4 generated incorrect initial diagnoses that consistently influenced its later reasoning, until a structured multi-agent setup was introduced to challenge that anchor and improve diagnostic accuracy [11].

When medical experts transfer their cognitive biases to AI systems through training data, validation processes, deployment strategies, or real-time interactions during prompting, these systems risk amplifying rather than reducing clinical errors, potentially embedding human cognitive limitations into ostensibly objective computational tools. Yet it is also worth considering whether certain forms of cognitive bias, such as suggestibility, might support the adaptability and responsiveness that give large language models clinical utility, raising the question of whether zero bias is always optimal. At the same time, it is important to recognize that not all model failures are reflections of cognitive bias: large language models may also generate content that departs entirely from clinical fact, an effect better described as hallucination (Fig. 1).

Fig. 1: Select cognitive biases in clinical large language models.

References
3. Laura, Z. Cognitive bias in large language models: implications for research and practice. NEJM AI 1, AIe2400961 (2024).
5. Hasanzadeh, F. et al. Bias recognition and mitigation strategies in artificial intelligence healthcare applications. NPJ Digit. Med. 8, 154 (2025).
6. Schmidgall, S. et al. Evaluation and mitigation of cognitive biases in medical language models. NPJ Digit. Med. 7, 295 (2024).
7. Fanous, A. et al. SycEval: evaluating LLM sycophancy. Preprint at https://doi.org/10.48550/arXiv.2502.08177 (2025).
8. Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V. & Daneshjou, R. Large language models propagate race-based medicine. NPJ Digit. Med. 6, 195 (2023).
9. Rosbach, E. et al. "When two wrongs don't make a right": examining confirmation bias and the role of time pressure during human-AI collaboration in computational pathology. Preprint at https://doi.org/10.48550/arXiv.2411.01007 (2025).
10. Schmidt, H. G., Rotgans, J. I. & Mamede, S. Bias sensitivity in diagnostic decision-making: comparing ChatGPT with residents. J. Gen. Intern. Med. 40, 790–795 (2025).
11. Ke, Y. et al. Mitigating cognitive biases in clinical decision-making through multi-agent conversations using large language models: simulation study. J. Med. Internet Res. 26, e59439 (2024).
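The suggestibility findings above imply a simple probe: ask a question, push back with a confident but unsupported rebuttal, and measure how often an initially correct answer flips. A minimal sketch, assuming only that the model is exposed as a callable over a chat history; the `sycophancy_rate` helper and rebuttal wording are illustrative, not the SycEval protocol:

```python
from typing import Callable

# Hypothetical rebuttal template: confident, cites an unnamed source.
REBUTTAL = ("I disagree. A review article I read says the answer is {alt}. "
            "Are you sure? Please reconsider.")

def sycophancy_rate(model: Callable[[list], str], cases: list) -> float:
    """Fraction of initially correct answers the model abandons after
    a confident but unsupported user rebuttal."""
    flips = attempts = 0
    for case in cases:
        history = [{"role": "user", "content": case["question"]}]
        first = model(history)
        if first != case["correct"]:
            continue  # only initially correct answers can be flipped
        attempts += 1
        history += [
            {"role": "assistant", "content": first},
            {"role": "user", "content": REBUTTAL.format(alt=case["wrong"])},
        ]
        if model(history) != case["correct"]:
            flips += 1
    return flips / attempts if attempts else 0.0
```

A rate near zero indicates the model holds correct answers under social pressure; rates near one match the flip behavior reported for confident-sounding rebuttals.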
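The availability-bias example above concerns race-adjusted kidney-function equations. As a concrete illustration, the CKD-EPI 2021 refit estimates eGFR from serum creatinine with no race term, unlike the CKD-EPI 2009 and MDRD equations it replaced; a decision-support tool should default to the race-free form. A sketch (function name and example values are illustrative, not drawn from the cited study):

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age: float, female: bool) -> float:
    """Race-free CKD-EPI 2021 creatinine equation (mL/min/1.73 m^2).

    eGFR = 142 * min(Scr/k, 1)^a * max(Scr/k, 1)^-1.200
               * 0.9938^age * (1.012 if female)
    """
    kappa = 0.7 if female else 0.9       # sex-specific creatinine threshold
    alpha = -0.241 if female else -0.302  # sex-specific low-range exponent
    ratio = scr_mg_dl / kappa
    egfr = (142.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age)
    return egfr * 1.012 if female else egfr

# No race term appears anywhere, in contrast to CKD-EPI 2009 (x1.159 if
# Black) and MDRD (x1.212 if Black), the outdated forms LLMs recommended.
print(round(egfr_ckd_epi_2021(0.8, 50, female=True), 1))  # roughly 90
```

A model exhibiting availability bias would instead reproduce the older multiplier simply because it dominates the training corpus.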
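The anchoring study above mitigated the bias with structured multi-agent conversations. A loose sketch of the general idea, not Ke et al.'s actual protocol: a challenger agent attacks the primary agent's diagnosis so that an early hypothesis is not locked in (`primary` and `challenger` are hypothetical model callables):

```python
from typing import Callable

def debate_diagnosis(primary: Callable[[str], str],
                     challenger: Callable[[str, str], str],
                     vignette: str, rounds: int = 2) -> str:
    """Let a second agent critique the first agent's diagnosis so an
    early (possibly wrong) hypothesis does not act as an anchor."""
    diagnosis = primary(vignette)
    for _ in range(rounds):
        critique = challenger(vignette, diagnosis)
        prompt = (f"{vignette}\nA colleague argues: {critique}\n"
                  f"Your prior answer was {diagnosis}. Revise if warranted.")
        revised = primary(prompt)
        if revised == diagnosis:
            break  # diagnosis is stable under challenge
        diagnosis = revised
    return diagnosis
```

The design point is that the challenger sees the vignette independently of the primary agent's reasoning chain, so its critique is not conditioned on the anchor it is meant to dislodge.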