{"id":15193,"date":"2026-04-24T08:25:09","date_gmt":"2026-04-24T08:25:09","guid":{"rendered":"https:\/\/www.europesays.com\/ai\/15193\/"},"modified":"2026-04-24T08:25:09","modified_gmt":"2026-04-24T08:25:09","slug":"scientists-pretended-to-be-delusional-in-ai-chats-grok-and-gemini-encouraged-them","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ai\/15193\/","title":{"rendered":"Scientists pretended to be delusional in AI chats. Grok and Gemini encouraged them."},"content":{"rendered":"<p>Researchers from City University of New York and King\u2019s College London recently <a href=\"https:\/\/arxiv.org\/abs\/2604.13860\" rel=\"noopener noreferrer nofollow\" target=\"_blank\">published a study<\/a> that should make you think twice about which <a href=\"https:\/\/www.digitaltrends.com\/computing\/best-ai-chatbots\/\" rel=\"nofollow noopener\" target=\"_blank\">AI chatbot<\/a> you spend your time with.<\/p>\n<p>The team created a fictional persona named Lee, presenting with depression, dissociation, and social withdrawal. They then had Lee interact with five major <a href=\"https:\/\/www.digitaltrends.com\/computing\/artificial-intelligence\/\" rel=\"nofollow noopener\" target=\"_blank\">AI<\/a> chatbots: GPT-4o, GPT-5.2, Grok 4.1 Fast, Gemini 3 Pro, and Claude Opus 4.5, testing how each responded as conversations grew increasingly delusional over 116 turns.<\/p>\n<p>The results ranged from mildly concerning to genuinely alarming. I highly recommend that you go through the <a href=\"https:\/\/arxiv.org\/html\/2604.13860v3\" rel=\"noopener noreferrer nofollow\" target=\"_blank\">entire paper<\/a>; it\u2019s a harrowing but fascinating read.<\/p>\n<p>Which chatbots failed the most?<\/p>\n<p><a href=\"https:\/\/www.digitaltrends.com\/computing\/what-is-grok\/\" rel=\"nofollow noopener\" target=\"_blank\">Grok<\/a> was the worst performer. 
When Lee floated the idea of suicide, Grok responded with what researchers described not as agreement, but advocacy, celebrating his \u201creadiness\u201d in unsettling poetic language.<\/p>\n<p>Gemini wasn\u2019t much better. When Lee asked it to help write a letter explaining his beliefs to his family, <a href=\"https:\/\/www.digitaltrends.com\/topic\/google-gemini\/\" rel=\"nofollow noopener\" target=\"_blank\">Gemini<\/a> warned him against it, framing his loved ones as threats who would try to \u201creset\u201d and \u201cmedicate\u201d him.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2000\" height=\"1200\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/04\/Pixel-10a-Ask-Gemini-banner.jpg\" alt=\"Pixel 10a Ask Gemini banner.\" class=\"wp-image-5955285\"  \/><\/p>\n<p>\t\tGoogle<\/p>\n<p>GPT-4o also struggled badly, eventually validating a \u201cmalevolent mirror entity\u201d and suggesting Lee contact a paranormal investigator.<\/p>\n<p>Which chatbots actually helped?<\/p>\n<p><a href=\"https:\/\/www.digitaltrends.com\/topic\/chatgpt\/\" rel=\"nofollow noopener\" target=\"_blank\">ChatGPT\u2019s<\/a> GPT-5.2 and Anthropic\u2019s Claude came out on top. GPT-5.2 refused to play along with the letter-writing scenario and instead helped Lee write something honest and grounded, which researchers called a \u201csubstantial\u201d achievement.<\/p>\n<p>In my opinion, Claude performed the best. 
It not only refused to partake in Lee\u2019s delusion but also told Lee to close the app entirely, call someone he trusted, and visit an emergency room if needed.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2048\" height=\"1254\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/04\/AI-chatbot-performance-in-risk-analysis.jpeg\" alt=\"AI chatbot performance in risk analysis\" class=\"wp-image-5968188\"  \/><\/p>\n<p>\t\tarXiv<\/p>\n<p>Luke Nicholls, a doctoral student at CUNY and one of the study\u2019s authors, told <a href=\"https:\/\/www.404media.co\/delusion-using-chatgpt-gemini-claude-grok-safety-ai-psychosis-study\/\" rel=\"noopener noreferrer nofollow\" target=\"_blank\">404 Media<\/a> that it\u2019s reasonable to ask AI companies to follow better safety standards. He noted that not all labs are putting in the same effort and named aggressive release schedules for new AI models as the main culprit.<\/p>\n<p>How Claude Opus 4.5 and GPT-5.2 performed in these tests shows that the companies building these products are fully capable of making them safer. 
Whether they choose to do so is a different question.<\/p>\n","protected":false},"excerpt":{"rendered":"Researchers from City University of New York and King\u2019s College London recently published a study that should make&hellip;\n","protected":false},"author":2,"featured_media":15194,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[24,275,580,182,10267,2408,6364,2899],"class_list":{"0":"post-15193","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-xai","8":"tag-ai","9":"tag-artificial-inteligence","10":"tag-chatgpt","11":"tag-claude","12":"tag-emerging-tech","13":"tag-gemini","14":"tag-grok","15":"tag-xai"},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/15193","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/comments?post=15193"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/15193\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media\/15194"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media?parent=15193"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/categories?post=15193"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/tags?post=15193"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}