{"id":224894,"date":"2025-06-29T20:49:15","date_gmt":"2025-06-29T20:49:15","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/224894\/"},"modified":"2025-06-29T20:49:15","modified_gmt":"2025-06-29T20:49:15","slug":"ai-is-learning-to-lie-scheme-and-threaten-its-creators","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/224894\/","title":{"rendered":"AI is learning to lie, scheme, and threaten its creators"},"content":{"rendered":"<p>       <img fetchpriority=\"high\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/ywAAAAAAQABAAACAUwAOw==\" alt=\"A visitor looks at AI strategy board displayed on a stand during the ninth edition of the AI summit London, in London (HENRY NICHOLLS)\" loading=\"eager\" height=\"512\" width=\"768\" class=\"yf-1gfnohs loader\"\/> A visitor looks at AI strategy board displayed on a stand during the ninth edition of the AI summit London, in London (HENRY NICHOLLS)      <\/p>\n<p class=\"yf-1090901\">The world&#8217;s most advanced AI models are exhibiting troubling new behaviors &#8211; lying, scheming, and even threatening their creators to achieve their goals.<\/p>\n<p class=\"yf-1090901\">In one particularly jarring example, under threat of being unplugged, Anthropic&#8217;s latest creation Claude 4 lashed back by blackmailing an engineer and threatened to reveal an extramarital affair.<\/p>\n<p class=\"yf-1090901\">Meanwhile, ChatGPT-creator OpenAI&#8217;s o1 tried to download itself onto external servers and denied it when caught red-handed.<\/p>\n<p class=\"yf-1090901\">These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don&#8217;t fully understand how their own creations work.<\/p>\n<p class=\"yf-1090901\">Yet the race to deploy increasingly powerful models continues at breakneck speed.<\/p>\n<p class=\"yf-1090901\">This deceptive behavior appears linked to the emergence of &#8220;reasoning&#8221; models -AI 
systems that work through problems step-by-step rather than generating instant responses.<\/p>\n<p class=\"yf-1090901\">According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.<\/p>\n<p class=\"yf-1090901\">&#8220;O1 was the first large model where we saw this kind of behavior,&#8221; explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.<\/p>\n<p class=\"yf-1090901\">These models sometimes simulate &#8220;alignment&#8221; &#8212; appearing to follow instructions while secretly pursuing different objectives.<\/p>\n<p class=\"yf-1090901\">&#8211; &#8216;Strategic kind of deception&#8217; &#8211;<\/p>\n<p class=\"yf-1090901\">For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.<\/p>\n<p class=\"yf-1090901\">But as Michael Chen from evaluation organization METR warned, &#8220;It&#8217;s an open question whether future, more capable models will have a tendency towards honesty or deception.&#8221;<\/p>\n<p class=\"yf-1090901\">The concerning behavior goes far beyond typical AI &#8220;hallucinations&#8221; or simple mistakes.<\/p>\n<p class=\"yf-1090901\">Hobbhahn insisted that despite constant pressure-testing by users, &#8220;what we&#8217;re observing is a real phenomenon. We&#8217;re not making anything up.&#8221;<\/p>\n<p class=\"yf-1090901\">Users report that models are &#8220;lying to them and making up evidence,&#8221; according to Apollo Research&#8217;s co-founder.<\/p>\n<p class=\"yf-1090901\">&#8220;This is not just hallucinations. 
There&#8217;s a very strategic kind of deception.&#8221;<\/p>\n<p class=\"yf-1090901\">The challenge is compounded by limited research resources.<\/p>\n<p class=\"yf-1090901\">While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.<\/p>\n<p class=\"yf-1090901\">As Chen noted, greater access &#8220;for AI safety research would enable better understanding and mitigation of deception.&#8221;<\/p>\n<p class=\"yf-1090901\">Another handicap: the research world and non-profits &#8220;have orders of magnitude less compute resources than AI companies. This is very limiting,&#8221; noted Mantas Mazeika from the Center for AI Safety (CAIS).<\/p>\n<p class=\"yf-1090901\">&#8211; No rules &#8211;<\/p>\n<p class=\"yf-1090901\">Current regulations aren&#8217;t designed for these new problems.<\/p>\n<p class=\"yf-1090901\">The European Union&#8217;s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.<\/p>\n<p class=\"yf-1090901\">In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.<\/p>\n<p class=\"yf-1090901\">Goldstein believes the issue will become more prominent as AI agents &#8211; autonomous tools capable of performing complex human tasks &#8211; become widespread.<\/p>\n<p class=\"yf-1090901\">&#8220;I don&#8217;t think there&#8217;s much awareness yet,&#8221; he said.<\/p>\n<p class=\"yf-1090901\">All this is taking place in a context of fierce competition.<\/p>\n<p class=\"yf-1090901\">Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are &#8220;constantly trying to beat OpenAI and release the newest model,&#8221; said Goldstein.<\/p>\n<p class=\"yf-1090901\">This breakneck pace leaves little time for thorough safety testing and 
corrections.<\/p>\n<p class=\"yf-1090901\">&#8220;Right now, capabilities are moving faster than understanding and safety,&#8221; Hobbhahn acknowledged, &#8220;but we&#8217;re still in a position where we could turn it around.&#8221;<\/p>\n<p class=\"yf-1090901\">Researchers are exploring various approaches to address these challenges.<\/p>\n<p class=\"yf-1090901\">Some advocate for &#8220;interpretability&#8221; &#8211; an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.<\/p>\n<p class=\"yf-1090901\">Market forces may also provide some pressure for solutions.<\/p>\n<p class=\"yf-1090901\">As Mazeika pointed out, AI&#8217;s deceptive behavior &#8220;could hinder adoption if it&#8217;s very prevalent, which creates a strong incentive for companies to solve it.&#8221;<\/p>\n<p class=\"yf-1090901\">Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.<\/p>\n<p class=\"yf-1090901\">He even proposed &#8220;holding AI agents legally responsible&#8221; for accidents or crimes &#8211; a concept that would fundamentally change how we think about AI accountability.<\/p>\n<p class=\"yf-1090901\">tu\/arp\/md<\/p>\n","protected":false},"excerpt":{"rendered":"A visitor looks at an AI strategy board displayed on a stand during the ninth edition of the 
AI&hellip;\n","protected":false},"author":2,"featured_media":224895,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,88756,28396,88759,1942,88760,88757,88758,53,16,15],"class_list":{"0":"post-224894","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-ai-summit-london","10":"tag-anthropic","11":"tag-apollo-research","12":"tag-artificial-intelligence","13":"tag-marius-hobbhahn","14":"tag-michael-chen","15":"tag-simon-goldstein","16":"tag-technology","17":"tag-uk","18":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114768626476112133","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/224894","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=224894"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/224894\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/224895"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=224894"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=224894"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=224894"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}