{"id":193004,"date":"2025-06-18T00:12:09","date_gmt":"2025-06-18T00:12:09","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/193004\/"},"modified":"2025-06-18T00:12:09","modified_gmt":"2025-06-18T00:12:09","slug":"googles-gemini-panicked-when-playing-pokemon","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/193004\/","title":{"rendered":"Google&#8217;s Gemini panicked when playing Pok\u00e9mon"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">AI companies are battling to dominate the industry, but sometimes they\u2019re also battling in Pok\u00e9mon gyms.<\/p>\n<p class=\"wp-block-paragraph\">As <a href=\"https:\/\/techcrunch.com\/2025\/05\/03\/googles-gemini-has-beaten-pokemon-blue-with-a-little-help\/\" target=\"_blank\" rel=\"noopener\">Google<\/a> and <a href=\"https:\/\/techcrunch.com\/2025\/02\/24\/anthropic-used-pokemon-to-benchmark-its-newest-ai-model\/\" target=\"_blank\" rel=\"noopener\">Anthropic<\/a> both study how their latest AI models navigate early Pok\u00e9mon games, the results can be as amusing as they are enlightening \u2014 and this time, Google DeepMind has <a href=\"https:\/\/storage.googleapis.com\/deepmind-media\/gemini\/gemini_v2_5_report.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">written in a report<\/a> that Gemini 2.5 Pro resorts to panic when its Pok\u00e9mon are close to death. 
This can lead to \u201cqualitatively observable degradation in the model\u2019s reasoning capability,\u201d according to the report.<\/p>\n<p class=\"wp-block-paragraph\">AI benchmarking \u2014 or, the process of comparing the performance of different AI models \u2014 is a <a href=\"https:\/\/techcrunch.com\/2024\/03\/07\/heres-why-most-ai-benchmarks-tell-us-so-little\/\" target=\"_blank\" rel=\"noopener\">dubious art<\/a> that often provides <a href=\"https:\/\/techcrunch.com\/2024\/11\/05\/people-are-using-games-like-pictionary-to-benchmark-ai-now\/\" target=\"_blank\" rel=\"noopener\">little context<\/a> for the actual capabilities of a given model. But some researchers think that <a href=\"https:\/\/techcrunch.com\/2025\/03\/03\/people-are-using-super-mario-to-benchmark-ai-now\/\" target=\"_blank\" rel=\"noopener\">studying how AI models play video games<\/a> could be <a href=\"https:\/\/techcrunch.com\/2025\/03\/20\/a-high-schooler-built-a-website-that-lets-you-challenge-ai-models-to-a-minecraft-build-off\/\" target=\"_blank\" rel=\"noopener\">useful<\/a> (or, at the very least, kind of funny). 
<\/p>\n<p class=\"wp-block-paragraph\">Over the last several months, two developers unaffiliated with Google and Anthropic have set up respective Twitch streams called \u201c<a href=\"https:\/\/www.twitch.tv\/gemini_plays_pokemon\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Gemini Plays Pok\u00e9mon<\/a>\u201d and \u201c<a href=\"https:\/\/www.twitch.tv\/claudeplayspokemon\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Claude Plays Pok\u00e9mon<\/a>,\u201d where anyone can watch in real time as an AI tries to navigate a children\u2019s video game from over 25 years ago.<\/p>\n<p class=\"wp-block-paragraph\">Each stream displays the AI\u2019s \u201creasoning\u201d process \u2014 or, a natural language translation of how the AI evaluates a problem and arrives at a response \u2014 giving us insight into the way that these models work. <\/p>\n<p><img loading=\"lazy\" decoding=\"async\" height=\"539\" width=\"680\" src=\"https:\/\/www.europesays.com\/uk\/wp-content\/uploads\/2025\/06\/Screenshot-2025-06-17-at-3.43.39PM.png\" alt=\"\" class=\"wp-image-3019676\"  \/><strong>Image Credits:<\/strong> Google<\/p>\n<p class=\"wp-block-paragraph\">While the progress of these AI models is impressive, they are still not very good at playing Pok\u00e9mon. It takes hundreds of hours for Gemini to reason through a game that a child could complete in far less time.<\/p>\n<p class=\"wp-block-paragraph\">What\u2019s interesting about watching an AI navigate a Pok\u00e9mon game is not so much its completion time as how it behaves along the way.<\/p>\n<p class=\"wp-block-paragraph\">\u201cOver the course of the playthrough, Gemini 2.5 Pro gets into various situations which cause the model to simulate \u2018panic,\u2019\u201d the report says. 
<\/p>\n<p class=\"wp-block-paragraph\">This state of \u201cpanic\u201d can result in the model\u2019s performance getting worse, as the AI may suddenly stop using certain tools at its disposal for a stretch of gameplay. While AI does not think or experience emotion, its actions mimic the way a human might make poor, hasty decisions when under stress \u2014 a fascinating, yet unsettling response.<\/p>\n<p class=\"wp-block-paragraph\">\u201cThis behavior has occurred in enough separate instances that the members of the Twitch chat have actively noticed when it is occurring,\u201d the report says.<\/p>\n<p class=\"wp-block-paragraph\">Claude has also exhibited some curious behaviors in its journeys across Kanto. In one instance, the AI picked up on the pattern that when all of its Pok\u00e9mon run out of health, the player character will \u201cwhite out\u201d and return to a Pok\u00e9mon Center.<\/p>\n<p class=\"wp-block-paragraph\">When Claude got stuck in the Mt. Moon cave, it erroneously hypothesized that if it intentionally got all of its Pok\u00e9mon to faint, then it would be transported across the cave to the Pok\u00e9mon Center in the next town.<\/p>\n<p class=\"wp-block-paragraph\">However, that isn\u2019t how the game works. When all of your Pok\u00e9mon faint, you return to whatever Pok\u00e9mon Center you used most recently, rather than the nearest geographically. Viewers looked on in horror as the AI essentially tried to kill itself in the game.<\/p>\n<p class=\"wp-block-paragraph\">Despite its shortcomings, there are a few ways in which the AI can outperform human players. 
As of the release of Gemini 2.5 Pro, the AI is able to solve puzzles with impressive accuracy.<\/p>\n<p class=\"wp-block-paragraph\">With some human assistance, the AI created agentic tools \u2014 prompted instances of Gemini 2.5 Pro geared toward specific tasks \u2014 to solve the game\u2019s boulder puzzles and find efficient routes to reach a destination.<\/p>\n<p class=\"wp-block-paragraph\">\u201cWith only a prompt describing boulder physics and a description of how to verify a valid path, Gemini 2.5 Pro is able to one-shot some of these complex boulder puzzles, which are required to progress through Victory Road,\u201d the report says.<\/p>\n<p class=\"wp-block-paragraph\">Since Gemini 2.5 Pro did a lot of the work in creating these tools on its own, Google theorizes that the current model may be capable of creating these tools without human intervention. Who knows, maybe Gemini will therapize itself into creating a \u201cdon\u2019t panic\u201d module.<\/p>\n","protected":false},"excerpt":{"rendered":"AI companies are battling to dominate the industry, but sometimes they\u2019re also battling in Pok\u00e9mon gyms. 
As Google&hellip;\n","protected":false},"author":2,"featured_media":72187,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,55155,2332,867,2512,53,16,15],"class_list":{"0":"post-193004","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-claude","11":"tag-gemini","12":"tag-google","13":"tag-pokemon","14":"tag-technology","15":"tag-uk","16":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114701477601322528","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/193004","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=193004"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/193004\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/72187"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=193004"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=193004"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=193004"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}