{"id":72186,"date":"2025-05-03T21:44:08","date_gmt":"2025-05-03T21:44:08","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/72186\/"},"modified":"2025-05-03T21:44:08","modified_gmt":"2025-05-03T21:44:08","slug":"googles-gemini-has-beaten-pokemon-blue-with-a-little-help","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/72186\/","title":{"rendered":"Google\u2019s Gemini has beaten Pok\u00e9mon Blue (with a little help)"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Google\u2019s <a href=\"https:\/\/techcrunch.com\/2025\/04\/04\/gemini-2-5-pro-is-googles-most-expensive-ai-model-yet\/\" target=\"_blank\" rel=\"noopener\">most expensive AI model<\/a> seems to have crossed a major milestone: Beating a 29-year-old video game.<\/p>\n<p class=\"wp-block-paragraph\">Last night, Google CEO Sundar Pichai <a rel=\"nofollow\" href=\"https:\/\/x.com\/sundarpichai\/status\/1918455766542930004?t=8NLgn42y3kAqUbZVhg4zLw&amp;s=19\">posted triumphantly on X<\/a>, \u201cWhat a finish! Gemini 2.5 Pro just completed Pok\u00e9mon Blue!\u201d<\/p>\n<p class=\"wp-block-paragraph\">To be clear, the <a rel=\"nofollow noopener\" href=\"https:\/\/www.twitch.tv\/gemini_plays_pokemon\" target=\"_blank\">Gemini Plays Pokemon livestream<\/a> was created by (in his own words) \u201ca 30 year old software engineer unaffiliated with Google\u201d who goes by <a rel=\"nofollow noopener\" href=\"https:\/\/bsky.app\/profile\/jcz.dev\" target=\"_blank\">Joel Z<\/a>. 
But Google executives have been cheering the effort on.<\/p>\n<p class=\"wp-block-paragraph\">For example, Logan Kilpatrick, the product lead for Google AI Studio, <a rel=\"nofollow\" href=\"https:\/\/x.com\/OfficialLoganK\/status\/1913365614397182096\">posted last month<\/a> that Gemini was \u201cmaking great progress at completing Pok\u00e9mon\u201d and had \u201cearned its 5th badge (next best model only has 3 so far, though with a different agent harness),\u201d leading Pichai to <a rel=\"nofollow\" href=\"https:\/\/x.com\/sundarpichai\/status\/1913464625393524967\">joke<\/a>, \u201cWe are working on API, Artificial Pok\u00e9mon Intelligence:)\u201d<\/p>\n<p class=\"wp-block-paragraph\">Why Pok\u00e9mon? Back in February, <a rel=\"nofollow noopener\" href=\"https:\/\/www.anthropic.com\/research\/visible-extended-thinking\" target=\"_blank\">Anthropic highlighted progress<\/a> that its Claude AI models were making in \u201cPok\u00e9mon Red,\u201d writing that Claude\u2019s \u201cextended thinking and agent training\u201d gives it \u201ca major boost\u201d on \u201cmore unexpected\u201d tasks, like playing a classic game. (\u201cPok\u00e9mon Red\u201d and \u201cBlue\u201d are different versions of <a rel=\"nofollow noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Pok%C3%A9mon_Red,_Blue,_and_Yellow\" target=\"_blank\">a Game Boy title<\/a> first released in 1996 and tied to the long-running Pok\u00e9mon franchise.) There\u2019s even <a rel=\"nofollow noopener\" href=\"https:\/\/www.twitch.tv\/claudeplayspokemon\" target=\"_blank\">a Claude Plays Pokemon Twitch channel<\/a> that Joel Z cited as an inspiration.<\/p>\n<p class=\"wp-block-paragraph\">Despite its progress, Claude does not appear to have beaten \u201cPok\u00e9mon Red\u201d yet. Does that mean Gemini is objectively better at the game? On his Twitch page, Joel Z urged viewers, \u201cPlease don\u2019t consider this a benchmark for how well an LLM can play Pokemon. 
You can\u2019t really make direct comparisons \u2014 Gemini and Claude have different tools and receive different information.\u201d<\/p>\n<p class=\"wp-block-paragraph\">And both AI models need help to play the game \u2014 that\u2019s where <a rel=\"nofollow noopener\" href=\"https:\/\/www.lesswrong.com\/posts\/7mqp8uRnnPdbBzJZE\/is-gemini-now-better-than-claude-at-pokemon\" target=\"_blank\">the aforementioned agent harnesses<\/a> come in, providing the models with game screenshots overlaid with additional information, allowing the model to decide how to respond (which may involve calling specialized agents), and then pressing the button that corresponds with the AI\u2019s instruction.<\/p>\n<p class=\"wp-block-paragraph\">Joel Z acknowledged that there were other \u201cdev interventions\u201d to help Gemini complete the game, but insisted that it\u2019s not cheating.<\/p>\n<p class=\"wp-block-paragraph\">\u201cMy interventions improve Gemini\u2019s overall decision-making and reasoning abilities,\u201d he says. \u201cI don\u2019t give specific hints \u2014 there are no walkthroughs or direct instructions for particular challenges like Mt. Moon. 
The only thing that comes even close is letting Gemini know that it needs to talk to a Rocket Grunt twice to obtain the Lift Key, which was a bug that was later fixed in Pokemon Yellow.\u201d<\/p>\n<p class=\"wp-block-paragraph\">Plus, he said, \u201cGemini Plays Pok\u00e9mon is still actively being developed, and the framework continues to evolve.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"Google\u2019s most expensive AI model seems to have crossed a major milestone: Beating a 29-year-old video game. Last&hellip;\n","protected":false},"author":2,"featured_media":72187,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,2332,867,2512,35993,53,16,15],"class_list":{"0":"post-72186","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-gemini","11":"tag-google","12":"tag-pokemon","13":"tag-sundar-pichai","14":"tag-technology","15":"tag-uk","16":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114446091296645860","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/72186","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=72186"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/72186\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/72187"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com
\/uk\/wp-json\/wp\/v2\/media?parent=72186"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=72186"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=72186"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}