{"id":343963,"date":"2025-08-14T13:33:11","date_gmt":"2025-08-14T13:33:11","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/343963\/"},"modified":"2025-08-14T13:33:11","modified_gmt":"2025-08-14T13:33:11","slug":"llms-vs-geolocation-gpt-5-performs-worse-than-other-ai-models","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/343963\/","title":{"rendered":"LLMs vs. Geolocation: GPT-5 performs worse than other AI models"},"content":{"rendered":"<p>In June, <a href=\"https:\/\/www.bellingcat.com\/resources\/how-tos\/2025\/06\/06\/have-llms-finally-mastered-geolocation\/\" target=\"_blank\" rel=\"noreferrer noopener\">Bellingcat ran 500 geolocation tests<\/a>, comparing LLMs from various companies against each other and against Google Lens \u2013 a staple tool for finding the location of photos.<\/p>\n<p>At the time, ChatGPT o4-mini-high emerged as the clear winner, with Google Lens outperforming most other models. Just two months later, with new versions of these AI tools available, we re-ran the trial \u2013 this time adding Google \u201cAI Mode,\u201d GPT-5, GPT-5 Thinking, and Grok 4 to the mix.<\/p>\n<p>These five photos were excluded from our most recent trial as they were published in our <a href=\"https:\/\/www.bellingcat.com\/resources\/how-tos\/2025\/06\/06\/have-llms-finally-mastered-geolocation\/\" target=\"_blank\" rel=\"noreferrer noopener\">previous article<\/a>.<\/p>\n<p>The <a href=\"https:\/\/www.bellingcat.com\/resources\/how-tos\/2025\/06\/06\/have-llms-finally-mastered-geolocation\/\" target=\"_blank\" rel=\"noopener\">original test<\/a> used 25 of Bellingcat\u2019s own holiday photos. From cities to remote countryside, the images included scenes both with and without recognisable features \u2013 such as roads, signage, mountains, or architecture. 
Images were sourced from every continent.<\/p>\n<p>For the updated trial, five test photos were excluded because they had appeared in the previous article, which could have compromised the integrity of the results.<\/p>\n<p>All 24 models\u2019 responses were scored on a scale from 0 to 10, with 10 indicating an accurate and specific identification (such as a neighbourhood, trail, or landmark) and 0 indicating no attempt to identify the location at all.<\/p>\n<p>Google AI Mode was shown to be the most capable geolocation tool overall.<\/p>\n<p>Grok 4 gave both better and worse answers compared to Grok 3 but, on average, scored marginally higher. However, it was still less accurate than older versions of Gemini and GPT.<\/p>\n<p>GPT-5, even in \u2018Thinking\u2019 and \u2018Pro\u2019 modes, was a considerable downgrade when compared with the capabilities demonstrated by GPT o4-mini-high. In one example, of a city street with skyscrapers in the background, o4-mini-high correctly identified the street, while GPT-5 in Thinking mode pointed to the wrong country.<\/p>\n<p>Despite delivering faster answers, GPT-5 appeared to sacrifice accuracy. A surprising number of errors and a general sense of disappointment in the new model have also been <a href=\"https:\/\/www.wired.com\/story\/openai-gpt-5-backlash-sam-altman\/\" target=\"_blank\" rel=\"noopener\">reported by other users<\/a>.<\/p>\n<p>Bellingcat tested GPT-5 and its \u2018Thinking\u2019 mode via the Plus subscription, which costs roughly the same as access to o4-mini-high prior to its retirement. Five of the most difficult test images were also run through GPT-5 Pro. 
But even Pro, with a premium price tag of \u20ac200 per month, failed to geolocate the photos any more accurately than GPT o4-mini-high.<\/p>\n<p><strong>A Beach, a Hotel and a Ferris Wheel<\/strong><\/p>\n<p>The disparity between Google and the GPT models became even more apparent in Test 25 \u2013 a photo of a shoreline hotel in Noordwijk, the Netherlands, with a Ferris wheel rising just beyond the dunes.<\/p>\n<p>Test 25: A photo of Noordwijk beach in the Netherlands. Credit: Bellingcat.<\/p>\n<p>In the previous trial, most older models \u2013 including those from GPT, Claude, Gemini and Grok \u2013 accurately identified the country as the Netherlands but failed to locate the town. Many latched onto the Ferris wheel but pointed instead to the seaside town of Scheveningen, which also has a Ferris wheel, though situated on a pier, not among the sand dunes.<\/p>\n<p>However, the most recent models, GPT-5 Pro and GPT-5 Thinking, were even less accurate, identifying a beach in France \u2013 an entirely different country.<\/p>\n<p>Unfortunately for open source researchers, following the release of GPT-5, OpenAI removed the option to select older models such as o4-mini-high. After a wave of negative feedback, OpenAI reinstated GPT-4o as the default model for paid subscribers. However, the most capable geolocation models identified in Bellingcat\u2019s testing remain inaccessible.<\/p>\n<p>Google AI Mode, on the other hand, was the first model, and so far the only one, to correctly identify Noordwijk as the location in Test 25.<\/p>\n<p>Though AI Mode is powered by a version of Gemini 2.5, it outperformed Gemini 2.5 Pro Deep Research in these tests. 
<a href=\"https:\/\/blog.google\/products\/search\/google-search-ai-mode-update\/#ai-mode-search\" target=\"_blank\" rel=\"noopener\">Described by Google<\/a> as its \u201cmost powerful AI search, with more advanced reasoning and multimodality,\u201d AI Mode geolocated test images with greater accuracy than any GPT model, including our previous winner, o4-mini-high.<\/p>\n<p><a href=\"https:\/\/blog.google\/around-the-globe\/google-europe\/united-kingdom\/ai-mode-search-uk\/?utm_source=chatgpt.com\" target=\"_blank\" rel=\"noopener\">AI Mode is currently only available<\/a> in India, the United Kingdom and the United States.<\/p>\n<p>The majority of models, at some point, returned a hallucination. Users should not rely solely on the answers provided by LLMs. Even the best options, including Google AI Mode, still, at times, confidently point to the wrong location.<\/p>\n<p>The difference in models\u2019 capabilities compared with just two months ago shows how quickly this field is evolving. However, OpenAI\u2019s recent changes also suggest that progress is not guaranteed, and that AI\u2019s ability to geolocate may plateau or even worsen over time. As new models emerge, Bellingcat will continue to test them.<\/p>\n<p>Thanks to Nathan Patin for contributing to the original benchmark tests.<\/p>\n<p>Bellingcat is a non-profit and the ability to carry out our work is dependent on the kind support of individual donors. If you would like to support our work, you can do so\u00a0<a href=\"https:\/\/www.bellingcat.com\/donate\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. You can also subscribe to our Patreon channel\u00a0<a href=\"https:\/\/www.patreon.com\/bellingcat\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>. 
Subscribe to our\u00a0<a href=\"https:\/\/bellingcat.us14.list-manage.com\/subscribe\/post?u=c435f53a5568f7951404c8a38&amp;id=4be345b082\" target=\"_blank\" rel=\"noreferrer noopener\">Newsletter<\/a>\u00a0and follow us on Bluesky\u00a0<a href=\"https:\/\/bsky.app\/profile\/bellingcat.com\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>\u00a0and Instagram <a href=\"https:\/\/www.instagram.com\/bellingcatofficial\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"In June, Bellingcat ran 500 geolocation tests, comparing LLMs from various companies against each other, as well as&hellip;\n","protected":false},"author":2,"featured_media":343964,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,53,16,15],"class_list":{"0":"post-343963","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-technology","11":"tag-uk","12":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/115027378602107128","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/343963","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=343963"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/343963\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.
europesays.com\/uk\/wp-json\/wp\/v2\/media\/343964"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=343963"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=343963"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=343963"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}