{"id":202181,"date":"2025-09-05T11:32:09","date_gmt":"2025-09-05T11:32:09","guid":{"rendered":"https:\/\/www.europesays.com\/us\/202181\/"},"modified":"2025-09-05T11:32:09","modified_gmt":"2025-09-05T11:32:09","slug":"ai-forecasting-tournament-tried-to-predict-2025-it-couldnt","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/202181\/","title":{"rendered":"AI forecasting tournament tried to predict 2025. It couldn\u2019t."},"content":{"rendered":"<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Two of the smartest people I follow in the AI world <a href=\"https:\/\/www.youtube.com\/watch?v=1if6XbzD5Yg\" target=\"_blank\" rel=\"noopener\">recently sat down<\/a> to check in on how the field is going.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">One was Fran\u00e7ois Chollet, creator of the widely used <a href=\"https:\/\/keras.io\/\" target=\"_blank\" rel=\"noopener\">Keras library<\/a> and author of the <a href=\"https:\/\/arcprize.org\/arc-agi\" target=\"_blank\" rel=\"noopener\">ARC-AGI benchmark<\/a>, which tests if AI has reached \u201cgeneral\u201d or broadly human-level intelligence. Chollet has a reputation as a bit of an AI bear, eager to deflate the most boosterish and over-optimistic predictions of where the technology is going. But in the discussion, Chollet said his timelines have gotten shorter recently. 
Researchers had made big progress on what he saw as the major obstacles to achieving artificial general intelligence, like models\u2019 weakness at recalling and applying things they learned before.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Chollet\u2019s interlocutor \u2014 <a href=\"https:\/\/www.dwarkesh.com\/\" target=\"_blank\" rel=\"noopener\">Dwarkesh Patel<\/a>, whose podcast has become the single most important place for tracking what top AI scientists are thinking \u2014 had, in reaction to his own reporting, moved in the opposite direction. While humans are great at <a href=\"https:\/\/www.dwarkesh.com\/p\/timelines-june-2025?manualredirect=\" target=\"_blank\" rel=\"noopener\">learning continuously<\/a> or \u201con the job,\u201d Patel has become more pessimistic that AI models can gain this skill any time soon.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201c[Humans are] learning from their failures. They\u2019re picking up small improvements and efficiencies as they work,\u201d Patel noted. 
\u201cIt doesn\u2019t seem like there\u2019s an easy way to slot this key capability into these models.\u201d<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">All of which is to say, two very plugged-in, smart people who know the field as well as anyone else can come to perfectly reasonable yet contradictory conclusions about the pace of AI progress.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">In that case, how is someone like me, who\u2019s certainly less knowledgeable than Chollet or Patel, supposed to figure out who\u2019s right?<\/p>\n<p>The forecaster wars, three years in<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">One of the most promising approaches I\u2019ve seen to resolving \u2014 or at least adjudicating \u2014 these disagreements comes from a small group called the <a href=\"https:\/\/forecastingresearch.org\/\" target=\"_blank\" rel=\"noopener\">Forecasting Research Institute<\/a>.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">In the summer of 2022, the institute began what it calls the <a href=\"https:\/\/forecastingresearch.org\/xpt\" target=\"_blank\" rel=\"noopener\">Existential Risk Persuasion Tournament<\/a> (XPT for short). 
XPT was <a href=\"https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/64f0a7838ccbf43b6b5ee40c\/1693493128111\/XPT.pdf\" target=\"_blank\" rel=\"noopener\">intended<\/a> to \u201cproduce high-quality forecasts of the risks facing humanity over the next century.\u201d To do this, the researchers (including Penn psychologist and <a href=\"https:\/\/www.vox.com\/future-perfect\/23785731\/human-extinction-forecasting-superforecasters\" target=\"_blank\" rel=\"noopener\">forecasting pioneer Philip Tetlock<\/a> and FRI head Josh Rosenberg) surveyed subject matter experts who study threats that could at least conceivably jeopardize humanity\u2019s survival (like AI).<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">But they also surveyed \u201c<a href=\"https:\/\/www.vox.com\/future-perfect\/2024\/2\/13\/24070864\/samotsvety-forecasting-superforecasters-tetlock\" target=\"_blank\" rel=\"noopener\">superforecasters<\/a>,\u201d a group of people identified by Tetlock and others who have proven unusually accurate at predicting events in the past. The superforecaster group was not made up of experts on existential threats to humanity, but rather generalists from a variety of occupations with solid predictive track records.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">On each risk, including AI, there were <a href=\"https:\/\/www.vox.com\/future-perfect\/23785731\/human-extinction-forecasting-superforecasters\" target=\"_blank\" rel=\"noopener\">big gaps between the area-specific experts and the generalist forecasters<\/a>. The experts were much more likely than the generalists to say that the risk they study could lead to either human extinction or mass deaths. 
This gap persisted even after the researchers had the two groups engage in structured discussions meant to identify why they disagreed.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">The two just had fundamentally different worldviews. In the case of AI, subject matter experts thought the burden of proof should be on skeptics to show why a hyper-intelligent digital species wouldn\u2019t be dangerous. The generalists thought the burden of proof should be on the experts to explain why a technology that doesn\u2019t even exist yet could kill us all.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">So far, so intractable. Luckily for us observers, each group was asked to estimate not only long-term risks over the next century, which can\u2019t be confirmed any time soon, but also events in the nearer future. They were specifically tasked with predicting the pace of AI progress in the short, medium, and long run.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">In a <a href=\"https:\/\/static1.squarespace.com\/static\/635693acf15a3e2a14a56a4a\/t\/68b6ce72b3435a79858344b7\/1756810866830\/near-term-xpt-accuracy.pdf\" target=\"_blank\" rel=\"noopener\">new paper<\/a>, the authors \u2014 Tetlock, Rosenberg, Simas Ku\u010dinskas, Rebecca Ceppas de Castro, Zach Jacobs, and Ezra Karger \u2014 go back and evaluate how well the two groups fared at predicting the three years of AI progress since summer 2022.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">In theory, this could tell us which group to believe. 
If the concerned AI experts proved much better at predicting what would happen between 2022 and 2025, perhaps that\u2019s an indication that they have a better read on the longer-run future of the technology, and therefore, we should give their warnings greater credence.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Alas, in the words of <a href=\"https:\/\/www.youtube.com\/watch?v=kGpsXuMvApo\" target=\"_blank\" rel=\"noopener\">Ralph Fiennes<\/a>, \u201cWould that it were so simple!\u201d It turns out the three-year results leave us without much more sense of who to believe.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Both the AI experts and the superforecasters systematically underestimated the pace of AI progress. Across four benchmarks, the actual performance of state-of-the-art models in summer 2025 was better than either superforecasters or AI experts predicted (though the latter was closer). For instance, superforecasters thought an AI would win gold at the International Mathematical Olympiad in 2035. Experts thought 2030. It <a href=\"https:\/\/link.vox.com\/view\/608adc2d91954c3cef0303efoaied.bom\/63b1f94f\" target=\"_blank\" rel=\"noopener\">happened this summer<\/a>.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201cOverall, superforecasters assigned an average probability of just 9.7 percent to the observed outcomes across these four AI benchmarks,\u201d the report concluded, \u201ccompared to 24.6 percent from domain experts.\u201d<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">That makes the domain experts look better. 
They put slightly higher odds that what actually happened would happen \u2014 but when they crunched the numbers across all questions, the authors concluded that there was no statistically significant difference in aggregate accuracy between the domain experts and superforecasters. What\u2019s more, there was no correlation between how accurate someone was in projecting the year 2025 and how dangerous they thought AI or other risks were. Prediction remains hard, especially about the future, and especially about the future of AI.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">The only trick that reliably worked was aggregating everyone\u2019s forecasts \u2014 lumping all the predictions together and taking the median produced substantially more accurate forecasts than any one individual or group. We may not know which of these soothsayers are smart, but the crowds remain wise.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Perhaps I should have seen this outcome coming. Ezra Karger, an economist and co-author on both the initial XPT paper and this new one, told me <a href=\"https:\/\/www.vox.com\/future-perfect\/23785731\/human-extinction-forecasting-superforecasters\" target=\"_blank\" rel=\"noopener\">upon the first paper\u2019s release in 2023<\/a> that, \u201cover the next 10 years, there really wasn\u2019t that much disagreement between groups of people who disagreed about those longer run questions.\u201d That is, they already knew that the predictions of people worried about AI and people less worried were pretty similar.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">So, it shouldn\u2019t surprise us too much that one group wasn\u2019t dramatically better than the other at predicting the years 2022\u20132025. 
The real disagreement wasn\u2019t about the near-term future of AI but about the danger it poses in the medium and long run, which is inherently harder to judge and more speculative.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">There is, perhaps, some valuable information in the fact that both groups underestimated the rate of AI progress: perhaps that\u2019s a sign that we have all underestimated the technology, and it\u2019ll keep improving faster than anticipated. Then again, the predictions in 2022 were all made before the release of ChatGPT in November of that year. Who do you remember before that app\u2019s rollout predicting that AI chatbots would become ubiquitous in work and school? Didn\u2019t we already know that AI made big leaps in capabilities in the years 2022\u20132025? Does that tell us anything about whether the technology might not be slowing down, which, in turn, would be key to forecasting its long-term threat?<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">Reading the latest FRI report, I wound up in a similar place to <a href=\"https:\/\/www.vox.com\/future-perfect\/357346\/ai-prediction-openai-chatgpt-anthropic\" target=\"_blank\" rel=\"noopener\">my former colleague Kelsey Piper last year<\/a>. Piper noted that failing to extrapolate trends, especially exponential trends, out into the future has led people badly astray in the past. The fact that relatively few Americans had Covid in January 2020 did not mean Covid wasn\u2019t a threat; it meant that the country was at the start of an exponential growth curve. 
A similar kind of failure would lead one to underestimate AI progress and, with it, any potential existential risk.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">At the same time, in most contexts, exponential growth can\u2019t go on forever; it maxes out at some point. It\u2019s remarkable that, say, <a href=\"https:\/\/ourworldindata.org\/data-insights\/moores-law-has-accurately-predicted-the-progress-in-transistor-counts-over-the-last-50-years\" target=\"_blank\" rel=\"noopener\">Moore\u2019s law has broadly predicted the growth in microprocessor density<\/a> accurately for decades \u2014 but Moore\u2019s law is famous in part because it\u2019s unusual for trends about human-created technologies to follow so clean a pattern.<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">\u201cI\u2019ve increasingly come to believe that there is no substitute for digging deep into the weeds when you\u2019re considering these questions,\u201d Piper concluded. \u201cWhile there are questions we can answer from first principles, [AI progress] isn\u2019t one of them.\u201d<\/p>\n<p class=\"duet--article--dangerously-set-cms-markup duet--article--standard-paragraph _1agbrixi lg8ac51 lg8ac50 xkp0cg1\">I fear she\u2019s right \u2014 and that, worse, mere deference to experts doesn\u2019t suffice either, not when experts disagree with each other on both specifics and broad trajectories. We don\u2019t really have a good alternative to trying to learn as much as we can as individuals and, failing that, waiting and seeing. 
That\u2019s not a satisfying conclusion to a newsletter \u2014 or a comforting answer to one of the most important questions facing humanity \u2014 but it\u2019s the best I can do.<\/p>\n","protected":false},"excerpt":{"rendered":"Two of the smartest people I follow in the AI world recently sat down to check in 
on&hellip;\n","protected":false},"author":3,"featured_media":202182,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[691,738,14268,2426,158,67,132,68],"class_list":{"0":"post-202181","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-future-perfect","11":"tag-innovation","12":"tag-technology","13":"tag-united-states","14":"tag-unitedstates","15":"tag-us"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@us\/115151473483857721","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/202181","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=202181"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/202181\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/202182"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=202181"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=202181"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=202181"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}