{"id":697010,"date":"2026-01-15T05:51:25","date_gmt":"2026-01-15T05:51:25","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/697010\/"},"modified":"2026-01-15T05:51:25","modified_gmt":"2026-01-15T05:51:25","slug":"ai-models-are-starting-to-crack-high-level-math-problems","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/697010\/","title":{"rendered":"AI models are starting to crack high-level math problems\u00a0"},"content":{"rendered":"<p id=\"speakable-summary\" class=\"wp-block-paragraph\">Over the weekend, <a href=\"https:\/\/url.usb.m.mimecastprotect.com\/s\/NIYNCVJDNDFXWoGESGfvHE-bZH?domain=ocf.berkeley.edu\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Neel Somani<\/a>, who is a software engineer, former quant researcher, and a startup founder, was testing the math skills of OpenAI\u2019s new model when he made an unexpected discovery. After pasting the problem into ChatGPT and letting it think for 15 minutes, he came back to a full solution. He evaluated the proof and formalized it with a tool from Harmonic, and it all checked out.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">\u201cI was curious to establish a baseline for when LLMs are effectively able to solve open math problems compared to where they struggle,\u201d Somani said. 
The surprise was that, using the latest model, the frontier started to push forward a bit.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">ChatGPT\u2019s <a href=\"https:\/\/chatgpt.com\/share\/69630fa9-02d4-8012-8ef2-84c443c04922\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">chain of thought<\/a> is even more impressive, rattling off mathematical results like <a href=\"https:\/\/en.wikipedia.org\/wiki\/Legendre%27s_formula\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Legendre\u2019s formula<\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bertrand%27s_postulate\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Bertrand\u2019s postulate<\/a>,\u00a0and\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Star_of_David_theorem\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">the Star of David theorem<\/a>. Eventually, the model found <a href=\"https:\/\/mathoverflow.net\/questions\/138209\/product-of-central-binomial-coefficients\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">a MathOverflow post from 2013<\/a>, where Harvard mathematician Noam Elkies had given an elegant solution to a similar problem. But ChatGPT\u2019s final proof differed from Elkies\u2019 work in important ways, and gave a more complete solution to a version of the problem posed by legendary mathematician Paul Erd\u0151s, whose vast collection of unsolved problems has become a proving ground for AI.<\/p>\n<p class=\"wp-block-paragraph\">For anyone skeptical of machine intelligence, it\u2019s a surprising result \u2014 and it\u2019s not the only one. AI tools have become ubiquitous in mathematics, from formalization-oriented LLMs like Harmonic\u2019s Aristotle to literature review tools like OpenAI\u2019s deep research. 
But since the release of GPT 5.2 \u2014 which Somani describes as \u201canecdotally more skilled at mathematical reasoning than previous iterations\u201d \u2014 the sheer volume of solved problems has become difficult to ignore, raising new questions about large language models\u2019 ability to push the frontiers of human knowledge.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Somani was looking at the Erd\u0151s problems, a set of over 1,000 conjectures by the Hungarian mathematician that are\u00a0<a href=\"https:\/\/www.erdosproblems.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">maintained\u00a0online<\/a>. The problems, which vary significantly in both subject matter and difficulty, have become a tempting target for AI-driven mathematics. The first batch of autonomous solutions came in November from\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2511.02864\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">a Gemini-powered model called AlphaEvolve<\/a>\u00a0\u2014 but more recently, Somani and others have found GPT 5.2 to be remarkably adept with high-level math.\u00a0\u00a0<\/p>\n<p class=\"wp-block-paragraph\">Since Christmas, 15 problems have been moved from \u201copen\u201d to \u201csolved\u201d on the Erd\u0151s website \u2014 and 11 of the solutions have specifically credited AI models as being involved in the process.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">The revered mathematician Terence Tao has a more nuanced look at the progress <a href=\"https:\/\/github.com\/teorth\/erdosproblems\/wiki\/AI-contributions-to-Erd%C5%91s-problems\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">on his GitHub page<\/a>, counting eight Erd\u0151s problems on which AI models made meaningful autonomous progress, along with six other cases where progress was made by\u00a0locating\u00a0and building on\u00a0previous\u00a0research.\u00a0It\u2019s\u00a0a long way from AI systems being able to do math without human intervention, 
but\u00a0it\u2019s\u00a0clear that\u00a0there\u2019s\u00a0an important role\u00a0for large models to play.\u00a0<\/p>\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/mathstodon.xyz\/@tao\/115891257393270694\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">On Mastodon<\/a>, Tao conjectured that the\u00a0scalable\u00a0nature of AI systems makes them \u201cbetter suited for being systematically applied to the \u2018long tail\u2019 of obscure Erd\u0151s problems, many of which\u00a0actually have straightforward solutions.\u201d<\/p>\n<p class=\"wp-block-paragraph\">\u201cAs such, many of these easier Erd\u0151s problems are now more likely to be solved by purely AI-based methods than by human or hybrid means,\u201d Tao continued.<\/p>\n<p class=\"wp-block-paragraph\">Another driving force is a recent shift toward formalization, a labor-intensive task that makes mathematical reasoning easier to verify and extend. Formalization\u00a0doesn\u2019t\u00a0require use of AI or even computers, but a new crop of automated tools has made the process far easier. The open source \u201cproof assistant\u201d Lean, which was developed at Microsoft Research in 2013, has become widely used within the field as a way of formalizing proofs, and AI tools like Harmonic\u2019s Aristotle promise to automate much of the work of formalization.\u00a0<\/p>\n<p class=\"wp-block-paragraph\">For Harmonic founder Tudor Achim, the sudden jump in solved Erd\u0151s problems is less important than the fact that the world\u2019s greatest mathematicians are starting to take those tools seriously. \u201cI care more about the fact that math and computer science professors are using [AI tools],\u201d Achim said. 
\u201cThese people have reputations to protect, so when they\u2019re saying they use Aristotle or they use ChatGPT, that\u2019s real evidence.\u201d\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"Over the weekend, Neel Somani, who is a software engineer, former quant researcher, and a startup founder, was&hellip;\n","protected":false},"author":2,"featured_media":697011,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,210089,210090,128028,53,16,15],"class_list":{"0":"post-697010","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-gpt-5-2","11":"tag-harmonic","12":"tag-mathematics","13":"tag-technology","14":"tag-uk","15":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/115897557468937051","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/697010","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=697010"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/697010\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/697011"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=697010"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=697010"},{"taxonomy":"post_tag","embeddable":tr
ue,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=697010"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}