{"id":25729,"date":"2026-05-03T07:46:17","date_gmt":"2026-05-03T07:46:17","guid":{"rendered":"https:\/\/www.europesays.com\/ai\/25729\/"},"modified":"2026-05-03T07:46:17","modified_gmt":"2026-05-03T07:46:17","slug":"xiaomis-open-weight-mimo-v2-5-pro-takes-aim-at-claude-opus-with-hours-long-autonomous-coding","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ai\/25729\/","title":{"rendered":"Xiaomi&#8217;s open-weight MiMo-V2.5-Pro takes aim at Claude Opus with hours-long autonomous coding"},"content":{"rendered":"<p>Xiaomi&#8217;s new MiMo-V2.5-Pro writes a complete compiler in under five hours and lands close to Anthropic&#8217;s Claude Opus 4.6 on coding benchmarks, according to internal tests. The open-weight model also burns through significantly fewer tokens than its Western rivals.<\/p>\n<p><a href=\"https:\/\/mimo.xiaomi.com\/mimo-v2-5-pro\/\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">MiMo-V2.5-Pro<\/a>\u00a0is a mixture-of-experts model, meaning only part of the model fires for each request rather than the whole thing. It packs 1.02 trillion total parameters, with 42 billion active per request. The MiMo team built this version specifically for jobs that run for hours and rack up thousands of tool calls.<\/p>\n<p><a href=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-architecture.png\"><img fetchpriority=\"high\" decoding=\"async\" class=\"wp-image-55214 size-full\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-architecture.png\" alt=\"MiMo-V2.5 architecture diagram showing audio, visual, and text inputs feeding into the MiMo Hybrid-SWA backbone.\" width=\"1400\" height=\"851\"\/><\/a>Audio, image, and text each get converted into a format the language model can understand through their own encoders &#8211; three translators feeding into the same backbone. 
| Image: Xiaomi<\/p>\n<p>The context window sits at the high end of what&#8217;s currently possible: the main version handles up to one million tokens at once, while the base version without retraining caps out at 256,000 tokens.<\/p>\n<p>A compiler in one afternoon<\/p>\n<p>Xiaomi shows off the biggest jump from the previous version through three demos. In the first, the team had the model build a complete compiler project from a Peking University course, a task that typically takes a computer science student several weeks, according to Xiaomi.<\/p>\n<p><a href=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-pro-sysy-compiler-progress.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-55217 size-full\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-pro-sysy-compiler-progress.png\" alt=\"Line chart showing test pass rates climbing across four phases of a compiler project.\" width=\"1388\" height=\"714\"\/><\/a>MiMo-V2.5-Pro worked through the compiler in four phases over 4.3 hours, pushing test coverage from 59 percent on the first compile to a perfect 100 percent. | Image: Xiaomi<\/p>\n<p>MiMo-V2.5-Pro finished the project in 4.3 hours across 672 tool calls, scoring 233 out of 233 on the hidden test suite. Xiaomi says the approach is the most interesting part: The model first laid out the entire pipeline as scaffolding, then worked through each stage layer by layer. Its first compile run already passed 137 of 233 tests. A later refactoring phase introduced a regression, which the model diagnosed and fixed on its own.<\/p>\n<p>In the second demo, MiMo-V2.5-Pro wrote a desktop video editor with roughly 8,000 lines of code from just a few prompts. The model ran autonomously for 11.5 hours and made about 1,870 tool calls.<\/p>\n<p>For the third demo, Xiaomi hooked the model up to a circuit simulator through Claude Code and tasked it with designing a voltage regulator. 
Within an hour, the result hit all six technical specs at once. Four of them beat the model&#8217;s first draft by roughly an order of magnitude.<\/p>\n<p>Fewer tokens, comparable results<\/p>\n<p>Xiaomi is pitching MiMo-V2.5-Pro mainly on its performance-to-token ratio. On the company&#8217;s own ClawEval agent benchmark, the model hits 64 percent with around 70,000 tokens per task run. That&#8217;s 40 to 60 percent fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 need to reach similar numbers, according to the team.<\/p>\n<p><a href=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-pro-benchmark-overview.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-55215 size-full\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-pro-benchmark-overview.png\" alt=\"Eight bar charts comparing MiMo-V2.5-Pro against Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 across coding, agent, and reasoning benchmarks.\" width=\"1400\" height=\"1117\"\/><\/a>MiMo-V2.5-Pro and Claude Opus 4.6 run nearly even on SWE-Bench Pro, while the Xiaomi model edges ahead on Terminal-Bench 2.0. | Image: Xiaomi<\/p>\n<p>On coding benchmarks, the model scores 78.9 on SWE-bench Verified, 57.2 on SWE-Bench Pro, and 68.4 on Terminal-Bench 2.0. On Xiaomi&#8217;s in-house MiMo Coding Bench, it scores 73.7, putting it close to Claude Opus 4.6 (77.1) and well ahead of Gemini 3.1 Pro (67.8). For general agent tasks, MiMo-V2.5-Pro hits 1,581 Elo points on GDPVal-AA and 72.9 on tau3-bench.<\/p>\n<p>The progress shows up most clearly in long-context work. On OpenAI&#8217;s GraphWalks benchmark, which has the model navigate complex node graphs, the previous MiMo-V2-Pro dropped to zero at one million tokens. MiMo-V2.5-Pro still scores 0.37 on breadth-first searches and 0.62 on parent node queries at the same length.<\/p>\n<p>The model inherits its technical foundation from its predecessor, MiMo-V2-Flash. 
According to Xiaomi, a mix of local and global attention cuts memory needs for long texts by a factor of nearly seven, while a parallel token prediction mechanism triples output speed. Pre-training ran on 27 trillion tokens, with the context window then expanded in stages up to one million tokens.<\/p>\n<p>For post-training, Xiaomi uses a teacher-student setup: several specialized models first get optimized separately for areas like math, security, or tool use. A single student model then learns from its own attempts under the guidance of all the specialists, combining their skills into one.<\/p>\n<p>Three more models alongside the flagship<\/p>\n<p>Xiaomi is shipping three other systems alongside the Pro model. <a href=\"https:\/\/mimo.xiaomi.com\/mimo-v2-5\/\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">MiMo-V2.5<\/a> is a smaller version with 310 billion parameters, 15 billion of them active per request. It handles text, images, video, and audio directly and also supports up to one million tokens of context. Trained on roughly 48 trillion tokens, it scores 87.7 on the Video-MME benchmark, putting it on par with Gemini 3 Pro, according to Xiaomi. This model is also available <a href=\"https:\/\/huggingface.co\/XiaomiMiMo\/MiMo-V2.5\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">as open weights<\/a> on Hugging Face.<\/p>\n<p><a href=\"https:\/\/mimo.xiaomi.com\/mimo-v2-5-tts\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">MiMo-V2.5-TTS<\/a> is a family of three variants: one with preset voices, one that generates new voices from text descriptions, and one that clones voices from short audio clips. Users can shape pronunciation by dropping control tags like [crying] or [whispers] straight into the text. 
These models are API-only through Xiaomi&#8217;s platform, though currently free for a limited time.<\/p>\n<p><a href=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-tts-getting-started.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-55216 size-full\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-tts-getting-started.png\" alt=\"Screenshot of MiMo Studio with the model dropdown open, highlighting the MiMo Chat sidebar entry and three TTS variants.\" width=\"850\" height=\"482\"\/><\/a>The three TTS variants are API-only through MiMo Studio, currently free of charge. No open weights available. | Image: Xiaomi<\/p>\n<p>The <a href=\"https:\/\/mimo.xiaomi.com\/mimo-v2-5-asr\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">MiMo-V2.5-ASR<\/a> speech recognition model, on the other hand, <a href=\"https:\/\/huggingface.co\/XiaomiMiMo\/MiMo-V2.5-ASR\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">is open<\/a>. It works in both Chinese and English and, per the benchmarks, also handles Chinese dialects like Wu, Cantonese, and Hokkien, plus mid-sentence language switching and song lyrics. On the Open ASR Leaderboard, it averages a 5.73 percent word error rate.<\/p>\n<p><a href=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-asr-performance-wer.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-55218 size-full\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/mimo-v2-5-asr-performance-wer.png\" alt=\"Six bar charts comparing word error rates across MiMo-V2.5-ASR, Qwen3-ASR-1.7B, Seed-ASR 2.0, Whisper-Large-V3, FunASR-1.5, and Gemini-3.1-Pro.\" width=\"1280\" height=\"907\"\/><\/a>The gap to Gemini 3.1 Pro is widest on dialects and Chinese lyrics, topping 16 percentage points. Lower scores are better. 
| Image: Xiaomi<\/p>\n<p>China&#8217;s open-weight push is about volume<\/p>\n<p>With this release, Xiaomi&#8217;s MiMo team is sticking to the path it set in late 2025: lots of models at once, mostly open, all built for autonomous AI agents. The team points to further scaling of training and a better grasp of long-range relationships beyond individual sentences as the next steps.<\/p>\n<p>Xiaomi rolled out its first complete three-model package recently with <a href=\"https:\/\/the-decoder.com\/xiaomi-launches-three-mimo-ai-models-to-power-agents-robots-and-voice\/\" rel=\"nofollow noopener\" target=\"_blank\">MiMo-V2-Pro, MiMo-V2-Omni, and MiMo-V2-TTS<\/a>. That earlier Pro model had quietly topped the OpenRouter usage rankings for several days under the codename &#8220;Hunter Alpha,&#8221; with many users initially assuming it was a new Deepseek model.<\/p>\n<p>That one has now landed too: <a href=\"https:\/\/the-decoder.com\/as-agentic-ai-pushes-rivals-to-raise-prices-and-cap-usage-deepseek-ships-a-good-enough-model-for-almost-nothing\/\" rel=\"nofollow noopener\" target=\"_blank\">Deepseek has released Deepseek V4<\/a>, currently the largest open model on the market and one that significantly undercuts the competition on price. 
MiMo-V2.5-Pro now joins the arms race among Chinese open-weight providers\u2014a race that&#8217;s increasingly less about benchmark points and more <a href=\"https:\/\/the-decoder.com\/chinese-ai-model-minimax-m2-7-reportedly-helped-develop-itself\/\" rel=\"nofollow noopener\" target=\"_blank\">about how cheaply and how long a model can work on a task by itself<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"Xiaomi&#8217;s new MiMo-V2.5-Pro writes a complete compiler in under five hours and lands close to Anthropic&#8217;s Claude 
Opus&hellip;\n","protected":false},"author":2,"featured_media":25730,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[53,3154,182,335,11846],"class_list":{"0":"post-25729","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-anthropic","8":"tag-anthropic","9":"tag-anthropic-claude","10":"tag-claude","11":"tag-open-source","12":"tag-xiaomi"},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/25729","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/comments?post=25729"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/25729\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media\/25730"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media?parent=25729"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/categories?post=25729"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/tags?post=25729"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}