{"id":13430,"date":"2025-06-25T11:59:16","date_gmt":"2025-06-25T11:59:16","guid":{"rendered":"https:\/\/www.europesays.com\/us\/13430\/"},"modified":"2025-06-25T11:59:16","modified_gmt":"2025-06-25T11:59:16","slug":"minimax-releases-m1-a-456b-hybrid-attention-model-for-long-context-reasoning-and-software-tasks","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/us\/13430\/","title":{"rendered":"MiniMax Releases M1: A 456B Hybrid-Attention Model for Long-Context Reasoning and Software Tasks"},"content":{"rendered":"<p>MiniMax has introduced <a href=\"https:\/\/huggingface.co\/MiniMaxAI\/MiniMax-M1-40k\" target=\"_blank\" rel=\"noopener\">MiniMax-M1<\/a>, an open-weight language model designed for long-context reasoning and tool use. Based on the earlier <a href=\"https:\/\/huggingface.co\/MiniMaxAI\/MiniMax-Text-01\" target=\"_blank\" rel=\"noopener\">MiniMax-Text-01<\/a>, M1 uses a hybrid Mixture-of-Experts (MoE) architecture and a new \u201clightning attention\u201d mechanism. The model has a total capacity of 456 billion parameters, with 45.9 billion active per token, and supports context lengths of up to 1 million tokens.<\/p>\n<p>M1 distinguishes itself through its efficient use of compute and its support for long-context reasoning. Its lightning attention mechanism reduces test-time computation, requiring only 25% of the <a href=\"https:\/\/pl.wikipedia.org\/wiki\/FLOPS\" target=\"_blank\" rel=\"noopener\">FLOPs<\/a> used by <a href=\"https:\/\/www.infoq.com\/news\/2025\/02\/deepseek-r1-release\/\" target=\"_blank\" rel=\"noopener\">DeepSeek R1<\/a> for sequences of 100K tokens. The model was trained with large-scale reinforcement learning across a range of domains, including mathematical problem-solving and software engineering environments.<\/p>\n<p>Two versions of the model are available (40K and 80K), and both are evaluated using a custom RL scaling approach. 
Notably, MiniMax introduces CISPO, a novel RL algorithm that clips importance sampling weights rather than token updates, reportedly improving stability and performance over traditional variants.<\/p>\n<p>Across benchmarks, MiniMax-M1-80K consistently ranks at or near the top among open-weight models, with strong results in:<\/p>\n<ul>\n<li>Long-context tasks (OpenAI-MRCR 128K: 73.4%, LongBench-v2: 61.5%)<\/li>\n<li>Software engineering (SWE-bench Verified: 56.0%)<\/li>\n<li>Tool use (TAU-bench airline: 62.0%, retail: 63.5%)<\/li>\n<li>Reasoning-heavy math benchmarks (AIME 2024: 86.0%)<\/li>\n<\/ul>\n<p>One Reddit user <a href=\"https:\/\/www.reddit.com\/r\/LocalLLaMA\/comments\/1lcuglb\/comment\/my38g60\/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button\" target=\"_blank\" rel=\"noopener\">commented<\/a> on its standout capabilities:<\/p>\n<blockquote>\n<p>This looks pretty great. Especially for function calling (Tau-bench) and long context, this seems like SOTA for open-weights. The latter by some big margin, which I don&#8217;t even find unbelievable because their old non-reasoning model was also great for this.<\/p>\n<\/blockquote>\n<p>However, others pointed to limitations in practice. For example, dubesor86 <a href=\"https:\/\/www.reddit.com\/r\/LocalLLaMA\/comments\/1lcuglb\/comment\/my38g60\/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button\" target=\"_blank\" rel=\"noopener\">shared<\/a>:<\/p>\n<blockquote>\n<p>It&#8217;s unusable, though. I had it play chess matches (usually takes a few minutes), and I had to have it run all night, and it still wasn&#8217;t done by the time I woke up. 
All the scores in the world mean nothing if the usability is zero.<\/p>\n<\/blockquote>\n<p>MiniMax-M1 also supports structured function calling, making it suitable for agent frameworks. The model is available in two versions (40K and 80K) via <a href=\"https:\/\/huggingface.co\/MiniMaxAI\" target=\"_blank\" rel=\"noopener\">Hugging Face<\/a>. For deployment, the team recommends <a href=\"https:\/\/docs.vllm.ai\/en\/latest\/\" target=\"_blank\" rel=\"noopener\">vLLM<\/a>, which offers optimized serving, memory management, and request batching. Developers can also experiment via the <a href=\"https:\/\/github.com\/MiniMax-AI\/MiniMax-MCP\" target=\"_blank\" rel=\"noopener\">MiniMax MCP Server<\/a>, which bundles API access and capabilities such as video and image generation, speech synthesis, and voice cloning.<\/p>\n","protected":false},"excerpt":{"rendered":"MiniMax has introduced MiniMax-M1, an open-weight language model designed for long-context reasoning and tool use. Based on 
the&hellip;\n","protected":false},"author":3,"featured_media":13431,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[691,648,1032,14103,1033,171,14105,14104,14101,14102,67,132,68],"class_list":{"0":"post-13430","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-arts-and-design","8":"tag-ai","9":"tag-arts","10":"tag-arts-and-design","11":"tag-benchmark","12":"tag-design","13":"tag-entertainment","14":"tag-hugging-face","15":"tag-large-language-models","16":"tag-minimax-m1","17":"tag-ml-data-engineering","18":"tag-united-states","19":"tag-unitedstates","20":"tag-us"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@us\/114743893318708772","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/13430","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/comments?post=13430"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/posts\/13430\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media\/13431"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/media?parent=13430"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/categories?post=13430"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/us\/wp-json\/wp\/v2\/tags?post=13430"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}