{"id":275199,"date":"2025-07-19T16:50:11","date_gmt":"2025-07-19T16:50:11","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/275199\/"},"modified":"2025-07-19T16:50:11","modified_gmt":"2025-07-19T16:50:11","slug":"china-proves-open-models-more-effective-than-gpu-dominance-the-register","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/275199\/","title":{"rendered":"China proves open models more effective than GPU dominance \u2022 The Register"},"content":{"rendered":"<p>Comment OpenAI was supposed to make good on its name and release its first open-weights model since GPT-2 this week.<\/p>\n<p>Unfortunately, what could have been the US&#8217;s first half-decent open model of the year has been held up by a safety review, according to CEO Sam Altman. &#8220;While we trust the community will build great things with this model, once weights are out, they can&#8217;t be pulled back. This is new for us, and we want to get it right,&#8221; he wrote in a <a target=\"_blank\" rel=\"nofollow\" href=\"https:\/\/x.com\/sama\/status\/1943837550369812814\">post<\/a> on X.<\/p>\n<p>The delay leaves the US in a rather awkward spot. Despite hundreds of billions of investment in GPUs, the best open model America has managed so far this year is Meta&#8217;s Llama 4, which enjoyed a less than stellar reception and was marred with controversy. Just this week, it was <a target=\"_blank\" rel=\"nofollow noopener\" href=\"https:\/\/www.nytimes.com\/2025\/07\/14\/technology\/meta-superintelligence-lab-ai.html\">reported<\/a> that Meta had apparently taken its two-trillion-parameter Behemoth out behind the barn after it failed to live up to expectations.<\/p>\n<p>There have been a handful of other open model releases from US companies. Microsoft rolled out a version of Phi-4 14B, which was trained using reinforcement learning to enable reasoning functionality; IBM has released a handful of tiny LLMs focused on agentic workloads; and Google released its multimodal Gemma3 family, which topped out at 27 billion parameters. But these models are small fry compared to Meta&#8217;s 400-billion-parameter Llama 4 Maverick.<\/p>\n<p>As it stands among US companies, much of the real progress in generative AI development this year has been locked away, accessible only through API calls to someone else&#8217;s servers.<\/p>\n<p>China continues its AI hot streak<\/p>\n<p>But while US model builders continue to do their best work behind closed doors, China is doing it in the open. As Nvidia&#8217;s CEO likes to point out, half of the world&#8217;s AI researchers call China home, and it really shows.<\/p>\n<p>In early 2025, DeepSeek, up to that point a relatively obscure AI dev spun out of Chinese quantitative hedge fund High Flyer, became a household name following the <a target=\"_blank\" href=\"https:\/\/www.theregister.com\/2025\/01\/26\/deepseek_r1_ai_cot\/\" rel=\"noopener\">release<\/a> of its R1 model.<\/p>\n<p>The 671-billion-parameter LLM featured a novel mixture-of-experts (MoE) architecture that allowed it to run far faster and on fewer resources than even smaller LLMs like Llama 3.1 405B while replicating the reasoning functionality of OpenAI&#8217;s still fresh o1 model.<\/p>\n<p>More importantly, the model weights were released in the open, alongside detailed technical docs showing how they&#8217;d done it. 
More importantly, the model weights were released in the open, alongside detailed technical docs showing how they'd done it. And in what should have come as a surprise to no one, it was only a matter of weeks before Western devs began replicating those techniques to imbue their own models with reasoning capabilities.

Since then, Alibaba has [rolled out](https://www.theregister.com/2025/03/16/qwq_hands_on_review/) a slew of new reasoning and MoE models, including QwQ, Qwen3-235B-A22B, and Qwen3-30B-A3B.

In June, Shanghai-based MiniMax [released](https://www.theregister.com/2025/06/17/minimax_m1_model_chinese_llm/) its 456-billion-parameter reasoning model, M1, under the permissive Apache 2.0 license. Notable features include a one-million-token context window and a new attention mechanism the dev claims helps the model keep track of all those tokens.
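MiniMax's announcement doesn't get unpacked here, so what follows is not its design, just a hypothetical illustration of the family of tricks long-context models lean on. Vanilla softmax attention compares every token against every earlier token, so cost grows quadratically with context length, which is ruinous at a million tokens. Linear-attention variants, one common alternative, swap the softmax for a feature map so the whole history folds into a fixed-size running state. A toy causal version in NumPy, with everything below made up for illustration:

```python
import numpy as np

def phi(x):
    """A simple positive feature map standing in for softmax's exp kernel."""
    return np.maximum(x, 0.0) + 1e-6

def causal_linear_attention(Q, K, V):
    """Each step folds the past into a fixed-size state (S, z), so cost per
    token stays constant however long the context grows: O(n) overall,
    versus O(n^2) for vanilla softmax attention."""
    d_v = V.shape[1]
    S = np.zeros((Q.shape[1], d_v))  # running sum of outer(phi(k), v)
    z = np.zeros(Q.shape[1])         # running sum of phi(k), for normalization
    out = np.empty((Q.shape[0], d_v))
    for t in range(Q.shape[0]):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        q = phi(Q[t])
        out[t] = (q @ S) / (q @ z)
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # (8, 16)
```

Flat per-token cost is the property any million-token window ultimately needs, however a given vendor chooses to get there.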
That same month, Baidu [open sourced](https://yiyan.baidu.com/blog/posts/ernie4.5/) its Ernie family of MoE models, which range in size from 47 billion parameters to 424 billion. Huawei has also open sourced its Pangu models, trained on its in-house accelerators, though that release was almost immediately overshadowed by [allegations](https://www.reuters.com/business/media-telecom/huaweis-ai-lab-denies-that-one-its-pangu-models-copied-alibabas-qwen-2025-07-07/) of fraud.

That brings us to July, when Moonshot AI, another Chinese AI dev, [lifted](https://moonshotai.github.io/Kimi-K2/) the curtain on Kimi K2, a one-trillion-parameter MoE model it claims bests even the West's most potent proprietary LLMs. Take those claims with a grain of salt, but the fact remains: the Chinese have developed a one-trillion-parameter open-weights model. The only US LLMs that come close today are all proprietary.

All of this, it should be remembered, was done in spite of Uncle Sam's crusade to deprive the Chinese of the tools necessary to compete effectively in the AI arena.

## The year ain't over yet

This brings us back to OpenAI's promised open-weights model. Not much is known about it beyond what AI hype-man Altman has shared on X, in public interviews, and in Congressional hearings.

Altman kicked the whole thing off in February when he [asked](https://x.com/sama/status/1891667332105109653) his followers which they'd prefer OpenAI's next open source project to be: an o3-mini-level model that'd run on GPUs, or the best smartphone LLM it could muster. The o3-mini-level LLM won out.

Then in June, OpenAI [pushed back](https://x.com/sama/status/1932573231199707168) the model's release for the first time, with Altman posting that the "research team did something unexpected and quite amazing, and we think it will be very very worth the wait but needs a bit longer."

Say what you will about Altman's penchant for hyperbole, but the fact remains that OpenAI has historically led on model development. Regardless of whether the new model lives up to the hype, any fresh competition in the open model arena is welcome, particularly among US players.

Unfortunately, just as OpenAI prepares to release its first open model in six years, it's [reported](https://www.nytimes.com/2025/07/14/technology/meta-superintelligence-lab-ai.html) that Meta, under the direction of its pricey new superintelligence lab, may abandon its own commitment to open source in favor of a closed model.

xAI, by all appearances, has already gone down this route with its Grok family of LLMs. Originally, the Elon Musk-backed startup planned to open source the weights of its previous model whenever a new version shipped. And while xAI did release Grok-1 upon Grok-2's debut, Grok-3 has been out since February, and xAI's Hugging Face [page](https://huggingface.co/xai-org) is looking a little lonely.

Then again, who is going to want a model whose hobbies include cosplaying as Mecha-Hitler? Perhaps, in this rare instance, this is one best left closed. ®