{"id":78562,"date":"2025-09-22T09:49:18","date_gmt":"2025-09-22T09:49:18","guid":{"rendered":"https:\/\/www.europesays.com\/ie\/78562\/"},"modified":"2025-09-22T09:49:18","modified_gmt":"2025-09-22T09:49:18","slug":"googles-most-expensive-traitor-and-transformer-author-unveil-next-step-for-agi-after-return-at-2-7-billion-sky","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ie\/78562\/","title":{"rendered":"Google&#8217;s Most Expensive &#8220;Traitor&#8221; and Transformer Author Unveils Next Step for AGI After Sky-High $2.7 Billion Return"},"content":{"rendered":"<p><strong>In the AI boom, what do large models covet most &#8211; computing power, storage, or complex network interconnect? At Hot Chips 2025, Noam Shazeer, one of the inventors of the Transformer and co-lead of Google&#8217;s Gemini, gave his answer.<\/strong><\/p>\n<p>What do large models need?<\/p>\n<p>On the first day of Hot Chips 2025 in Silicon Valley, Noam Shazeer of Google DeepMind answered this question in a keynote titled &#8220;Predictions for the Next Phase of AI&#8221;.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"686,386\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534554_385_interlace,1.jpeg\"\/><\/p>\n<p>Besides co-authoring the Transformer paper &#8220;Attention Is All You Need&#8221;, he has driven many other innovations, such as substantially improving spelling correction in Google Search.<\/p>\n<p>He co-invented the Transformer back in 2017 and has spent a decade deep in the LLM field.<\/p>\n<p>Later, he built a chatbot that Google declined to release, which prompted him to leave and found Character.AI.<\/p>\n<p>Before long, Google recognized its own shortcoming and struck a $2.7 billion deal with Character.AI.<\/p>\n<p>Noam has since returned to Google, where he serves as co-lead of the Gemini project.<\/p>\n<p>As he showed, <strong>large language models keep improving in performance and accuracy as resources such as hardware improve.<\/strong><\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"1000,750\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534554_303_interlace,1.jpeg\"\/><\/p>\n<p><strong>The next phase of AI: computing power, computing power, and more computing power<\/strong><\/p>\n<p>Noam Shazeer covered what LLMs demand, his own research path through LLMs, and the relationship between hardware and LLMs.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"1080,553\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534554_962_interlace,1.jpeg\"\/><\/p>\n<p>He emphasized several key points.<\/p>\n<p><strong>First, Noam believes language modeling is the most important research field today.<\/strong><\/p>\n<p>He dedicated an entire slide to this point, a measure of how strongly he feels about the topic.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"1080,664\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534555_12_interlace,1.jpeg\"\/><\/p>\n<p><strong>He then turned to &#8220;what LLMs want&#8221;.<\/strong><\/p>\n<p>His central observation: more FLOPS means better performance.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"800,412\" 
src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534555_131_interlace,1.jpeg\"\/><\/p>\n<p>This matters because LLMs grow in scale as parameter count, depth, non-linearity, and information flow all increase.<\/p>\n<p>That growth demands more computing resources, and more high-quality training data also helps produce better LLMs.<\/p>\n<p>He noted that in 2015, training on 32 GPUs was a big deal; ten years later, hundreds of thousands of GPUs may be needed.<\/p>\n<p>Another interesting detail: in 2018, Google built dedicated computing nodes for AI.<\/p>\n<p>That was significant because, before then, Google engineers typically ran workloads on a thousand CPUs &#8211; machines that would slow down as they were reclaimed for other purposes, such as web crawling.<\/p>\n<p>Dedicating large machines to deep learning\/AI workloads brought a huge improvement in performance.<\/p>\n<p><strong>Next came a highlight of the chip conference: the hardware requirements of LLMs.<\/strong><\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"894,494\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534555_243_interlace,1.jpeg\"\/><\/p>\n<p class=\"img-desc\">This slide captures an interesting view<\/p>\n<p>More computing power, more memory capacity, more memory bandwidth, and more network bandwidth are all crucial to advancing future AI models.<\/p>\n<p>And at &#8220;all levels&#8221;: not just the capacity and bandwidth of DDR5, but HBM and on-chip SRAM as well.<\/p>\n<p><strong>Reducing precision<\/strong> to make better use of all four is, in many cases, also a good thing.<\/p>\n<p><strong>Determinism enables better programming.<\/strong><\/p>\n<p>The message of the speech boils down to this: larger and faster devices in the cluster yield gains in LLMs.<\/p>\n<p>This may be good 
news for Google and a few other companies.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"1080,602\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534556_589_interlace,1.jpeg\"\/><\/p>\n<p><strong>What kind of hardware do large models need?<\/strong><\/p>\n<p>Noam is a crossover in the unusual direction: an AI researcher who is intensely curious about hardware and always wants to know how the machines actually work.<\/p>\n<p>During the Mesh-TensorFlow project, he became deeply interested in the TPU&#8217;s underlying network structure.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"1080,473\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534556_206_interlace,1.jpeg\"\/><\/p>\n<p class=\"img-desc\">Paper link: https:\/\/arxiv.org\/abs\/1811.02084<\/p>\n<p>He asked many refreshing questions:<\/p>\n<p>Is your chip actually a ring network?<\/p>\n<p>How do data packets move through it?<\/p>\n<p>How does that map onto a neural network&#8217;s tensor computations?<\/p>\n<p>This curiosity ultimately led to many breakthroughs in Google&#8217;s hardware-software co-design.<\/p>\n<p>In this talk, Noam Shazeer analyzed in depth what kind of hardware LLMs actually need.<\/p>\n<p><strong>The hardware AI needs: not just GPUs<\/strong><\/p>\n<p>There is no doubt that computing power is what LLMs need most.<\/p>\n<p>When people ask &#8220;what do LLMs want&#8221;, they are really asking:<\/p>\n<p>How must our hardware systems change to make AI smarter?<\/p>\n<p>Noam&#8217;s answer is clear and direct: <strong>the more, the better; the bigger, the better<\/strong>.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"1080,424\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534556_319_interlace,1.jpeg\"\/><\/p>\n<p><strong>1. 
More FLOPs<\/strong><\/p>\n<p>The more computing power, the better &#8211; ideally petaflops of floating-point throughput. It directly determines how large a model you can train, how large a batch you can use, and how much training data you can cover.<\/p>\n<p><strong>2. Larger memory capacity &amp; higher memory bandwidth<\/strong><\/p>\n<p>Noam pointed out that insufficient memory bandwidth limits the flexibility of the model architecture &#8211; you can&#8217;t easily add non-linear layers, for example &#8211; while higher bandwidth allows more fine-grained control.<\/p>\n<p class=\"image-wrapper\"><img decoding=\"async\" data-img-size-val=\"1080,421\" src=\"https:\/\/www.europesays.com\/ie\/wp-content\/uploads\/2025\/09\/1758534557_782_interlace,1.jpeg\"\/><\/p>\n<p class=\"img-desc\">Memory here spans on-chip SRAM, high-bandwidth memory (HBM), device memory, and other medium-to-high-speed tiers such as DRAM<\/p>\n<p><strong>Memory capacity<\/strong>, in turn, directly determines:<\/p>\n<p>how large a model can be held;<\/p>\n<p>how much intermediate state can be retained during inference (long context, KV cache, attention heads, and so on).<\/p>\n<p><strong>3. 
Network Bandwidth<\/strong><\/p>\n<p>This is a factor many people overlook.<\/p>\n<p>Whether training or inference, an LLM almost always runs <strong>distributed<\/strong>: the model is spread across multiple chips, and data is shuttled back and forth between them.<\/p>\n<p>For example, everyone is now chasing the &#8220;long chain of thought&#8221;, meaning the model must &#8220;think&#8221; longer to produce a stronger answer.<\/p>\n<p>But that also means <strong>each inference step must finish faster<\/strong>, or responses slow down.<\/p>\n<p>At that point, <strong>the bottleneck is often whether you can quickly access all of the model&#8217;s parameters<\/strong> &#8211; not just the copy on one chip, but every shard spread across an entire computing grid.<\/p>\n<p>Noam&#8217;s summary:<\/p>\n<p>If you want fast inference, the core question is: how much memory bandwidth can your group of chips provide in total?<\/p>\n<p>He then added a few more items to his hardware &#8220;wish list&#8221;.<\/p>\n<p><strong>1. Low Precision<\/strong><\/p>\n<p>In traditional scientific computing, precision is crucial.<\/p>\n<p>But an LLM is inherently somewhat &#8220;fuzzy&#8221;, and low-bit precision often has little impact on it.<\/p>\n<p>So trading precision for <strong>more low-precision FLOPs<\/strong> is entirely reasonable &#8211; it&#8217;s worth using 8-bit or even 4-bit.<\/p>\n<p>The industry is indeed pushing ever-lower precision formats (FP8, INT4, binary, etc.) 
&#8211; as long as convergence can be maintained, the lower the better.<\/p>\n<p><strong>Of course, reproducibility cannot be sacrificed.<\/strong><\/p>\n<p>The core challenge is keeping precision &#8220;sufficient during training&#8221; and error &#8220;small enough during inference&#8221;.<\/p>\n<p><strong>2. Determinism<\/strong><\/p>\n<p>Noam considers this key because <strong>the failure rate of machine-learning experiments is already very high.<\/strong><\/p>\n<p>Often you don&#8217;t know whether a run failed because the model structure was wrong, the data had problems, or your code had a bug.<\/p>\n<p>If every training run produces different results, you can&#8217;t even begin to debug.<\/p>\n<p>He recalled the early days of asynchronous training at Google Brain, when a job would &#8220;work this time and fail the next&#8221; &#8211; a miserable engineering experience.<\/p>\n<p>So his advice to hardware designers:<\/p>\n<p>Unless you can give me 10 times the performance, don&#8217;t sacrifice reproducibility.<\/p>\n<p><strong>3. 
Problems of arithmetic overflow and precision loss<\/strong><\/p>\n<p>An audience member asked: how do you deal with the overflow and instability that low-precision arithmetic often causes?<\/p>\n<p>Noam&#8217;s answer:<\/p>\n<p>make sure the accumulator uses higher precision;<\/p>\n<p>or clip values to keep them from exploding;<\/p>\n<p>the worst option is &#8220;wrap-around&#8221;.<\/p>\n<p>The host, Cliff, added a witty remark:<\/p>\n<p>What we want is for the machine to crash in exactly the same way after loading a checkpoint &#8211; that is true reproducibility.<\/p>\n<p>A Waymo engineer raised a tricky question<strong>: if hardware stopped progressing today, could we still reach Artificial General Intelligence (AGI)?<\/strong><\/p>\n<p><strong>Noam gave a surprising but firm answer: yes.<\/strong><\/p>\n<p>He pointed out that <strong>AI will accelerate its own development<\/strong>, driving the continued evolution of software and system design. 
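<\/p>\n<p>The three overflow-handling options above can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the talk; the example values and helper names are invented for the demo:<\/p>\n

```python
# Sum five int8-range partial products (100 each) under the three
# accumulator policies contrasted above: a wider (exact) accumulator,
# saturating ("clipping") arithmetic, and two's-complement wrap-around.

INT8_MIN, INT8_MAX = -128, 127

def wrap_i8(x: int) -> int:
    """Two's-complement wrap-around into [-128, 127] - the 'worst option'."""
    return (x - INT8_MIN) % 256 + INT8_MIN

def clip_i8(x: int) -> int:
    """Saturating ('clipping') arithmetic: clamp instead of overflowing."""
    return max(INT8_MIN, min(INT8_MAX, x))

def accumulate(values, policy):
    acc = 0
    for v in values:
        acc = policy(acc + v)  # apply the overflow policy after each add
    return acc

partials = [100] * 5  # the running sum quickly exceeds the int8 range

wide = accumulate(partials, lambda x: x)  # higher-precision accumulator: exact
clipped = accumulate(partials, clip_i8)   # saturates at the type's maximum
wrapped = accumulate(partials, wrap_i8)   # wraps around and flips the sign

print(wide, clipped, wrapped)  # 500 127 -12
```

\n<p>The wide accumulator keeps the exact sum, and clipping at least saturates at the type&#8217;s limit while preserving the sign, whereas wrap-around silently flips the sign &#8211; which is why it is called the worst option.<\/p>\n<p>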
Even if the hardware stood still, we could keep making progress through software-level innovation.<\/p>\n<p>Of course, he added, <strong>if you can keep building better hardware, so much the better.<\/strong><\/p>\n<p>If AGI really arrives, where should humanity go?<\/p>\n<p><strong>Will AI save humanity or end it?<\/strong><\/p>\n<p>Driven by computing power and data, AI keeps advancing into ever more complex domains.<\/p>\n<p><strong>&#8220;Given enough data and computing power, it may be possible to learn and reveal the internal structure of the universe.&#8221;<\/strong><\/p>\n<p>So said Mustafa Suleyman, CEO of Microsoft AI, in a recent interview.<\/p>\n<p>He noted that today&#8217;s large language models are still just &#8220;single-step prediction engines&#8221;, an early stage of AI development.<\/p>\n<p>But with persistent memory and long-horizon prediction, LLMs could evolve into &#8220;action-type AI&#8221; with full planning capabilities:<\/p>\n<p>able not only to formulate complex plans the way humans do, but also to execute tasks continuously.<\/p>\n<p><strong>This leap may arrive by the end of 2026.<\/strong><\/p>\n<p>Suleyman called this future &#8220;breathtaking&#8221; and stressed that we are only at the beginning &#8211; everything will soon change profoundly.<\/p>\n<p>The views in this article are the author&#8217;s alone; the 36Kr platform provides only information-storage services.<\/p>\n","protected":false},"excerpt":{"rendered":"In the AI boom, what is the most 
&#8220;coveted&#8221; thing for large models? Is it computing power, storage,&hellip;\n","protected":false},"author":2,"featured_media":78563,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[262],"tags":[52960,52971,314,2078,52970,18,46778,13167,52964,21093,19,17,52965,52961,14338,52969,52967,52966,52968,52962,82,52963],"class_list":{"0":"post-78562","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-computing","8":"tag-ai-boom","9":"tag-artificial-general-intelligence-agi","10":"tag-computing","11":"tag-computing-power","12":"tag-determinism","13":"tag-eire","14":"tag-flops","15":"tag-gemini","16":"tag-hardware-requirements","17":"tag-hot-chips-2025","18":"tag-ie","19":"tag-ireland","20":"tag-language-modeling","21":"tag-large-models","22":"tag-llm","23":"tag-low-precision","24":"tag-memory-bandwidth","25":"tag-memory-capacity","26":"tag-network-bandwidth","27":"tag-noam-shazeer","28":"tag-technology","29":"tag-transformer"},"share_on_mastodon":{"url":"","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/78562","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/comments?post=78562"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/78562\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media\/78563"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media?parent=78562"}],"wp:term":[{"taxonomy":"category","embeddable":true
,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/categories?post=78562"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/tags?post=78562"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}