{"id":205674,"date":"2025-06-22T17:51:09","date_gmt":"2025-06-22T17:51:09","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/205674\/"},"modified":"2025-06-22T17:51:09","modified_gmt":"2025-06-22T17:51:09","slug":"ai-networking-cornelis-cn500-boosts-performance","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/205674\/","title":{"rendered":"AI Networking: Cornelis&#8217; CN500 Boosts Performance"},"content":{"rendered":"<p>In the good old days, networks were all about connecting a small number of local computers. But times have changed. In an AI-dominated world, the trick is coordinating the activity of tens of thousands of <a href=\"https:\/\/spectrum.ieee.org\/tag\/servers\" target=\"_blank\" rel=\"noopener\">servers<\/a> to train a large language model\u2014without any delay in communication. Now there\u2019s an architecture optimized to do just that. <a href=\"https:\/\/www.cornelisnetworks.com\/\" target=\"_blank\" rel=\"noopener\">Cornelis Networks<\/a> says its CN500 networking fabric maximizes <a href=\"https:\/\/spectrum.ieee.org\/nvidia-blackwell-mlperf-training-5\" target=\"_blank\" rel=\"noopener\">AI performance<\/a>, supporting deployments of up to 500,000 computers or processors\u2014an order of magnitude more than today\u2019s networks\u2014with no added latency.<\/p>\n<p>The new technology brings a third major option to the networking world, alongside <a href=\"https:\/\/spectrum.ieee.org\/ethernet-ieee-milestone\" target=\"_blank\" rel=\"noopener\">Ethernet<\/a> and <a href=\"https:\/\/www.infinibandta.org\/\" target=\"_blank\" rel=\"noopener\">InfiniBand<\/a>. 
It\u2019s designed to enable AI systems and <a href=\"https:\/\/spectrum.ieee.org\/tag\/high-performance-computers\" target=\"_blank\" rel=\"noopener\">high-performance computers<\/a> (HPC, or <a href=\"https:\/\/spectrum.ieee.org\/tag\/supercomputers\" target=\"_blank\" rel=\"noopener\">supercomputers<\/a>) to achieve faster and more predictable completion times with greater efficiency. For HPC, Cornelis claims its technology outperforms <a href=\"https:\/\/docs.nvidia.com\/dgx-superpod\/design-guide-cabling-data-centers\/latest\/ndr-overview.html\" target=\"_blank\" rel=\"noopener\">InfiniBand NDR<\/a>\u2014the version introduced in 2022\u2014passing twice as many messages per second with 35 percent less latency. For AI applications, it delivers sixfold faster communication than Ethernet-based protocols.<\/p>\n<p><a href=\"https:\/\/spectrum.ieee.org\/tag\/ethernet\" target=\"_blank\" rel=\"noopener\">Ethernet<\/a> has long been synonymous with local area networking, or LAN. Software patches have allowed its communication protocols to stand the test of time. The invention of InfiniBand was an improvement, but it was still designed with the same goal: connecting a small number of local devices. \u201cWhen these technologies were invented, they had nothing to do with parallel computing,\u201d says <a href=\"https:\/\/www.cornelisnetworks.com\/company\/leadership\" target=\"_blank\" rel=\"noopener\">Philip Murphy<\/a>, co-founder, president, and chief operating officer at Pennsylvania-based Cornelis.<\/p>\n<p>When <a href=\"https:\/\/spectrum.ieee.org\/tag\/data-centers\" target=\"_blank\" rel=\"noopener\">data centers<\/a> started to spring up, engineers needed a new networking solution. Because different systems used different software, they couldn\u2019t share resources\u2014so scaling the likes of Ethernet and InfiniBand to accommodate the busiest periods of operation proved challenging. \u201cThat sparked the whole cloud evolution,\u201d says Murphy. 
Sharing a cloud-based CPU among different computers or even different organizations became the solution du jour.<\/p>\n<p>But while data center pioneers tried to maximize the number of applications running on one server, Murphy and his colleagues saw value in the opposite approach: maximizing the number of <a href=\"https:\/\/spectrum.ieee.org\/tag\/processors\" target=\"_blank\" rel=\"noopener\">processors<\/a> working on one application. \u201cThat requires a totally different networking solution,\u201d he says. That solution is what Cornelis now offers. The company\u2019s <a href=\"https:\/\/www.cornelisnetworks.com\/products\/omni-path-100\" target=\"_blank\" rel=\"noopener\">Omni-Path<\/a> architecture, developed by <a href=\"https:\/\/spectrum.ieee.org\/tag\/intel\" target=\"_blank\" rel=\"noopener\">Intel<\/a> for <a href=\"https:\/\/spectrum.ieee.org\/tag\/supercomputing\" target=\"_blank\" rel=\"noopener\">supercomputing<\/a> applications like simulating climate models or molecular interactions for drug design, offers maximum throughput with zero data packet loss.<\/p>\n<p>Congestion-free data highway<\/p>\n<p>Coordinating processors to <a href=\"https:\/\/spectrum.ieee.org\/generative-ai-training\" target=\"_blank\" rel=\"noopener\">train AI models<\/a> requires the exchange of many messages\u2014data packets\u2014at very high bandwidth. The message rate matters, and so does the latency: how long a recipient takes to respond.<\/p>\n<p>One major challenge with sharing so many data packets throughout a network is <a href=\"https:\/\/spectrum.ieee.org\/tag\/traffic-congestion\" target=\"_blank\" rel=\"noopener\">traffic congestion<\/a>. As Murphy explains, you need a way to reliably route packets around <a href=\"https:\/\/spectrum.ieee.org\/tag\/congestion\" target=\"_blank\" rel=\"noopener\">congestion<\/a> points without creating other problems. 
For example, if the packets take different routes to the same destination, they may arrive out of order.<\/p>\n<p>Cornelis\u2019s dynamic adaptive routing algorithm steers packets around short-lived congestion events, while its congestion-control architecture routes traffic around \u201cpopular\u201d destinations. \u201cIf there\u2019s an event at a stadium that we all want to go to, you don\u2019t want the traffic that\u2019s going past the stadium to get caught there too,\u201d says Murphy. A central pacing technique enables this congestion-control architecture: switches see where congestion is forming, then tell senders to slow down until it dissipates. \u201cThink of mitigating traffic as it comes onto a highway on-ramp,\u201d Murphy explains.<\/p>\n<p>The other challenge is avoiding latency. In traditional Ethernet architectures, sending a packet requires sufficient memory at the endpoint. \u201cIf I send to you and you run out of memory, you have to come back and tell me that,\u201d says Murphy. That\u2019s a long loop, and it requires large buffers that don\u2019t scale. Instead, Cornelis uses a scheme called credit-based flow control that allocates receiver memory in advance. \u201cYou don\u2019t have to tell me anything, and I\u2019ll know how much more I can send,\u201d says Murphy.<\/p>\n<p>Finally, the system avoids grinding to a halt if a GPU or link fails. In traditional architectures, if a server goes down, so does the application. Recovering requires rebooting from the most recent checkpoint\u2014which itself took extensive computing power to create. \u201cImagine if every time you hit \u2018save\u2019 on your document, you had to wait 20 minutes,\u201d says Murphy. 
Instead, because the workload is spread across multiple computers, the Cornelis fabric keeps an application running, albeit at slightly lower bandwidth, until the faulty link can be replaced\u2014no checkpoints needed.<\/p>\n<p>Efficient AI<\/p>\n<p>Physically, the <a href=\"https:\/\/www.cornelisnetworks.com\/products\/cn5000\" target=\"_blank\" rel=\"noopener\">CN5000<\/a> product is a network card built around a custom chip. The network cards plug into every server, \u201clike you plug an Ethernet card into your PC at home,\u201d explains Murphy. A top-of-rack switch is cabled to each server and to other switches, and a director-class switch comes with 48 or 576 ports to link to the rack switches. \u201cEach server has cards plugged in, so you can build multi-thousand endpoint clusters,\u201d says Murphy.<\/p>\n<p>The company\u2019s main market is organizations that want to upgrade to a new cluster for AI or faster HPC simulations. Those upgrades go through one of the three original equipment manufacturers Cornelis works with, which make the servers and network switches. The OEM purchases physical cards from Cornelis and plugs them into servers before fulfilling the order.<\/p>\n<p>Until recently, training a neural network model was a one-time deal. But now, training a multitrillion-parameter AI model means repeatedly fine-tuning or updating it. Cornelis expects to take advantage of that. \u201cIf you don\u2019t adopt AI, you\u2019re going out of business. If you use AI inefficiently, you\u2019ll still go out of business,\u201d Murphy says. \u201cOur customers want to adopt AI in the most efficient way possible.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"In the good old days, networks were all about connecting a small number of local computers. 
But times&hellip;\n","protected":false},"author":2,"featured_media":205675,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3164],"tags":[3284,32054,82830,8254,82829,44109,53,16,15],"class_list":{"0":"post-205674","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-computing","8":"tag-computing","9":"tag-data-centers","10":"tag-ethernet","11":"tag-large-language-models","12":"tag-local-area-network","13":"tag-scale","14":"tag-technology","15":"tag-uk","16":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114728290457964221","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/205674","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/comments?post=205674"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/205674\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/205675"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=205674"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=205674"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=205674"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}