Newswise — A team led by researchers from China Mobile has published a perspective article in the “Computing and Network Convergence: Architecture, Theory and Practice” special issue of Frontiers of Information Technology & Electronic Engineering (FITEE), 2024, Volume 25, Issue 5. The article proposes a systematic architecture design for the Computing-Aware Network (CAN) and introduces an awareness plane that collects, manages, and synthesizes computing and network information. The design aims to address slow computing service scheduling, inflexible data distribution, and low data transmission efficiency in wide-area networks (WANs).

CAN is defined as the integrated interconnection, joint awareness, and hybrid control of computing and network resources. Its architecture includes the awareness plane, control plane, and data plane. The awareness plane, a core functional module, collects, manages, and synthesizes computing and network information. These three planes collaborate to form a closed-loop control system. Compared with CFN-dyncast, CAN considers problems more comprehensively and has a more systematic architecture. Compared with the Computing Power Network (CPN), CAN has more specific technical and protocol designs.

The article presents three key technologies for the CAN system: Computing-Aware Traffic Steering (CATS), elastic broadcast, and wide-area high-throughput transmission. CATS schedules computing services across multiple compute instances, selecting optimal paths by jointly analyzing computing capabilities and network status. Because control and scheduling are distributed, it eliminates the additional delay of querying for the destination address of a compute node, but it must balance signaling overhead against the granularity at which computing information is advertised. Elastic broadcast targets the one-to-many collective communication patterns that arise in artificial intelligence (AI) model training and inference across data centers. By extending the network controller and the Bit Index Explicit Replication (BIER) protocol, it enables flexible one-to-many data transmission, saving bandwidth and reducing data copies at the end hosts. Wide-area high-throughput transmission is crucial for building high-performance data plane functions and for broadening the applicability of CAN. Its transmission protocol builds on RoCEv2 and achieves end-to-end high-throughput data transmission through optimizations such as fast packet loss recovery, precise packet retransmission, and a congestion control algorithm based on one-way delay.
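
The CATS idea of weighing computing capability together with network status can be illustrated with a short sketch. This is not the algorithm from the article: the `Candidate` fields, the cost formula (path delay plus a service time inflated by instance load), and all numeric values are assumptions chosen only to show why a nearest-instance choice and a computing-aware choice can differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One compute instance and the path to it (hypothetical fields)."""
    instance: str
    compute_load: float   # assumed metric: 0.0 (idle) up to 1.0 (saturated)
    path_delay_ms: float  # assumed metric: network delay toward the instance


def estimated_cost_ms(c: Candidate, service_time_ms: float) -> float:
    """Assumed cost model: network delay plus load-inflated service time."""
    if not 0.0 <= c.compute_load < 1.0:
        # A saturated (or invalid) instance is never preferred.
        return float("inf")
    return c.path_delay_ms + service_time_ms / (1.0 - c.compute_load)


def select_instance(candidates, service_time_ms: float = 10.0) -> Candidate:
    """Pick the candidate with the lowest estimated cost."""
    return min(candidates, key=lambda c: estimated_cost_ms(c, service_time_ms))


if __name__ == "__main__":
    # Illustrative values only; none of these numbers come from the article.
    candidates = [
        Candidate("edge-a", compute_load=0.9, path_delay_ms=2.0),
        Candidate("metro-b", compute_load=0.5, path_delay_ms=8.0),
        Candidate("core-c", compute_load=0.1, path_delay_ms=30.0),
    ]
    print(select_instance(candidates).instance)
```

With these made-up inputs the nearest instance, `edge-a`, is heavily loaded, so the sketch selects `metro-b`. In a real deployment the load and delay figures would be the kind of information the awareness plane gathers, and how often they are refreshed is the signaling-versus-granularity trade-off described above.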

Preliminary simulations were conducted to demonstrate how the proposed technologies improve effective throughput. The wide-area transmission experiment ran on an FPGA-based network simulation prototype, and its results show that the wide-area high-throughput transmission technology significantly outperforms standard TCP in throughput across different packet loss rates and round-trip times. The key technologies of CAN are also suitable for optimizing AI services in WANs: elastic broadcast optimizes model training, CATS is used for model inference, and wide-area high-throughput transmission is used for offline model deployment and parameter updates.
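
For context on why packet loss and round-trip time matter to the TCP baseline, the widely used Mathis approximation for Reno-style, loss-based TCP estimates steady-state throughput as roughly (MSS / RTT) × (C / √p), where p is the loss rate and C ≈ 1.22. The sketch below evaluates that formula. It is background on standard TCP only, not a model of the CAN protocol or a reproduction of the article's results, and the segment size, RTTs, and loss rates are illustrative assumptions.

```python
import math

# Constant for the Mathis approximation of Reno-style TCP (about 1.22).
MATHIS_CONSTANT = math.sqrt(3.0 / 2.0)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state throughput of loss-based TCP, in bits per second."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_CONSTANT / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Illustrative parameter grid; these values are not taken from the article.
    for rtt_ms in (10, 50, 100):
        for loss in (1e-5, 1e-4, 1e-3):
            mbps = mathis_throughput_bps(1460, rtt_ms / 1000, loss) / 1e6
            print(f"RTT {rtt_ms:>3} ms, loss {loss:.0e}: ~{mbps:,.1f} Mbit/s")
```

Under this approximation, doubling the round-trip time halves the achievable throughput, and a fourfold increase in loss rate halves it again, which is why long, lossy wide-area paths are a hard case for standard TCP.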

Finally, the article proposes two directions for future research: the energy efficiency of computing and network convergence, and the convergence of computing, networks, and applications. The first would explore resource scheduling strategies that reduce energy consumption; the second would study the collaborative design of computing, networks, and applications to achieve system optimization.

The paper “Computing-aware network (CAN): a systematic design of computing and network convergence” was authored by Xiaoyun WANG, Xiaodong DUAN, Kehan YAO, Tao SUN, Peng LIU, Hongwei YANG and Zhiqiang LI. Full text of the open access paper: https://doi.org/10.1631/FITEE.2400098.