To meet the stringent security requirements for user data in foundation model training, China Mobile Research Institute (CMRI) recently proposed a new remote storage and computing architecture based on Hitless Intelligent Computing-OTN (HIC-OTN).


Working with China Mobile Hubei Branch and Huawei, CMRI has completed the industry's first technical test of remote storage and computing over 240 km using HIC-OTN, at China Mobile's intelligent computing center in Wuhan. This initiative establishes a new standard for secure foundation model training with localized user data. Over the 240 km intelligent computing interconnection network, a foundation model with hundreds of billions of parameters, trained with pipeline parallelism (PP), achieved over 99% of the training efficiency of a single cluster. This achievement marks a significant milestone in the advancement of intelligent computing center technologies and service applications.

The rapid development of foundation model technologies is driving a surge in demand for intelligent transformation and upgrades across industries. As the computing power needed to train foundation models keeps growing, major tech companies in China and abroad are investing in clusters of 10,000+ or even 100,000+ accelerator cards. Such clusters are expensive to build, and using computing power efficiently at this scale raises difficult technical challenges. Small and medium-sized enterprises struggle to build large intelligent computing centers of their own because of the high costs and technical requirements, while renting intelligent computing services means transferring private data to an external center for foundation model training, which poses security risks. The result is a significant gap between the pressing demand for AI capabilities and how widely AI is actually applied.

China Mobile Research Institute's cutting-edge HIC-OTN-based remote storage and computing architecture splits the training process between two sides. "Micro-computing power" on the user side serves as the entry point for training, so user data stays stored locally. Only the intermediate values produced during training are sent over HIC-OTN to the service provider's intelligent computing center, whose powerful computing power carries out the rest of the training. This keeps foundation model training both cost-effective and secure.

To make remote storage and computing reliable over the transmission network, the architecture uses the HIC-OTN hitless transmission mechanism, which reconstructs the forwarding and storage functions of network devices to improve performance. Traditional OTN protection switching typically causes a service interruption of about 50 ms, with packet loss; with HIC-OTN, switching causes 0 ms of service interruption and zero packet loss.

In the industry's first technical test of HIC-OTN-based remote storage and computing (240 km), 16 GPUs were deployed on the user side as the entry point for PP-based training of a foundation model with hundreds of billions of parameters, and 48 GPUs were deployed in the carrier's intelligent computing center for centralized, large-scale training. The two sites, 240 km apart, were interconnected through 800G HIC-OTN, which provides high bandwidth and hitless transmission, and the collaborative training efficiency reached over 99% of that in a single cluster.
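To make the data-locality idea concrete, here is a minimal Python sketch, not CMRI's implementation. It assumes a toy two-stage pipeline with one scalar weight per stage; `UserStage`, `ProviderStage`, `Link`, `train_step`, and `train` are hypothetical names, and `Link` merely records every value that would cross the HIC-OTN connection. It also assumes the loss is computed on the user side so that labels stay local; the article does not say how labels are handled in the actual test.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Hypothetical stand-in for the HIC-OTN link: logs every value that crosses it."""
    log: list = field(default_factory=list)

    def send(self, kind: str, value: float) -> float:
        self.log.append((kind, value))
        return value


@dataclass
class UserStage:
    """First pipeline stage on the user side; raw inputs and labels never leave it."""
    w1: float

    def forward(self, x: float) -> float:
        self.x = x              # kept locally for the backward pass
        return self.w1 * x      # intermediate activation h

    def loss_grad(self, y: float, target: float) -> tuple[float, float]:
        # Assumption: the loss is computed locally so labels stay on the user side.
        loss = (y - target) ** 2
        return loss, 2 * (y - target)   # dL/dy

    def backward(self, grad_h: float, lr: float) -> None:
        self.w1 -= lr * grad_h * self.x  # dL/dw1 = dL/dh * x


@dataclass
class ProviderStage:
    """Second pipeline stage in the provider's intelligent computing center."""
    w2: float

    def forward(self, h: float) -> float:
        self.h = h
        return self.w2 * h

    def backward(self, grad_y: float, lr: float) -> float:
        grad_h = grad_y * self.w2        # computed before w2 is updated
        self.w2 -= lr * grad_y * self.h  # dL/dw2 = dL/dy * h
        return grad_h                    # gradient returned to the user stage


def train_step(user: UserStage, provider: ProviderStage, link: Link,
               x: float, target: float, lr: float) -> float:
    # Only intermediate values cross the link, in both directions.
    h = link.send("activation", user.forward(x))
    y = link.send("output", provider.forward(h))
    loss, grad_y = user.loss_grad(y, target)
    grad_y = link.send("grad_output", grad_y)
    grad_h = link.send("grad_activation", provider.backward(grad_y, lr))
    user.backward(grad_h, lr)
    return loss


def train(epochs: int = 50, lr: float = 0.01):
    # Toy private dataset (target = 3 * x); it exists only on the user side.
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
    user, provider, link = UserStage(w1=0.5), ProviderStage(w2=0.5), Link()
    losses = []
    for _ in range(epochs):
        losses.append(sum(train_step(user, provider, link, x, t, lr) for x, t in data))
    return user, provider, link, losses


if __name__ == "__main__":
    user, provider, link, losses = train()
    print(f"w1*w2 = {user.w1 * provider.w2:.4f}")  # approaches 3.0
    print(sorted({kind for kind, _ in link.log}))  # only intermediate values
```

Inspecting `link.log` after training shows only activations, stage outputs, and gradients, never the raw `(x, target)` pairs. A real deployment would use many layers per stage and a framework's pipeline scheduler; the scalar model only keeps the message flow easy to follow.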

Duan Xiaodong, Vice President of China Mobile Research Institute, described the original HIC-OTN-based remote storage and computing architecture as a new approach to foundation model training for small and medium-sized enterprises. This innovation is expected to set a new standard for inclusive intelligent computing technology and applications. The test leveraged the ultra-high bandwidth, ultra-low latency, and ultra-high reliability of the new HIC-OTN technology to achieve a training efficiency of over 99% of that in a single cluster, enabling efficient collaboration between the "micro-computing power" on the user side and the "powerful computing power" on the service provider side.

China Mobile is dedicated to driving AI-powered innovation across industries and consistently advancing original technology innovation and development. This test demonstrated the feasibility of the HIC-OTN-based remote storage and computing technology architecture and the progress made in developing it. Moving forward, we aim to further strengthen collaboration among academia, industry, and research institutions in intelligent computing and optical interconnection technologies, building advanced networks that support the rapid growth of AI.
