xAI is preparing to supply GPU infrastructure to Cursor for AI model training. The training will use tens of thousands of GPUs from xAI’s system.
Access to large-scale computing infrastructure is becoming more central to AI development, as companies allocate GPU capacity beyond internal use and into external model training.
Elon Musk’s AI company, xAI, is preparing to provide computing infrastructure to coding startup Cursor under a new arrangement, according to a report by Business Insider, citing people familiar with the matter. Cursor, which develops AI-powered coding tools, plans to train its upcoming model, Composer 2.5, using tens of thousands of GPUs drawn from xAI’s broader system.
GPU infrastructure for AI model training
That allocation comes from infrastructure that includes around 200,000 GPUs used for large-scale AI training workloads. Training at this scale typically requires thousands of GPUs operating in parallel over extended periods, with datasets reaching trillions of tokens and training cycles lasting several weeks, according to estimates from Stanford Human-Centered AI Institute and Epoch AI.
Such workloads are designed to run continuously across distributed systems, with compute resources processing large volumes of data simultaneously over extended durations.
Under the arrangement, xAI would provide dedicated GPU capacity for model training workloads. The setup also involves supplying computing infrastructure to an external user, reflecting a model commonly used by cloud providers and specialised GPU suppliers serving AI developers.
Large cloud providers such as Amazon Web Services, Microsoft Azure, and Google Cloud operate GPU fleets and rent computing resources to outside users. These platforms provide access to high-performance infrastructure without requiring companies to build their own systems.
Specialised providers, including CoreWeave and Lambda, also supply GPU capacity tailored for AI workloads, supporting model training and fine-tuning, along with related development tasks.
Cursor is one of several companies building AI systems that depend on large-scale training infrastructure. It is currently in discussions for a valuation of around $50 billion, according to prior reporting. The company is developing coding tools in a market that also includes Anthropic and OpenAI, both of which are building systems designed to assist software engineering tasks.
In March, Cursor released Composer 2, a model designed to generate and edit code across large software projects. According to the company’s technical materials, the system supports multi-file code generation and editing, along with command execution within development environments.
The model is based on an open-source system developed by Moonshot AI and further trained using proprietary developer usage data collected through Cursor’s platform, according to its technical report.
The two companies have also had prior overlap through personnel moves. In March, xAI hired former Cursor product engineering leads Andrew Milich and Jason Ginsburg.
According to prior reporting by Business Insider, both now hold senior product roles at xAI and report to Elon Musk and xAI president Michael Nicolls.
xAI’s Colossus system
xAI’s compute capacity is built around Colossus, a large-scale supercomputer system designed for AI training. The company has said the system operates with around 200,000 Nvidia GPUs and plans to expand that capacity to 1 million units.
Colossus is located in Memphis and initially launched with around 100,000 GPUs before expanding to approximately 200,000. The system is designed to run parallel AI workloads across a dense GPU cluster, supporting training jobs that require sustained compute over extended periods.
The infrastructure relies on Nvidia GPUs commonly used in large-scale AI training, according to benchmarks from CoreWeave. Dell Technologies has supplied GPU-equipped servers for Colossus and is reportedly in advanced discussions to provide additional infrastructure, according to Bloomberg.
xAI has also made changes to the team overseeing that infrastructure. Infrastructure lead Heinrich Küttler has departed. Jake Palmer has taken over physical infrastructure, while SpaceX executive Daniel Dueri now oversees compute infrastructure.
Efficiency and utilisation
In an internal memo, Michael Nicolls said xAI’s model FLOPs utilisation rate, or MFU, stood at about 11%. MFU measures how much of a system’s theoretical compute capacity is actively used during training.
Nicolls set a target of 50%, compared with industry ranges of 35% to 45%, according to data from Lambda. Lower utilisation levels indicate that part of deployed compute capacity is not being fully used during training workloads.
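The MFU calculation itself is straightforward: achieved model FLOPs per second divided by the theoretical peak of the deployed hardware. The sketch below illustrates the standard formula; the specific figures (step FLOPs, step time, per-GPU peak) are hypothetical stand-ins, not xAI's actual numbers.

```python
def model_flops_utilisation(model_flops_per_step: float,
                            step_time_s: float,
                            num_gpus: int,
                            peak_flops_per_gpu: float) -> float:
    """Fraction of theoretical peak compute actually used during training."""
    achieved = model_flops_per_step / step_time_s    # FLOPs/s actually delivered
    theoretical = num_gpus * peak_flops_per_gpu      # FLOPs/s the cluster could deliver
    return achieved / theoretical

# Hypothetical example: 1.1e20 model FLOPs per training step, 5 s per step,
# across 200,000 GPUs each rated at 1e15 peak FLOPs/s.
mfu = model_flops_utilisation(1.1e20, 5.0, 200_000, 1e15)
print(f"MFU: {mfu:.1%}")  # → MFU: 11.0%
```

With these illustrative inputs the result matches the roughly 11% figure cited in the memo; raising utilisation means either doing more useful FLOPs per step or shortening step time on the same hardware.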
Large-scale AI training systems rely on checkpointing mechanisms to recover from interruptions. Inefficiencies or restarts can reduce effective utilisation and extend training time.
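The checkpoint-and-resume pattern can be sketched in a few lines. This is a minimal illustration of the general mechanism, not any particular framework's API; the file name, JSON format, and loop are all hypothetical, and real training systems checkpoint far larger state (model weights, optimiser state, data-loader position) to distributed storage.

```python
import json
import os

CKPT = "checkpoint.json"

def save_checkpoint(step: int, state: dict) -> None:
    """Write training state to disk, atomically, so a crash mid-write
    never leaves a corrupt checkpoint behind."""
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT)  # atomic rename on POSIX and Windows

def load_checkpoint() -> tuple[int, dict]:
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            ckpt = json.load(f)
        return ckpt["step"], ckpt["state"]
    return 0, {"loss": None}

start_step, state = load_checkpoint()
for step in range(start_step, 10):
    state["loss"] = 1.0 / (step + 1)   # stand-in for one real training step
    if step % 5 == 0:                  # periodic checkpoint interval
        save_checkpoint(step + 1, state)
```

The trade-off the article alludes to is visible here: checkpointing more often wastes time writing state, while checkpointing less often means more repeated work after an interruption, and both erode effective utilisation.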
The arrangement links xAI’s compute infrastructure with a coding model that requires sustained training capacity on large GPU clusters.
