Microsoft is exploring ways to leverage its ‘stack’ of AMD GPUs for inference workloads, as the company develops toolkits that convert NVIDIA CUDA models into ROCm-supported code.

Microsoft Sees Massive Demand For Inference Over Training, Which Makes AMD’s AI Chips a Lot More Attractive

One of the reasons NVIDIA has managed to retain its dominance in the AI space is its ‘CUDA lock-in’, which essentially forces CSPs and AI giants to use NVIDIA’s AI chips to get optimal results from NVIDIA’s CUDA software ecosystem. Efforts have been made in the past to break this barrier and allow cross-platform support, but no solution has gone mainstream. However, according to a ‘high-ranking’ Microsoft employee, the tech giant has developed ‘toolkits’ that allow it to run CUDA code on AMD GPUs by translating it into a ROCm-compatible version.

Breaking CUDA’s dominance isn’t an easy task, as the software ecosystem is so integral to the AI industry that its adoption is almost ubiquitous, even in nations like China. However, the toolkit mentioned by the Microsoft employee likely follows a route that has been on the market for quite some time. One way to perform CUDA-to-ROCm translation is through a runtime compatibility layer, which translates CUDA API calls into ROCm equivalents without requiring a full source-code rewrite. One example is the ZLUDA tool, which intercepts CUDA calls at runtime and maps them onto ROCm, with no recompilation needed.
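For context, there is also a source-level route: translators such as AMD’s HIPIFY rewrite CUDA API names into their near-identical HIP/ROCm equivalents before compilation. Below is a minimal, illustrative sketch of that idea in Python. The mapping table is abbreviated, and real tools parse the source properly rather than relying on simple string substitution:

```python
import re

# Abbreviated CUDA-to-HIP name mapping, in the spirit of AMD's HIPIFY tool.
# The HIP runtime API deliberately mirrors CUDA's, so many calls translate 1:1.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    """Rewrite known CUDA identifiers in `source` to their HIP counterparts."""
    # Match longer names first so e.g. a longer identifier is never clipped
    # by a shorter one that happens to share a prefix.
    pattern = re.compile(
        "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    )
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

cuda_snippet = "#include <cuda_runtime.h>\nfloat *d;\ncudaMalloc(&d, 1024);\ncudaFree(d);"
print(hipify(cuda_snippet))
```

Because the HIP API was designed as a near drop-in for CUDA, this kind of mechanical renaming covers a large share of common code; the hard cases are the CUDA features with no ROCm equivalent, which is exactly where such translation breaks down.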

We built some toolkits to help convert like CUDA models to ROCm so you could use it on an AMD, like a 300X. We have had a lot of inquiries about what is our path with AMD, the 400X and the 450X. We’re actually working with AMD on that to see what we can do to maximize that.


However, because ROCm is still a relatively ‘immature’ software stack, several CUDA API calls and code paths have no mapping in AMD’s software, which can collapse performance in some cases, a high-risk problem in large datacenter environments. Another possible variant of the toolkit mentioned here is an end-to-end cloud migration tool that integrates with Azure and targets both AMD and NVIDIA instances. Of course, this does bring problems when conversions happen at scale, but by the looks of it, the toolkits developed by Microsoft remain in confined use.

Now, the reason Microsoft is pursuing these ‘software conversions’ is simply that the firm is seeing an increase in inference workloads and is looking for a more cost-effective way to serve them, which is where AMD’s AI chips make sense, since they are the only real counterpart to NVIDIA’s ‘pricey’ GPUs. And since CUDA models cannot simply be dropped from inference environments, translating them to ROCm becomes the next big step for Microsoft.
