Microsoft’s agentic security system found four critical Windows RCE flaws

Microsoft responded to growing competition in AI security by announcing that its new agentic security system helped researchers discover 16 new vulnerabilities in the Windows networking and authentication stack, including four critical remote code execution (RCE) flaws.


MDASH architecture diagram (Source: Microsoft)

Microsoft rated two of the four RCE flaws, CVE-2026-40361 and CVE-2026-40364, as more likely to be exploited.

The multi-model agentic scanning harness, codenamed MDASH, was built by Microsoft’s Autonomous Code Security team and uses more than 100 specialized AI agents and an ensemble of frontier and distilled models to discover, debate, and validate exploitable vulnerabilities end-to-end.

“AI vulnerability discovery has crossed from research curiosity into production-grade defense at enterprise scale, and the durable advantage lies in the agentic system around the model rather than any single model itself,” Taesoo Kim, Microsoft’s VP of Agentic Security, wrote in a blog post.

To evaluate MDASH, the company tested the system against a private Windows driver named StorageDrive that contained 21 intentionally injected vulnerabilities, including kernel use-after-frees (UAFs), integer handling issues, IOCTL validation gaps, and locking errors.

Because StorageDrive is a private codebase that had never been publicly released, Microsoft said the benchmark minimized the possibility that the AI models had previously seen the code during training. The company added that MDASH identified all 21 vulnerabilities without generating false positives.

“This simple test shows that the reasoning and vulnerability discovery capabilities of codename MDASH can approximate professional offensive researchers,” Kim noted.

The company also highlighted MDASH’s performance on internal and public vulnerability discovery benchmarks.

MDASH achieved a 96% recall rate against five years of confirmed Microsoft Security Response Center (MSRC) vulnerabilities in clfs.sys and a 100% recall rate in tcpip.sys, according to Microsoft.
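For readers unfamiliar with the metric, recall is the share of known vulnerabilities a tool actually finds, while precision is the share of its reports that turn out to be real. The sketch below is purely illustrative: the function names are hypothetical and not part of any Microsoft tooling, and the only figures used are the StorageDrive numbers reported by Microsoft (21 injected flaws, all 21 found, no false positives).

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of known vulnerabilities that were found."""
    total_known = true_positives + false_negatives
    if total_known == 0:
        raise ValueError("no known vulnerabilities to measure against")
    return true_positives / total_known


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that were real vulnerabilities."""
    total_reported = true_positives + false_positives
    if total_reported == 0:
        raise ValueError("no findings were reported")
    return true_positives / total_reported


# StorageDrive figures as reported: 21 injected, 21 found, 0 false positives.
found, missed, false_alarms = 21, 0, 0

print(f"recall:    {recall(found, missed):.0%}")
print(f"precision: {precision(found, false_alarms):.0%}")
```

A high recall figure on its own says nothing about how many false alarms a tool raises, which is why the absence of false positives in the StorageDrive test is reported alongside it.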

The system also scored 88.45% on CyberGym, a public benchmark designed to evaluate AI systems on real-world vulnerability discovery tasks. The benchmark contains 1,507 vulnerabilities from OSS-Fuzz projects and measures how effectively AI systems can identify known security flaws in previously unseen codebases.

The result placed MDASH at the top of the CyberGym leaderboard, roughly five percentage points ahead of the next highest-ranked system, the company said.

“We are at a moment in the industry where AI-powered vulnerability discovery stops being speculative and starts being an engineering problem. The findings in this Patch Tuesday and the retrospective recall on five years of CLFS MSRC cases are evidence that AI vulnerability findings can scale,” Kim concluded.

Microsoft also noted that MDASH is currently being tested by customers as part of a limited private preview.
