Computing Giant Touts Multi-Agentic ‘MDASH’ Approach as Superior to Single Models
David Perera (@daveperera), Greg Sirico •
May 15, 2026

Microsoft says its new approach to finding vulnerabilities with artificial intelligence outclasses the single models touted by Anthropic and OpenAI.
The computing giant in a Tuesday blog post said it orchestrated more than 100 specialized AI agents “across an ensemble of frontier and distilled models” to discover 16 new vulnerabilities in the Windows networking and authentication stack.
The company refers to the “multi-model agentic scanning harness” system as MDASH.
“The strategic implication is clear: AI vulnerability discovery has crossed from research curiosity into production-grade defense at enterprise scale, and the durable advantage lies in the agentic system around the model rather than any single model itself,” wrote Taesoo Kim, vice president of security research at Microsoft.
Of the 16 vulnerabilities found, four are “critical remote code execution flaws in components such as the Windows kernel TCP/IP stack” and the IKEv2 key management protocol, the company reported. Microsoft patched the flaws as part of its most recent monthly dump of software fixes. AI is accelerating “the scale and speed of vulnerability discovery,” wrote Tom Gallagher, who leads the Microsoft Security Response Center, in a note accompanying May’s Patch Tuesday publication.
Microsoft’s agentic approach contrasts with those of Anthropic and OpenAI, which have touted the bug-finding abilities of their individual Mythos and GPT 5.5 models, respectively. MDASH scored an 88.4% success rate on the University of California, Berkeley-developed CyberGym benchmark, which tests AI capabilities against real vulnerabilities from production software. Mythos currently scores 83.1% and GPT 5.5 scores 81.8%. All three scores are self-reported by the companies.
Microsoft didn’t disclose which models it used or who made them. It famously has had a close relationship with OpenAI, integrating GPT models across its products. But that relationship has frayed, and Microsoft has pressed ahead with developing its own proprietary models, announcing in April three new “MAI” models: MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2.
Kim touted the agentic approach as superior since “no single model is best at every stage.” The agents fulfilled different roles such as “auditor,” “debater” and “prover.”
“We don’t expect one prompt to do everything; we don’t expect one agent to recognize, validate and exploit a bug in a single pass,” he said. Disagreement between underlying models itself can act as a signal, he wrote. “When an auditor flags something as suspect and the debater can’t refute it, that finding’s posterior credibility goes up,” he said.
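Microsoft has not published MDASH’s internals, but the auditor/debater/prover interaction Kim describes can be sketched as a simple credibility-scoring loop. The agent roles, weights and the `Finding` structure below are hypothetical illustrations of the idea, not Microsoft’s code:

```python
# Hypothetical sketch of "disagreement as a signal": an auditor flags a
# candidate bug, a debater tries to refute it, and a prover attempts to
# confirm it. A flag that survives a refutation attempt gains credibility.
# All names, weights and thresholds here are illustrative, not MDASH internals.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Finding:
    description: str
    credibility: float = 0.5          # prior belief that the finding is real
    history: List[str] = field(default_factory=list)

def update_credibility(finding: Finding,
                       debater: Callable[[Finding], bool],
                       prover: Callable[[Finding], bool]) -> Finding:
    """Raise credibility when the debater fails to refute the auditor's
    flag, and again when the prover produces a confirmation."""
    if not debater(finding):          # debater could NOT refute the flag
        finding.credibility = min(1.0, finding.credibility + 0.2)
        finding.history.append("debater failed to refute")
    else:                             # refutation succeeded: discount it
        finding.credibility = max(0.0, finding.credibility - 0.3)
        finding.history.append("debater refuted")
    if prover(finding):               # e.g. a reproducing proof of concept
        finding.credibility = min(1.0, finding.credibility + 0.3)
        finding.history.append("prover confirmed")
    return finding

# Toy run: the debater cannot refute the flag, and the prover confirms it.
f = update_credibility(
    Finding("possible out-of-bounds read in packet parser"),
    debater=lambda f: False,          # refutation attempt fails
    prover=lambda f: True,            # proof of concept reproduces
)
print(f.credibility)                  # 0.5 + 0.2 + 0.3 = 1.0
```

In a real harness the lambdas would be calls out to separate model-backed agents; the point of the sketch is only that a finding’s score moves up when a challenge fails, which matches Kim’s “posterior credibility goes up” framing.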
MDASH is being used only internally by Microsoft engineers and tested by a “small set of customers as part of a limited private preview.”
Microsoft mentioned no plans for a public release, positioning MDASH as a research effort and a “production-grade defense at enterprise scale.”