Anthropic’s Claude Mythos Preview found 271 unknown vulnerabilities in Firefox 150, some up to 20 years old. Mozilla’s agentic pipeline lets the AI write and run its own test cases to verify findings, and will soon check every new code commit automatically.

In a detailed post on the Mozilla Hacks blog, three Firefox developers describe how their team used Claude Mythos Preview to find and fix 271 previously unknown security vulnerabilities in Firefox 150. In total, Mozilla resolved 423 security issues in April, a massive jump from the previous record of just 76 in March. The breakdown makes clear how central Mythos Preview was to that effort. Only 41 of the 423 vulnerabilities came from external reports; the other 382 were found internally. Beyond the 271 bugs found in Firefox 150, roughly a third of the remaining 111 internally discovered bugs also came from Mythos runs. The other two-thirds were split between the same pipeline running other models and traditional testing methods like fuzzing.
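
The breakdown can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the paragraph above; the variable names are illustrative, and the value for "roughly a third" is an approximation computed here, not a number reported by Mozilla.

```python
# Figures quoted in the article (security fixes in April).
total_fixed = 423          # all security issues resolved in April
external_reports = 41      # reported by people outside Mozilla
mythos_firefox_150 = 271   # found by Claude Mythos Preview in Firefox 150

# Everything not reported externally was found internally.
internal_total = total_fixed - external_reports

# Internally discovered bugs beyond the 271 headline findings.
remaining_internal = internal_total - mythos_firefox_150

# "Roughly a third" of the remainder also came from Mythos runs.
# This is an approximation derived here, not a figure from Mozilla.
approx_extra_mythos = round(remaining_internal / 3)

print(internal_total, remaining_internal, approx_extra_mythos)  # 382 111 37
```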

Just a few months ago, AI-generated bug reports were widely dismissed as useless AI slop – findings that sounded plausible but turned out to be wrong, wasting developers’ time on verification. According to the authors, two things changed that: more capable models and better infrastructure for separating real findings from noise.

Agentic pipelines and Claude Mythos

Earlier attempts to analyze code with GPT-4 and Claude Sonnet 3.5 took a read-only approach, in which the models inspected code without running anything, and failed because they produced too many false positives. The breakthrough, according to Mozilla, came from agentic systems: the AI can build and run its own test cases to verify whether a suspected bug actually exists. This self-verification step filters out speculation. Mozilla started with Claude Opus 4.6 in small, manually supervised runs, then scaled the process across many virtual machines running in parallel, each checking a single file. The team built a pipeline around this that deduplicates reports, prioritizes findings, and tracks fixes all the way through to release.
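
The filtering logic described above (verify, deduplicate, prioritize) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Mozilla's implementation: the `Finding` fields, the `triage` function, the severity scale, and the file and signature names are all hypothetical, and the verifier is a stand-in for actually building and running a model-written test case.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """A suspected bug proposed by the model (hypothetical structure)."""
    file: str
    signature: str   # normalized identifier used for deduplication
    severity: int    # higher means more urgent (illustrative scale)


def triage(
    candidates: Iterable[Finding],
    reproduces: Callable[[Finding], bool],
) -> list[Finding]:
    """Keep findings whose test case reproduces, dedupe them, then rank."""
    confirmed: dict[str, Finding] = {}
    for finding in candidates:
        # Self-verification: drop anything the generated test cannot trigger.
        if not reproduces(finding):
            continue
        # Deduplicate on signature, keeping the most severe report.
        best = confirmed.get(finding.signature)
        if best is None or finding.severity > best.severity:
            confirmed[finding.signature] = finding
    # Prioritize: most severe first.
    return sorted(confirmed.values(), key=lambda f: f.severity, reverse=True)


# Usage with a stand-in verifier; a real one would build and run a test case.
candidates = [
    Finding("a.cpp", "sig-A", 8),
    Finding("a.cpp", "sig-A", 6),        # duplicate report of the same bug
    Finding("b.cpp", "sig-B", 9),
    Finding("c.cpp", "speculative", 5),  # test case does not reproduce
]
result = triage(candidates, reproduces=lambda f: f.signature != "speculative")
print([f.signature for f in result])  # ['sig-B', 'sig-A']
```

The verification step runs first on purpose: reports that cannot be reproduced never reach deduplication or ranking, which is what keeps speculative findings away from developers.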

In February, Anthropic’s Frontier Red Team had reported an initial batch of vulnerabilities to Mozilla. That collaboration led directly to the pipeline Mozilla is now showcasing.

To back up the credibility of the findings, Mozilla published some bug reports earlier than usual. Among them: a 15-year-old bug in the HTML label element, which attaches a caption to a form control; a 20-year-old bug in XSLT, the XML transformation language; and several ways to escape the sandbox, the security mechanism that isolates websites from the rest of the system. One example: an HTML table with more than 65,535 rows caused an internal counter to overflow. Even RLBox, Mozilla’s additional sandbox for third-party libraries, was bypassed.
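
The number 65,535 is the largest value an unsigned 16-bit integer can hold, which suggests the counter was 16 bits wide. That is an assumption; the article does not name the counter's type. The sketch below simulates such a counter in Python by masking to 16 bits. `count_rows` is a hypothetical helper for illustration, not Firefox code.

```python
UINT16_MAX = 0xFFFF  # 65,535, the largest unsigned 16-bit value


def count_rows(num_rows: int) -> int:
    """Count rows with a simulated unsigned 16-bit counter (assumed width)."""
    counter = 0
    for _ in range(num_rows):
        # Masking to 16 bits mimics wraparound in a fixed-width integer.
        counter = (counter + 1) & UINT16_MAX
    return counter


print(count_rows(65_535))  # 65535: still fits
print(count_rows(65_536))  # 0: the counter has wrapped around
print(count_rows(65_540))  # 4
```

In general, a wrapped counter becomes dangerous when other code trusts it, for example to size a buffer, while the real number of rows is far larger. The article does not describe the actual mechanism in Firefox.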

Existing defenses proved their worth

What the models couldn’t do turned out to be just as revealing. Several attack attempts targeted prototype pollution, a JavaScript technique that tampers with the shared prototype objects other code inherits from, which attackers had previously used to break out of the sandbox. These attempts failed because of an architectural decision Mozilla had made years earlier. For the developers, having direct proof that their existing defenses still hold up was just as valuable as finding new vulnerabilities.

Many of the discovered vulnerabilities aren’t enough on their own for a full attack – they would need to be chained with other flaws. But these are exactly the kinds of weaknesses that traditional testing methods like fuzzing struggle to catch, and AI analysis covers this ground far more thoroughly. Going forward, Mozilla plans to integrate the pipeline directly into its development process so that every new piece of code is automatically checked before it gets committed.
