The zero-days are numbered — Synthesis

Source: Mozilla Blog (Firefox), “The zero-days are numbered” (21 April 2026). Author: Bobby Holley (Firefox). Underlying event: Firefox 150 release ships fixes for 271 vulnerabilities identified during an early evaluation of Claude Mythos Preview against the Firefox codebase. Continues an earlier collaboration with Anthropic that used Opus 4.6 to find 22 security-sensitive bugs fixed in Firefox 148.

Headline message

Frontier AI has, in Mozilla’s experience, just closed the historic gap between machine-discoverable and human-discoverable software vulnerabilities. For a hardened target like Firefox, any one of these bugs would have been “red-alert in 2025”; finding 271 at once is the kind of result that makes a defender wonder if it’s even possible to keep up. Holley’s argument is that the vertigo is temporary, and the underlying shift is decisively pro-defender: closing the human–machine gap erodes the attacker’s long-standing asymmetric advantage. Mozilla is not yet finished, but believes it has “turned the corner” toward a future where defenders finally have a chance to win, decisively.

A material caveat is buried in the footnote: this conclusion depends on software remaining human-comprehensible. If AI-assisted development scales bug complexity faster than AI-assisted analysis scales bug discovery, the dynamic could flip back.

Key takeouts

The numbers. Firefox 148 fixed 22 security-sensitive bugs from an Opus 4.6 scan; Firefox 150 fixes 271 vulnerabilities from an early evaluation of Claude Mythos Preview. Both came out of Mozilla’s collaboration with Anthropic.
Capability claim. “So far we’ve found no category or complexity of vulnerability that humans can find that this model can’t.” On a real, large, hardened C++ codebase, Mythos Preview matches elite human security researchers, whose scarce expertise has been the bottleneck for source-level bug discovery.
No new alien bug classes (yet). All findings could in principle have been found by an elite human researcher. Mozilla pushes back against speculation that future AI models will surface entirely new vulnerability categories that defy human comprehension.
Offense-defence rebalance. Security has historically been offense-dominant: the attack surface is too large to defend comprehensively, while attackers need only one bug. AI-driven defence shifts discovery from scarce human time to abundant machine time, eroding the attacker’s edge.
Defence-in-depth still matters but is incomplete. Per-site process sandboxes, Rust adoption, and fuzzing each help, but: sandboxes can be escaped by chaining bugs; Rust only mitigates certain (very common) classes; fuzzing has uneven coverage and misses what requires source-level reasoning. Mythos Preview attacks the residual.
A live risk to the thesis (footnote). If codebases begin to surpass human comprehension because of more AI in the development process, bug complexity may scale with — or faster than — discovery capability. Holley calls out human-comprehensibility as an essential property to maintain, “especially in critical software like browsers and operating systems.”

Wider context

This post lands in an active regulatory and policy conversation about frontier AI’s effect on cyber risk. Two adjacent signals from the Australian context are already in this KB:

ASIC’s open letter of 8 May 2026 to AFS licensees explicitly names Anthropic’s Mythos as the kind of frontier capability that will “test existing controls more often and under greater pressure”, and calls boards back to first-principles cyber resilience. [[2026-05-08-apra-ai-governance]]
APRA’s 30 April 2026 letter observes that AI is increasing both volume and sophistication of attacks (prompt injection, data leakage, insecure integrations, autonomous-agent misuse) and warns that the defensive side is lagging — including insufficient testing of AI systems and AI-generated code. [[2026-05-08-apra-ai-governance]]

Read together with this Firefox post, the picture is two-sided. Regulators see frontier AI as a sharper sword raised against the defender. Mozilla’s experience suggests the same frontier capability gives the defender, for the first time, a credible chance of finding their own bugs faster than the attacker can. Both can be true at once: AI sharpens offense, but for those who deploy it on their own code defensively, it also disproportionately benefits defence — provided they “reprioritize everything else to bring relentless and single-minded focus to the task.”

The post is also notable as a corporate-vendor case study of an emerging deployment pattern: a frontier-model vendor (Anthropic) partnering with a major open-source defender (Mozilla) to run early-access security evaluation on the defender’s own codebase, with results shipping in user-facing releases. This is a more mature collaboration model than ad-hoc bug bounty work.

Section-by-section breakdown

1. What happened, in numbers

Holley reports two collaborations:

First wave (covered in an earlier post not in this PDF): Anthropic’s Opus 4.6 scanned Firefox, leading to 22 security-sensitive bug fixes in Firefox 148.
Second wave (this post): Mozilla applied an early version of Claude Mythos Preview to Firefox. Firefox 150 ships fixes for 271 vulnerabilities identified during the initial evaluation.

The post frames this not as a one-off but as a recurring capability now within reach for other defender teams: “As these capabilities reach the hands of more defenders, many other teams are now experiencing the same vertigo we did when the findings first came into focus.”

2. Why it feels like vertigo, and why Mozilla thinks it shouldn’t

Holley acknowledges the immediate emotional response: “For a hardened target, just one such bug would have been red-alert in 2025, and so many at once makes you stop to wonder whether it’s even possible to keep up.”

His counter-argument is that the experience, while demanding (“You may need to reprioritize everything else”), is structurally hopeful: “we’ve turned the corner and can glimpse a future much better than just keeping up. Defenders finally have a chance to win, decisively.”

The bridge between vertigo and optimism is the offense-defence balance argument in §3.

3. Why security has been offensively-dominant

The post lays out the long-standing offensive advantage in plain terms:

“Until now, the industry has largely fought security to a draw. Vendors of critical internet-exposed software like Firefox take security extremely seriously and have teams of people who get out of bed every morning thinking about how to keep users safe. Nevertheless, we’ve all long quietly acknowledged that bringing exploits to zero was an unrealistic goal. Instead, we aimed to make them so expensive that only actors with functionally unlimited budgets can afford them, and that the cost of burning such an expensive asset disincentivizes those actors against casual use.”

The asymmetry: “the attack surface isn’t infinite, but it’s large enough to be difficult to defend comprehensively with the tools we’ve had available.” Attackers need only one gap; defenders must cover the whole surface.
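
The asymmetry can be made concrete with a toy probability model. The sketch below is not from the post: it assumes, purely for illustration, that each latent bug is independently found and fixed with the same probability, and asks how likely it is that at least one bug survives. The function name, the independence assumption, and the coverage values are all hypothetical; 271 is borrowed from the Firefox 150 figure only to give a sense of scale.

```python
# Toy model of the attacker/defender asymmetry (illustrative numbers only).
# Assumption: the defender finds and fixes each latent bug independently with
# probability `defender_coverage`; the attacker needs only one survivor.

def prob_attacker_has_opening(latent_bugs: int, defender_coverage: float) -> float:
    """Probability that at least one latent bug survives the defender's review."""
    if latent_bugs < 0:
        raise ValueError("latent_bugs must be non-negative")
    if not 0.0 <= defender_coverage <= 1.0:
        raise ValueError("defender_coverage must be in [0, 1]")
    # All bugs are caught with probability coverage ** n; the complement is
    # the chance that the attacker still has something to exploit.
    return 1.0 - defender_coverage ** latent_bugs


if __name__ == "__main__":
    # Hypothetical coverage levels; 271 is used only as a scale reference.
    for coverage in (0.90, 0.99, 0.999, 1.0):
        p = prob_attacker_has_opening(271, coverage)
        print(f"coverage={coverage:.3f} -> P(at least one bug survives)={p:.3f}")
```

Under these assumptions, even very high per-bug coverage leaves the attacker a likely opening when the number of latent bugs is large. Only coverage approaching completeness changes the picture, which is the regime the post argues is now coming within reach.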

4. The defender’s existing toolkit and its limits

Mozilla’s pre-AI toolkit, as described:

Defence-in-depth (per-website process sandboxing in Firefox): no single layer is bulletproof; attackers chain renderer bugs with sandbox-escape bugs to escalate privileges.
Memory-safe languages (Rust): Mozilla has “led the industry in building and adopting Rust” but cannot afford to stop everything and rewrite “decades of C++ code,” and Rust only mitigates certain (very common) vulnerability classes.
Internal red team using dynamic analysis (fuzzing): “fruitful in practice, but some parts of the code are harder to fuzz than others, leading to uneven coverage.”
Elite human security researchers doing source-level reasoning to find what fuzzers miss: “effective, but time-consuming and bottlenecked on scarce human expertise.”

This last category is exactly the bottleneck that, on Mozilla’s account, Mythos Preview removes.

5. The capability claim, stated narrowly

The strongest direct claim in the post:

“So far we’ve found no category or complexity of vulnerability that humans can find that this model can’t.”

Note the precise scope. Holley is not claiming Mythos has invented new classes of bugs. He is claiming it matches the upper bound of human capability — and he then argues (in §6) that this is what matters for the offense-defence balance.

6. Why closing the human–machine gap favours defenders

Holley’s argument:

“A gap between machine-discoverable and human-discoverable bugs favors the attacker, who can concentrate many months of costly human effort to find a single bug. Closing this gap erodes the attacker’s long-term advantage by making all discoveries cheap.”
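
The cost argument in this quote can be sketched as a simple budget model. This is an illustration under stated assumptions, not something the post provides: it assumes every discovery costs the same, that the defender must pay for each latent bug while the attacker pays for just one, and it uses invented budget and cost figures. The function name `defender_coverage` and all numbers are hypothetical; 271 is borrowed from the Firefox 150 figure only as a scale.

```python
# Toy budget model (all numbers hypothetical, not from the post).
# Assumption: every discovery costs the same, and the defender must pay for
# each latent bug while the attacker pays for just one.

def defender_coverage(budget: float, cost_per_bug: float, latent_bugs: int) -> float:
    """Fraction of latent bugs the defender can afford to discover."""
    if cost_per_bug <= 0:
        raise ValueError("cost_per_bug must be positive")
    if latent_bugs <= 0:
        return 1.0  # nothing left to find
    affordable = budget / cost_per_bug
    return min(1.0, affordable / latent_bugs)


if __name__ == "__main__":
    budget = 100.0     # hypothetical annual budget, in researcher-months
    latent_bugs = 271  # scale borrowed from the Firefox 150 figure
    scenarios = {
        "human-only discovery": 6.0,        # hypothetical: six months per bug
        "machine-assisted discovery": 0.1,  # hypothetical: far cheaper per bug
    }
    for label, cost in scenarios.items():
        share = defender_coverage(budget, cost, latent_bugs)
        print(f"{label}: attacker pays {cost} for one bug, "
              f"defender can afford {share:.1%} of all bugs")
```

In this sketch the attacker's outlay is always one unit of per-bug cost, so expensive discovery hurts the defender far more than the attacker. Once the per-bug cost falls far enough, the defender's budget covers every latent bug and the attacker's ability to concentrate effort on a single find stops being an advantage.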

Mozilla also did not see novel bug classes:

“Encouragingly, we also haven’t seen any bugs that couldn’t have been found by an elite human researcher. Some commentators predict that future AI models will unearth entirely new forms of vulnerabilities that defy our current comprehension, but we don’t think so. Software like Firefox is designed in a modular way for humans to be able to reason about its correctness. It is complex, but not arbitrarily complex.”

The pay-off line: “The defects are finite, and we are entering a world where we can finally find them all.”

7. The footnote that matters

The post’s footnote 1 is the load-bearing caveat:

“There’s a risk that codebases begin to surpass human comprehension as a result of more AI in the development process, scaling bug complexity along with (or perhaps faster than) discovery capability. Human-comprehensibility is an essential property to maintain, especially in critical software like browsers and operating systems.”

In other words: the optimistic conclusion holds only while software remains designed for humans to reason about. AI-generated code that no human can audit, paired with AI-driven discovery, could reverse the balance back to offense-dominance — or worse.

Action implications / open questions

For a security-engineering org:

Stand up an AI-assisted source review program, not just AI-augmented fuzzing. Holley’s whole argument is that the gap that has newly closed is in source-level reasoning; if you deploy AI only in fuzzing pipelines, you capture only part of the value.
Plan for backlog shock. A first scan of a hardened, long-lived codebase can plausibly return hundreds of valid findings. Engineering capacity to triage, fix, and ship must be reserved in advance; Holley warns that teams “may need to reprioritize everything else.”
Treat human-comprehensibility as a security property, not just a code-quality property. Resist patterns where AI-generated code is merged without a human able to reason about its correctness — the footnote is policy-relevant, not philosophical.
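
The backlog-shock point above can be turned into rough capacity arithmetic. The sketch below is a planning aid under stated assumptions, not a description of how Mozilla worked: the function `weeks_to_clear`, the valid-finding fraction, the team size, and the fix rate are all hypothetical, and the post gives no figures for triage precision or throughput.

```python
import math

# Rough capacity arithmetic for a first AI-assisted scan.
# Every parameter value used below is a hypothetical planning input.

def weeks_to_clear(raw_findings: int, valid_fraction: float,
                   engineers: int, fixes_per_engineer_week: float) -> int:
    """Whole weeks needed to fix all valid findings at a steady rate."""
    if not 0.0 <= valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be in [0, 1]")
    if engineers <= 0 or fixes_per_engineer_week <= 0:
        raise ValueError("team throughput must be positive")
    valid_findings = raw_findings * valid_fraction
    weekly_throughput = engineers * fixes_per_engineer_week
    return math.ceil(valid_findings / weekly_throughput)


if __name__ == "__main__":
    # 271 fixed vulnerabilities is the figure reported for Firefox 150, so the
    # valid fraction is set to 1.0; team size and fix rate are invented.
    weeks = weeks_to_clear(raw_findings=271, valid_fraction=1.0,
                           engineers=10, fixes_per_engineer_week=1.5)
    print(f"Estimated weeks to clear the backlog: {weeks}")
```

The estimate ignores triage time, release cadence, and variation in fix difficulty, so it is best read as a lower bound for reserving capacity before the first scan.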

For regulators and boards:

This post is concrete evidence that frontier models meaningfully shift cyber risk in both directions. APRA’s and ASIC’s warnings about attack-side acceleration are reinforced; the post also implies a corresponding expectation that mature defenders will adopt AI-assisted vulnerability discovery as a baseline control.
Open question: when does failing to use available AI-driven defensive analysis itself become a governance gap? (Compare APRA’s stance that a “gap in board AI capability is itself a governance and control risk.” [[2026-05-08-apra-ai-governance]])

Open questions left by the post:

What does Mythos Preview cost to operate at this scale, and how does that compare with elite human researcher cost? The post is silent on economics.
What was the false-positive rate on the 271 findings, and how was triage done? Implied to be a serious effort but not quantified.
Did adversaries get the same uplift? The post addresses attacker advantage in principle, but not whether independent attackers running similar AI tools have also produced large new banks of zero-days against Firefox or peers.
How transferable is the result to other classes of software (operating systems, infrastructure, embedded, web apps), and to less-modular codebases?
What is the long-term arrangement with Anthropic — recurring evaluation cadence, scope, terms? The post describes “continued collaboration” but does not formalise it.
