AI and the Offence / Defence Balance in Software Security

Rolling dossier on how frontier AI models reshape the offence-defence balance in software security — both as a sharper attacker tool and as a force multiplier for defenders doing vulnerability discovery, code review, and assurance. Every claim cites a synthesis. Append new sources at the bottom of each section; refactor periodically.

Current state (as of 2026-05-12)

Five converging signals are now visible in the KB:

  • Defender-side capability has just stepped up. Mozilla reports that an early evaluation of Claude Mythos Preview found 271 vulnerabilities in Firefox, shipping as fixes in Firefox 150. An earlier scan with Anthropic Opus 4.6 had already produced 22 security-sensitive fixes in Firefox 148. Mozilla’s read: frontier AI now matches elite human security researchers on real, hardened C++ codebases, closing the historic gap between machine-discoverable and human-discoverable bugs. [[2026-04-21-firefox-mythos-zero-days]]
  • Regulators see the attacker side accelerating too. APRA’s 30 April 2026 letter identifies AI as increasing both volume and sophistication of attacks (prompt injection, data leakage, insecure integrations, autonomous-agent misuse) while defensive practices lag. ASIC’s 8 May 2026 letter names Anthropic’s Mythos explicitly as the kind of frontier capability that will “test existing controls more often and under greater pressure.” [[2026-05-08-apra-ai-governance]]
  • First operational observation of offensive GenAI in the wild. CyberCX’s 2026 Threat Report records 2025 as the first year its DFIR practice observed “threat actors using generative AI to create custom, bespoke scripts and payloads” — quality “dubious”, one observed operator failed to decrypt a password-manager DB, but the barrier-to-entry signal is the point. CyberCX is silent on defensive frontier-AI scanning as a control. [[2026-05-12-cybercx-2026-threat-report]]
  • Horizon risk on the cryptographic substrate, separate from source-level AI offence. AFR’s 11 May 2026 framing pairs the quantum threat to cryptography with frontier AI as a single bank cyber-threat narrative, with industry preparing PQC migration “by 2030” for the card payments system. The two threats are technically distinct: a cryptographically relevant quantum computer (CRQC) threatens classical cryptography, whereas Mythos-class source-level reasoning does not. They are nonetheless converging in board-level coverage. [[2026-05-11-afr-quantum-banks]]
  • Controls-side picture from CyberCX’s offensive practice. The CyberCX 2026 Hack Report (companion to the DFIR Threat Report) adds three years of empirical depth: 50% of AI penetration tests find a severe vulnerability, vs 26% for web-application pen-tests. Severe-finding rates across all engagements are improving by about 2.25 percentage points per year (33.5% → 29.0% from 2023 to 2025); Edelstein characterises this as defender improvement losing the race. CyberCX also names MCP authentication as a rising attack surface, and describes AI-supported AppSec (threat modelling, exploit PoC generation, test-case generation) as operational but not yet keystone: the defender adoption signal Mozilla’s Firefox 150 result was looking for, at smaller scale. [[2026-05-12-cybercx-2026-hack-report]]
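The rate arithmetic in the CyberCX Hack Report bullet can be checked with a short sketch. The figures are the ones quoted above; the variable names are illustrative, and treating 2023 to 2025 as two annual intervals is an assumption.

```python
# Figures quoted from the CyberCX 2026 Hack Report summary above.
severe_2023 = 33.5   # % of all engagements with a severe finding, 2023
severe_2025 = 29.0   # same measure, 2025
intervals = 2025 - 2023  # assumption: two year-on-year intervals

# Absolute improvement, in percentage points (pp), per year.
pp_per_year = (severe_2023 - severe_2025) / intervals

# Relative decline over the whole period, as a fraction of the 2023 rate.
relative_decline = (severe_2023 - severe_2025) / severe_2023

# AI pen-tests vs web-application pen-tests (severe-finding rates, %).
ai_rate, webapp_rate = 50.0, 26.0
ai_vs_webapp = ai_rate / webapp_rate

print(f"{pp_per_year:.2f} pp/yr")
print(f"{relative_decline:.1%} relative decline, 2023 to 2025")
print(f"AI pen-tests: {ai_vs_webapp:.2f}x the web-app severe-finding rate")
```

The distinction matters when quoting the trend: 2.25 is a change in percentage points per year, not a 2.25% relative change. Measured against the 2023 rate, the two-year decline is roughly 13%.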

The synthesised picture: frontier AI shifts the balance in both directions. The net effect for any given defender depends on whether they adopt AI-driven defensive analysis with comparable urgency to the attacker side — and right now, the AU/NZ incident-response practice that has seen the offensive side is not yet recommending the defensive analogue as keystone.

The historical offence-defence asymmetry

  • Security has been offence-dominant because the attack surface is large enough that comprehensive defence is difficult, while attackers need only one bug. The defender’s traditional best case has been to make exploits expensive enough that only well-funded actors can afford them, not to drive exploits to zero. [[2026-04-21-firefox-mythos-zero-days]]
  • The defender toolkit (defence-in-depth, process sandboxing, memory-safe languages like Rust, fuzzing, elite human researchers) is incomplete: no single layer is bulletproof, Rust only mitigates certain vulnerability classes, fuzzing has uneven coverage, and source-level reasoning is bottlenecked on scarce human expertise. [[2026-04-21-firefox-mythos-zero-days]]

How frontier AI rebalances the offence-defence equation

  • Source-level reasoning is no longer human-bottlenecked. Mythos Preview, per Mozilla, can find any category or complexity of vulnerability that humans can find — closing the machine-vs-human discovery gap that previously favoured attackers concentrating costly human effort on single bugs. [[2026-04-21-firefox-mythos-zero-days]]
  • No alien bug classes seen yet. All findings in the Firefox eval were ones an elite human researcher could in principle have found. Mozilla pushes back against speculation that future AI will surface vulnerabilities defying human comprehension, on the basis that modular human-comprehensible software remains finite-in-defects. [[2026-04-21-firefox-mythos-zero-days]]
  • AI is also sharpening the attacker side. APRA flags AI-accelerated attack volume/sophistication (prompt injection, data leakage, insecure integrations, agent misuse, faster coordinated attacks). [[2026-05-08-apra-ai-governance]]
  • First incident-response evidence of offensive GenAI. CyberCX’s 2025 casebook recorded a first-of-its-kind case: a threat actor using GenAI to generate bespoke scripts and payloads, identifiable from “use of emojis in the code, and tutorial-style descriptions in code comments.” Output quality was poor and the objective (decrypting an internal password manager) failed, but the trajectory matters more than the artefact. [[2026-05-12-cybercx-2026-threat-report]]
  • AI data spills are a new defender liability class. Staff paste sensitive corporate data into public AI portals from corporate endpoints, and the exposure is often unquantifiable because there is no DLP, no enterprise licensing, and no network logging. CyberCX engaged on these for the first time in 2025. [[2026-05-12-cybercx-2026-threat-report]]
  • AI systems themselves are a new defender liability class. CyberCX’s controls-side data: 50% AI pen-test severe-finding rate vs 26% for WAPT. Vulnerability classes named: in-model IAM / excessive agency, weak or missing guardrails, prompt injection, lack of content filtering, system-prompt exposure, implicit bias, and insecure adoption of MCP. The defender surface grows even as defender tooling improves; the net sign of the rebalance depends on relative adoption rates. [[2026-05-12-cybercx-2026-hack-report]]
  • MCP is now named as an attack-surface, not just a capability surface. “New standards like Model Context Protocol (MCP) are being adopted, but are not yet secure, enterprise-ready implementations. … data can flow bi-directionally between servers and clients, meaning that traditional security controls implemented on the server side of an application must now be implemented on the client side too. This is creating a rise in authentication-related issues with MCP implementations.” This is the first treatment of MCP as defensive liability in this KB. [[2026-05-12-cybercx-2026-hack-report]]
  • AI-assisted AppSec is operational but not yet positioned as keystone. CyberCX’s 2025 practice cites AI-supported threat modelling mapped to internal control frameworks, AI-generated PoC exploits, and AI-driven test-case generation as emerging defender uses. Notably, neither Anthropic Mythos nor the Mozilla / Firefox 150 result is named as a recommended control in either CyberCX 2026 report — defender adoption of frontier source-level reasoning has not yet hit the consultant-recommendation tier in AU/NZ. [[2026-05-12-cybercx-2026-hack-report]] [[2026-04-21-firefox-mythos-zero-days]]
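The MCP point above, that controls traditionally enforced on the server must also exist on the client because requests flow both ways, can be illustrated with a minimal sketch. It uses no MCP SDK and is a default-deny check, not an authentication protocol. The message shape is a plain dictionary, and `TRUSTED_SERVERS`, `ALLOWED_SERVER_METHODS`, and `accept_server_request` are all hypothetical names. The method strings are meant to resemble server-initiated requests and should be treated as assumptions, not as a reference to the specification.

```python
# Hypothetical client-side gate for requests that a server sends to the client.
TRUSTED_SERVERS = {"internal-docs"}       # assumption: servers vetted by the organisation
ALLOWED_SERVER_METHODS = {"roots/list"}   # assumption: methods this client will honour


def accept_server_request(server_id: str, message: dict) -> bool:
    """Decide whether the client should act on a request sent by a server.

    Default-deny: unknown servers, malformed messages, and methods that are
    not explicitly allowlisted are all rejected.
    """
    if server_id not in TRUSTED_SERVERS:
        return False
    method = message.get("method")
    if not isinstance(method, str):
        return False
    return method in ALLOWED_SERVER_METHODS


print(accept_server_request("internal-docs", {"method": "roots/list"}))
print(accept_server_request("internal-docs", {"method": "sampling/createMessage"}))
print(accept_server_request("unknown", {"method": "roots/list"}))
```

The design choice is that the client applies the same default-deny posture a server would apply to inbound client calls, so a compromised or misconfigured server cannot drive the client into actions it never agreed to perform.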

Empirical data points

  • Firefox 148 — 22 security-sensitive bugs fixed from Anthropic Opus 4.6 scan. [[2026-04-21-firefox-mythos-zero-days]]
  • Firefox 150 — 271 vulnerabilities fixed from Claude Mythos Preview initial evaluation. [[2026-04-21-firefox-mythos-zero-days]]
  • MFA bypass is now universal in CyberCX’s BEC casebook: “Every BEC incident CyberCX responded to where traditional MFA was enforced, such as time-based one-time passwords (TOTP) or push notifications, involved session hijacking.” PhaaS kits Tycoon and Sneaky 2FA productise the AITM proxy step. [[2026-05-12-cybercx-2026-threat-report]]
  • Espionage time-to-detect fell from 404 to 148 days in CyberCX’s 2025 data — partly defender improvement, partly attackers caring less about detection. Financial-motivation TTD almost tripled (24 → 68 days). [[2026-05-12-cybercx-2026-threat-report]]
  • First successful self-propagating worm in the npm ecosystem (Shai-Hulud, v1 → v3 in 2025) — the package-manager supply chain is now wormable, complementing the broader “ClickFix / DLL sideloading / RMM living-off-trusted-tools” attacker professionalisation pattern. [[2026-05-12-cybercx-2026-threat-report]]
  • Service-level severe-finding baselines (CyberCX 2025): Active Directory Assessment 78%; Social Engineering 77%; DDoS 75%; Internal Network Pen-Test 71%; AI Penetration Test 50%; External Network Pen-Test 26%; Web Application Pen-Test 26%; Mobile App Pen-Test 16%; Secure Config – Azure 14%. The bar against which any defender claim of “we test our AI systems” should be calibrated. [[2026-05-12-cybercx-2026-hack-report]]
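One way to use the service-level baselines above for calibration is sketched below. The rates are the ones listed in the bullet; the helper names and the 30% example figure are hypothetical and exist only for illustration.

```python
# Severe-finding rates (%) by service, as listed in the bullet above.
BASELINES = {
    "Active Directory Assessment": 78,
    "Social Engineering": 77,
    "DDoS": 75,
    "Internal Network Pen-Test": 71,
    "AI Penetration Test": 50,
    "External Network Pen-Test": 26,
    "Web Application Pen-Test": 26,
    "Mobile App Pen-Test": 16,
    "Secure Config - Azure": 14,
}


def gap_to_baseline(service: str, observed_rate: float) -> float:
    """Return observed minus baseline, in percentage points.

    A negative value means fewer severe findings than the CyberCX baseline.
    """
    return observed_rate - BASELINES[service]


def services_at_or_above(threshold: float) -> list[str]:
    """Return services whose baseline severe-finding rate is >= threshold."""
    return [name for name, rate in BASELINES.items() if rate >= threshold]


# Hypothetical example: an organisation whose AI pen-tests surface a severe
# finding 30% of the time (an invented figure, not taken from any report).
print(gap_to_baseline("AI Penetration Test", 30.0))
print(services_at_or_above(50))
```

A claim such as "we test our AI systems" becomes comparable once it is expressed as a gap against the 50% baseline, rather than as a bare statement that testing occurs.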

The load-bearing caveat: human-comprehensibility

  • The optimistic defender thesis depends on codebases remaining human-comprehensible. If AI in the development process produces code humans can’t reason about, bug complexity may scale with (or faster than) discovery capability, potentially flipping the balance back to offence or worse. Holley calls human-comprehensibility “an essential property to maintain, especially in critical software like browsers and operating systems.” [[2026-04-21-firefox-mythos-zero-days]]

Implications for defender practice

  • Backlog shock is the operational reality. First scans of hardened, long-lived codebases plausibly return multi-hundreds of valid findings; engineering capacity must be reserved in advance. Mozilla “reprioritized everything else.” [[2026-04-21-firefox-mythos-zero-days]]
  • AI in fuzzing pipelines is insufficient. The source-reasoning capability is what newly closes the gap; defenders who only deploy AI in fuzzing under-capture the value. [[2026-04-21-firefox-mythos-zero-days]]
  • Regulator expectations are starting to converge with this picture. APRA flags insufficient testing of AI systems and AI-generated code as a defensive gap; the inverse expectation — that mature defenders adopt AI-driven discovery as a baseline control — is implicit but not yet written. [[2026-05-08-apra-ai-governance]]

Implications for regulators and boards

  • Cyber-resilience first principles are being called back into focus precisely because the threat landscape has stepped up. ASIC’s 8 May 2026 letter is explicit on this point. [[2026-05-08-apra-ai-governance]]
  • Open governance question: when does failing to use available AI-driven defensive analysis become itself a governance / control gap? Compare APRA’s stance that low board AI literacy is itself a governance risk. [[2026-05-08-apra-ai-governance]] [[2026-04-21-firefox-mythos-zero-days]]

Open threads to watch

  • Economics: what does frontier-AI source review cost per kLoC vs. elite human researcher equivalent? (Mozilla post is silent.)
  • False-positive rates on the 271 Mythos Preview findings against Firefox, and triage methodology.
  • Adversary parity: are independent attackers producing similar volumes of zero-days against Firefox or peers using the same class of model? CyberCX has now seen the first crude offensive GenAI case — the next CyberCX report will be the empirical bellwether for quality trajectory. [[2026-05-12-cybercx-2026-threat-report]]
  • Transferability of the Firefox result to OS, infrastructure, embedded, and less-modular codebases.
  • Cadence and contract terms of recurring frontier-model security evaluation arrangements (Mozilla–Anthropic and analogues).
  • First regulatory enforcement signal that names use-of-AI-for-defence as part of the expected control set, not just a recommendation.
  • Why a major AU/NZ DFIR practice is not positioning frontier-AI defensive analysis as keystone despite having seen the offensive side. Whether CyberCX’s next report names Mythos or an equivalent as a recommended defender control is a tell on industry consensus. [[2026-05-12-cybercx-2026-threat-report]]
  • PQC migration progress in AU banking: whether the AFR 2030 framing translates into APRA / RBA-tracked milestones, and whether cryptographic agility is a control class that the ai-security-defense dossier should absorb or hand off to a dedicated dossier. [[2026-05-11-afr-quantum-banks]]
  • When will a major AU/NZ consultancy explicitly recommend frontier-model source-level review as a baseline control? CyberCX’s 2026 Hack Report cites AI-supported AppSec but stops short of naming Mythos / Mozilla-style scanning as keystone. The first such recommendation in a CyberCX, KPMG, EY, PwC or Deloitte AU/NZ artefact will be a meaningful market-consensus signal. [[2026-05-12-cybercx-2026-hack-report]] [[2026-04-21-firefox-mythos-zero-days]]

Sources

  • [[2026-04-21-firefox-mythos-zero-days]] — Mozilla Blog, Bobby Holley on Firefox 150 + Claude Mythos Preview (offense/defence balance argument).
  • [[2026-05-08-apra-ai-governance]] — MinterEllison synthesis of APRA’s 2026-04-30 letter and ASIC’s 2026-05-08 letter (attacker-side AI risk from the regulator perspective; explicit Mythos reference).
  • [[2026-05-12-cybercx-2026-threat-report]] — CyberCX 2026 annual threat report (first incident-response evidence of offensive GenAI; AI data spills as a new engagement category; MFA-bypass empirics).
  • [[2026-05-11-afr-quantum-banks]] — AFR on AU banks’ paired quantum + AI cyber-threat exposure (partial-content capture).
  • [[2026-05-12-cybercx-2026-hack-report]] — CyberCX STA 2026 Hack Report; controls-side empirical baseline, AI pen-test severe-finding rate (50%), MCP-as-attack-surface, AI-supported AppSec emerging but not keystone.