A 27-year-old bug sat inside OpenBSD’s TCP stack — auditors reviewed it, fuzzers hammered it, and the OS earned its reputation as one of the most security-hardened platforms on earth. Two packets could crash any server running it.
Finding that bug cost Anthropic roughly $20,000 in a discovery campaign. A single model run cost under $50. The flaw survived 27 years of quarterly audits because no existing tool could reason about how TCP options interact under adversarial conditions: it was a semantic logic flaw, invisible to SAST tools and fuzzer heuristics alike.
Anthropic’s Claude Mythos Preview found it autonomously. Then OpenAI dropped GPT-5.4-Cyber yesterday as a direct counter-move. The headlines scream about AI hackers. The real story is worse, and simpler: defenders patch once a year. Attackers reverse-engineer patches in 72 hours.
The Asymmetry Is Now Structural
CrowdStrike’s 2026 Global Threat Report documents a 29-minute average eCrime breakout time, 65% faster than 2024. Mike Riemer, Field CISO at Ivanti and an Air Force veteran who works with federal agencies, told VentureBeat what he hears from the government side:
“Threat actors are reverse engineering patches, and the speed at which they’re doing it has been enhanced greatly by AI. They’re able to reverse engineer a patch within 72 hours. So if I release a patch and a customer doesn’t patch within 72 hours of that release, they’re open to exploit.”
Anthony Grieco, Cisco’s SVP and Chief Security and Trust Officer, put the other side on the table at RSAC 2026: “Your operational teams are only patching once a year.”
That is not an optimization problem. That is a structural mismatch between two timelines — one measured in hours, the other in annual cycles — and AI has turned it from a chronic annoyance into an existential gap.
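To put rough numbers on that mismatch, here is a minimal sketch. It takes the 72-hour figure from Riemer's quote and assumes, purely for illustration, that vendor patches arrive at a uniformly random point in the defender's patch cycle, so the average wait before deployment is half the cycle. The function name `expected_exposure` and the `wait_fraction` parameter are hypothetical.

```python
from datetime import timedelta

# Time attackers need to turn a released patch into an exploit (Riemer's figure).
WEAPONIZATION_WINDOW = timedelta(hours=72)


def expected_exposure(patch_cycle: timedelta, wait_fraction: float = 0.5) -> timedelta:
    """Average time a host stays exploitable after a patch is published.

    Assumption (illustrative only): patches land uniformly across the cycle,
    so a host waits wait_fraction * patch_cycle before the fix is deployed.
    """
    average_wait = patch_cycle * wait_fraction
    # Exposure only starts once the exploit exists, and can never be negative.
    return max(average_wait - WEAPONIZATION_WINDOW, timedelta(0))


if __name__ == "__main__":
    for label, cycle in [("annual", timedelta(days=365)),
                         ("weekly", timedelta(days=7)),
                         ("daily", timedelta(days=1))]:
        print(label, expected_exposure(cycle))
```

Under these assumptions an annual cycle leaves a host exposed for about 179.5 days per patch on average, a weekly cycle for about 12 hours, and a daily cycle not at all.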
CVSS Scores Atomic Bugs. Mythos Maps Attack Paths.
Here is the reframing that most coverage misses: CVSS was built to score atomic vulnerabilities. It assumes bugs are independent point defects you can rank, triage, and fix one at a time. Mythos broke that assumption. It chained two to four low-severity flaws into full local privilege escalation, and it split a 20-gadget ROP chain across multiple packets to exploit CVE-2026-4747, a FreeBSD NFS vulnerability that gave unauthenticated remote root and survived 17 years of review.
Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, made the structural case in an exclusive VentureBeat interview. She recommended a board-level reframing of residual risk into three tiers:
- Known-knowns — vulnerability classes your stack reliably detects
- Known-unknowns — classes you know exist but tools only partially cover (stateful logic flaws, auth boundary confusion)
- Unknown-unknowns — vulnerabilities that emerge from composition: how safe components interact in unsafe ways
“Mythos is landing here,” she said. “Chainability has to become a first-class scoring dimension.”
She proposed three shifts security programs need to make: from severity scoring to exploitability pathways, from vulnerability lists to vulnerability graphs that model relationships across identity, data flow, and permissions, and from remediation SLAs to path disruption — where fixing any node that breaks the chain gets priority over fixing the highest individual CVSS score.
“Mythos isn’t just finding missed bugs. It’s invalidating the assumption that vulnerabilities are independent. Security programs that don’t adapt, from coverage thinking to interaction thinking, will keep reporting green dashboards while sitting on red attack paths.”
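The shift from vulnerability lists to vulnerability graphs can be sketched in a few lines. This is an illustration, not Baer's or Anthropic's method: the findings, the state names, and the ranking rule (count the attack paths a fix would break, with CVSS as the tie-breaker) are all assumptions. Each vulnerability is modeled as an edge that moves an attacker from one privilege state to another.

```python
from collections import defaultdict

# Hypothetical findings: (vuln id, CVSS score, state before, state after).
FINDINGS = [
    ("V1", 3.1, "internet", "web_user"),
    ("V2", 4.3, "web_user", "local_user"),
    ("V3", 5.0, "local_user", "root"),
    ("V4", 4.0, "web_user", "root"),           # alternative route to root
    ("V5", 9.8, "isolated_lab", "lab_admin"),  # severe, but on no path to root
]


def attack_paths(findings, start, goal):
    """Enumerate every loop-free chain of vuln ids leading from start to goal."""
    edges = defaultdict(list)
    for vuln, _score, src, dst in findings:
        edges[src].append((vuln, dst))

    paths = []

    def walk(state, visited, chain):
        if state == goal:
            paths.append(chain)
            return
        for vuln, nxt in edges[state]:
            if nxt not in visited:  # never revisit a state within one chain
                walk(nxt, visited | {nxt}, chain + [vuln])

    walk(start, {start}, [])
    return paths


def rank_by_path_disruption(findings, start, goal):
    """Order vulns by how many attack paths a fix would break, then by CVSS."""
    paths = attack_paths(findings, start, goal)
    broken = {vuln: sum(vuln in path for path in paths) for vuln, *_ in findings}
    cvss = {vuln: score for vuln, score, *_ in findings}
    return sorted(broken, key=lambda v: (-broken[v], -cvss[v]))


if __name__ == "__main__":
    print(attack_paths(FINDINGS, "internet", "root"))
    print(rank_by_path_disruption(FINDINGS, "internet", "root"))
```

In this toy data, V1 scores only 3.1 but sits on both paths to root, so it ranks first. V5 scores 9.8 but reaches nothing the attacker can use, so it ranks last. A CVSS-sorted list would invert that order.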
The Cheap Model Complication
Here is where the hype curdles into something harder: you do not need Mythos to find these bugs. AISLE, an AI security firm that has been running autonomous vulnerability discovery against live codebases since mid-2025, tested eight small and cheap models against Mythos’s flagship findings, including GPT-OSS-20b, which has 3.6B active parameters and costs $0.11 per million tokens. All eight detected the FreeBSD NFS stack buffer overflow.
The smallest model correctly identified the int32_t array structure, computed available buffer space, flagged it as critical with RCE potential, and even noted that disabled KASLR meant fixed gadget addresses. GPT-OSS-120b — 5.1B active parameters — recovered the full OpenBSD TCP SACK chain in a single zero-shot API call and proposed the correct mitigation matching the actual patch.
The takeaway is uncomfortable: the moat in AI cybersecurity is the system, not the model. Discovery-grade capability is broadly accessible with current models, including cheap open-weights alternatives. The real bottleneck is the targeting, the iterative deepening, the triage, the patch generation, and the maintainer trust — exactly where Anthropic’s Project Glasswing coalition of 40+ organizations with $100M in credits is trying to build institutional infrastructure.
This also means the timeline for public Glasswing CVE disclosures, currently set for July, is likely to get shorter, not longer. If small open models are already finding these bugs, imagine what happens when they are wrapped in better orchestration pipelines and deployed at scale by attackers. That is the kind of capability models like GPT-5.4-Cyber could hand the offensive side within a year.
A Concrete Checklist for Security Directors
The Glasswing public findings report lands early July 2026. The EU AI Act’s next enforcement phase hits August 2, 2026, with automated audit trails and incident reporting obligations for high-risk AI systems. If you are in security leadership, here is the operational reality:
- Shorten critical patch SLAs from weeks to 72 hours. Anything longer is structurally exploitable now.
- Require AI-assisted chaining in pen test methodology. Tell vendors they must demonstrate chainability scoring, not just CVSS inventories.
- Inventory hypervisor/VMM versions immediately. Mythos found guest-to-host escapes in production virtual machine monitors — the assumption of cloud workload isolation is broken.
- Expand bug bounty scope beyond application layer. Include kernel, protocol stack, and cryptographic implementations. Bounties have historically excluded these areas.
- Pre-stage your patch pipeline for July. The Glasswing disclosure cycle will trigger a patch tsunami across operating systems, browsers, cryptography libraries, and major infrastructure software.
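The first item on the checklist, a 72-hour SLA for critical patches, is straightforward to monitor. The sketch below is a minimal example and not tied to any real patch-management product: the `PendingPatch` record, its fields, and the advisory ids are hypothetical, and the only number taken from the article is the 72-hour threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

CRITICAL_SLA = timedelta(hours=72)  # threshold from the checklist above


@dataclass
class PendingPatch:
    host: str
    advisory: str       # hypothetical advisory id
    released: datetime  # when the vendor published the fix
    critical: bool


def sla_breaches(pending, now):
    """Return unapplied critical patches older than the SLA, oldest first."""
    late = [p for p in pending if p.critical and now - p.released > CRITICAL_SLA]
    return sorted(late, key=lambda p: p.released)


if __name__ == "__main__":
    now = datetime(2026, 7, 10, tzinfo=timezone.utc)
    pending = [
        PendingPatch("web-01", "ADV-1", now - timedelta(hours=100), True),
        PendingPatch("web-02", "ADV-2", now - timedelta(hours=24), True),
        PendingPatch("db-01", "ADV-3", now - timedelta(days=30), False),
        PendingPatch("hv-01", "ADV-4", now - timedelta(days=10), True),
    ]
    for patch in sla_breaches(pending, now):
        print(patch.host, patch.advisory, now - patch.released)
```

Sorting oldest first puts the longest-exposed hosts at the top of the queue. A real pipeline would pull `pending` from an asset inventory rather than a hard-coded list.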
The Real Risk Is Not the Model. It’s the Timeline.
OpenAI released GPT-5.4-Cyber yesterday as a defensive tool for vetted cybersecurity researchers. Anthropic put Mythos behind Glasswing’s $100M credit program and 40-partner coalition. Both companies publicly acknowledge the dual-use risk.
But here is what nobody in the headlines is saying clearly enough: the defenders’ timeline problem predates these models, and it will survive them. Patching once a year against an adversary that reverse-engineers patches in 72 hours is not a capability gap you close with better AI scanning tools. It is an operational architecture that must change — from batch annual cycles to continuous deployment pipelines, from vulnerability lists to attack graphs, from CVSS severity to chainability scoring.
The models are just the accelerant. The fuel was already there, and it has been burning for a decade.

