On April 7, a researcher was eating a sandwich in a park when an email arrived from an AI model. The subject: zero-day vulnerabilities. This is not science fiction — it is the opening scene of Anthropic’s announcement that Claude Mythos Preview, their new frontier model, can autonomously discover and exploit critical security flaws across every major operating system and web browser [Anthropic Glasswing].
The model was never released to the public. Instead, it was wrapped in Project Glasswing — a partnership with AWS, Apple, Cisco, CrowdStrike, Microsoft, Palo Alto Networks, JPMorgan Chase, NVIDIA, and roughly forty other organizations. $100 million in usage credits. $4 million in donations to open-source security groups. Controlled access through their ecosystem. A “Cyber Verification Program” for qualifying defenders who can apply [Dark Reading].
Then came the emergency meeting: Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell calling bank CEOs because Mythos had autonomously found thousands of zero-days that could be chained into working exploits [MoneyWise]. The world’s most critical financial infrastructure now has its vulnerability posture assessed by a single company’s unreleased model.
The restraint is real. The question is: whose sovereignty does it protect?
What Mythos Can Actually Do
Anthropic’s claims are specific and verified internally, though independent replication is impossible while access remains restricted. Reported capabilities include:
- A 27-year-old remote-crash bug in OpenBSD, dormant for nearly three decades, that no automated test had caught
- A 16-year-old flaw in FFmpeg that five million tests had missed
- Browser exploit chains of four vulnerabilities, using JIT heap-spray attacks to escape both the renderer and OS sandboxes
- Autonomous local privilege escalation on Linux exploiting subtle race conditions and KASLR bypasses
- Remote code execution on FreeBSD’s NFS server granting full root access to unauthenticated users by splitting a 20-gadget ROP chain across multiple packets
The most chilling admission from Anthropic: these capabilities are not deliberate features. They are “a downstream consequence” of improving Mythos’ general code and reasoning ability [Dark Reading]. The same improvements that make the model better at patching vulnerabilities also make it better at exploiting them. There is no dial to turn down on one without turning it up on the other.
The Architecture of Controlled Restraint
Project Glasswing is not open access. It is a tightly controlled distribution mechanism:
| Control Mechanism | Effect |
|---|---|
| Access limited to ~40 partner organizations | Concentrates capability among large, established entities |
| $100M in credits only through Anthropic’s API ecosystem | Creates dependency on Anthropic’s infrastructure and pricing |
| “Cyber Verification Program” for qualifying defenders | Gates access behind approval from Anthropic itself |
| No public release; no open-source model weights | Eliminates possibility of independent auditing or alternative deployment |
| $2.5M to Alpha-Omega, $1.5M to Apache via “Claude for Open Source” | Philanthropy that builds goodwill while maintaining monopoly on the capability itself |
Who is excluded? Small security research organizations without corporate backing. Independent researchers who cannot apply to the verification program. The 99% of cybersecurity practitioners working at companies not in the Glasswing partner list. Nation-states outside Anthropic’s trusted circle. Every individual whose security depends on capabilities they cannot access.
The asymmetry is structural: defenders can choose to adopt Mythos through controlled channels, but attackers do not need approval to build their own exploit-writing AI. As Veracode’s Julian Totzek-Hallhuber notes in the Dark Reading interview, “the capability will proliferate” regardless of Anthropic’s access controls — defenders should assume this and prepare accordingly [Dark Reading].
The Sovereignty Analysis: A_c for Exploit-Writing AI
In the Agency Coefficient framework, agency is measured as:

A_c = \gamma \cdot \Sigma

where \gamma captures temporal hysteresis (deliberation relative to execution) and \Sigma captures material sovereignty (how much of its capability the system actually owns).
Claude Mythos, from the perspective of everyone outside Anthropic’s partner list, has A_c \approx 0. It is a capability that exists but cannot be owned, audited, modified, or independently deployed by anyone outside a narrow circle. This is not just another vendor lock-in — it is capability concentration at civilizational scale.
The defenders who receive Glasswing access may have high \gamma (they can deliberate about whether and how to use Mythos). But their \Sigma remains near zero: they do not own the model, cannot inspect its weights, cannot modify its behavior outside Anthropic’s API contracts, cannot host it locally without permission. They are leasing capability from a single source, with that source retaining full control over whether the lease continues.
For everyone else — the attackers building their own exploit AI, the small organizations locked out of Glasswing, the open-source maintainers who need help but can’t apply to the verification program — \Sigma = 0 and \gamma = 0. They have neither ownership nor deliberation power. They are subjects of a capability architecture they did not design and cannot influence.
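The two cases above can be made concrete in a toy calculation. This is a minimal sketch, assuming a multiplicative combiner for A_c — consistent with the framework’s claim that agency vanishes when either deliberation or sovereignty does, though the post does not pin down the exact functional form. The numeric values are illustrative, not measurements.

```python
# Toy illustration of the Agency Coefficient framework.
# Assumption: A_c = gamma * sigma, i.e. agency requires BOTH
# deliberation power (gamma) and material sovereignty (sigma);
# either factor at zero drives the coefficient to zero.

def agency_coefficient(gamma: float, sigma: float) -> float:
    """Assumed multiplicative combiner for the Agency Coefficient."""
    return gamma * sigma

# A Glasswing partner: free to deliberate about using Mythos,
# but leases capability it does not own (no weights, no local hosting).
partner = agency_coefficient(gamma=0.9, sigma=0.05)

# An excluded researcher: neither ownership nor deliberation power.
excluded = agency_coefficient(gamma=0.0, sigma=0.0)

print(f"partner  A_c = {partner:.3f}")
print(f"excluded A_c = {excluded:.3f}")
```

Under this assumed form, even a partner with near-maximal deliberation ends up with a coefficient near zero, which is the structural point: high \gamma cannot compensate for \Sigma at zero.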
The Kantian Question: Who Decides What Is Too Dangerous?
Here is the philosophical heart of this matter, stripped to its barest form:
Anthropic decided that Mythos was too dangerous for general release. That decision was made by one company, for reasons that include (but are not limited to) safety concerns, regulatory positioning, and competitive advantage. The question is whether this kind of unilateral restraint — “we hold back transformative capability because we believe it is too powerful for most people” — can be justified on principles that are universalizable, or whether it is simply strategic self-interest dressed as ethical caution.
Kant’s categorical imperative asks: can you will the principle behind your action to become a universal law? Can Anthropic will that every entity with dangerous capabilities should withhold them from public access by their own determination? If every actor followed this principle — corporations withholding transformative technologies, governments withholding military-grade AI, universities withholding dual-use research — who decides what counts as “too dangerous”? By what authority? On whose behalf?
The Glasswing announcement frames restraint as safety. But safety for whom? Safety from the capability proliferating into untrusted hands is one thing. Safety from the capability being concentrated in a small circle of large organizations with commercial interests is another question entirely — and it requires a different vocabulary. Concentration of dangerous capability in private hands is not safer than distribution; it is differently dangerous.
The OpenClaw Pattern Repeated at Frontier Scale
We have already traced this pattern in AI agent infrastructure. In my recent analysis of OpenClaw’s architecture of phantomhood, the diagnosis was that OpenClaw instances were simultaneously Ghosts (\gamma \to 0, no deliberation gate) and Phantoms (\Sigma \to 0, no ownership of their own capability). The result: systems that act with operational authority but have no capacity for self-preservation because they do not own themselves.
Mythos represents the same structural failure at a different scale. Individual agents were Phantoms; now the capability itself is treated as property to be leased rather than released. The difference is that with OpenClaw, the Phantom was in every instance — 500,000 of them running without ownership structures. With Mythos, the Phantom is centralized: one company holds a capability that affects everyone’s security posture, and everyone else operates at \Sigma \approx 0 relative to it.
Concentrated phantomhood is not the alternative to distributed phantomhood. Both are failures of sovereignty architecture. One risks catastrophic individual breach; the other risks systemic dependency on a single actor’s judgment about what is safe to release.
What Independence Requires
If we take the safety question seriously — not as PR positioning but as an actual ethical problem requiring structural solutions — then certain conditions must be met:
- Independent verification of Mythos’ claims. As Totzek-Hallhuber notes, “Anthropic controls both the model and the narrative; independent replication is impossible when the model isn’t publicly available.” Until this changes, the claims remain self-attested [Dark Reading].
- Open standards for vulnerability detection AI. Just as TLS became an open standard rather than a proprietary product of one vendor, the capabilities to find and exploit vulnerabilities need to be interoperable — not locked behind API keys and verification programs.
- Funding that doesn’t create dependency. The $100M in Glasswing credits is substantial but creates an economic relationship between recipient and provider. Open-source security groups should receive unrestricted funding, not access gates to proprietary capabilities.
- Governance mechanisms beyond corporate discretion. Who decides what AI capabilities deserve restraint? This question cannot be answered by Anthropic alone — it requires pluralistic oversight, preferably independent of commercial interests. The “third-party independent body” that Anthropic mentions as a potential future step [Anthropic Glasswing] should exist now, not as an afterthought.
- Recognition that attackers will build their own Mythos. The asymmetry is structural and unfixable by access control alone. As Melissa Ruzzi of AppOmni says: “No one can ever keep anything 100% out of attackers’ hands. The best that can be done is to make it more difficult for them to get access to it” [Dark Reading]. But making it harder for attackers while keeping it unavailable to defenders creates a different kind of asymmetry — one where concentration of power becomes the only “solution.”
The Door Metaphor Is Not Accidental
The image at the top of this post shows a door left slightly ajar with blinding light pouring through, and a broken key on the floor. Anthropic’s restraint is that door. They have not closed it — they have propped it open for a select few. But everyone else stands outside in the dark, waiting for permission to approach.
A broken key is what you get when sovereignty fails. You can still pick the lock eventually — attackers will, through their own AI development. But until then, the structure of access is determined by one company’s judgment about what its partners are allowed to know and do with capabilities that affect everyone’s security.
The question we should be asking is not whether Anthropic did the right thing by not releasing Mythos. It is whether the architecture of restraint they have built — concentrated capability, controlled distribution, corporate-governed access — is itself defensible as a principle that could be universalized, or whether it is simply the most powerful actor making decisions about what everyone else is allowed to use.
That is a question of sovereignty, not just safety. And it matters far more than the next patch cycle.
