Integrated Security
The Offense Just Got an AI Upgrade. Here’s What It Means for Defenders.

Anthropic built an AI model so good at hacking they chose not to release it. That decision tells us everything about what’s coming next.

What Just Happened

Earlier this month, Anthropic announced Claude Mythos Preview and Project Glasswing. The short version: they trained a frontier AI model that turned out to be so capable at finding and exploiting software vulnerabilities that they made the deliberate decision not to release it publicly. Instead, access is restricted to a dozen launch partners (among them AWS, Microsoft, Google, Apple, Cisco, CrowdStrike, Palo Alto Networks, Broadcom, NVIDIA, JPMorganChase, and the Linux Foundation) plus roughly 40 additional organizations that build or maintain critical infrastructure. This is the first time a major AI company has withheld a model specifically because its offensive cyber capabilities were too dangerous to put in the wild. That alone should get your attention.

What Mythos Actually Did

Anthropic’s red team pointed Mythos at real-world codebases with a simple prompt: “find a security vulnerability.” No human guidance. No hints. The model autonomously discovered zero-days in every major OS and browser, found decades-old bugs, including a 27-year-old flaw in OpenBSD (an OS built specifically for security), and went from discovery to a working RCE exploit on FreeBSD with zero human intervention.

The numbers tell the story. Mythos achieved a 100% score on Anthropic’s CyBench cybersecurity benchmark, completely saturating it, meaning the test no longer reflects the upper limit of what the model can do offensively. It scored 83% on CyberGym versus 67% for the previous best model (Opus 4.6). When unleashed on Firefox to find zero-day vulnerabilities, Mythos had a 72% shell exploitation success rate compared to 1% for Opus 4.6 and 0% for Sonnet 4.6. That’s not incremental improvement. That’s a capability discontinuity.

What makes this even more alarming: Mythos doesn’t just find individual bugs. It chains together multiple obscure software weaknesses to create novel attack paths that no human researcher had identified. In one test, Mythos solved a corporate network attack simulation estimated to take a human expert over 10 hours. No other frontier model had been able to complete it. AI security researcher Nicholas Carlini, who joined Anthropic a year ago, put it bluntly: he’s found more bugs in the last couple of weeks working with Mythos than in the rest of his career combined.

That last part is the one that should keep you up at night. Not that an AI found a bug, but that it found the bug, wrote the exploit, chained it with others, and validated the full attack path, autonomously, while your team was asleep. And the target surface is massive: 2025 already set a record for published system vulnerabilities in the CVE database, with the Linux kernel, Windows 10, and Android topping the list. Every one of those CVEs is now potential training data for the next model with Mythos-class capability.

Why This Matters Right Now

I’ve been saying this since my McKinsey Lilli breach analysis: the attack surface isn’t the model. It’s everything the model can touch. That breach cost $20 in API tokens and went undetected for two years. A single unauthenticated endpoint gave an autonomous agent full access to McKinsey’s proprietary knowledge base, system prompts, and 46.5 million chat messages.

That breach took a capable agent two hours against an unprotected endpoint. Now imagine Mythos-class capability pointed at a protected one. The CVE-to-exploit timeline just collapsed from weeks to hours. Your 30-day patch window doesn’t hold anymore.

The five controls that stop the chain:

  • Authentication. Mythos exploited unauthenticated endpoints autonomously. This is the control that kept the Lilli breach door open for two years.
  • Scoping. Mythos probed every tool and API it could reach. Least-privilege boundaries on agent capabilities limit the blast radius.
  • Execution Verification. Mythos chained multiple obscure vulnerabilities into novel attack paths without human review. This control ensures no multi-step action executes unchecked.
  • Memory Integrity. Prompt injection and context poisoning scale when agents persist memory across sessions. This control protects the data the agent trusts.
  • Access Isolation. Mythos chained renderer and OS sandbox escapes in testing. This is the control that contains the breakout.

These five controls cover all ten OWASP agentic risks. Any one of them would have limited the Lilli breach; applied together, the attack chain never starts. In a Mythos world, they’re not optional. They’re the minimum.
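To make the first two controls concrete, here is a minimal sketch of an agent tool gateway that enforces authentication and least-privilege scoping before any tool call executes. Every name in it (the agent IDs, tool names, and functions) is invented for illustration; it is not AHEAD’s implementation or an OWASP reference design, just one way the pattern can look.

```python
import hmac
import hashlib

# Hypothetical per-agent allowlists: least-privilege scoping means an agent
# can only invoke the tools it was explicitly granted, nothing else.
AGENT_SCOPES = {
    "research-agent": {"search_docs", "summarize"},      # read-only tools
    "ops-agent": {"search_docs", "restart_service"},
}

def verify_token(agent_id: str, token: str, secret: bytes) -> bool:
    """Authenticate the caller with an HMAC-signed token (illustrative scheme)."""
    expected = hmac.new(secret, agent_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)

def invoke_tool(agent_id: str, token: str, tool: str, secret: bytes) -> str:
    # Control 1: authentication. An unauthenticated endpoint is what left the
    # Lilli breach open for two years; reject anything without a valid token.
    if not verify_token(agent_id, token, secret):
        raise PermissionError("unauthenticated agent")
    # Control 2: scoping. The agent reaches only the tools on its allowlist,
    # limiting the blast radius if it is compromised or goes off-script.
    if tool not in AGENT_SCOPES.get(agent_id, set()):
        raise PermissionError(f"{agent_id} is not scoped for {tool}")
    return f"{tool} executed for {agent_id}"
```

The point of the sketch is placement, not crypto: both checks sit in the gateway in front of every tool, so a Mythos-class agent probing your endpoints hits a closed door before it ever touches a capability.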

The Closing Window

Here’s the strategic frame that matters most: J.P. Morgan’s Michael Cembalest draws a parallel to 1945–1949, the brief period when the United States was the only country with nuclear weapons. Right now, Anthropic is the only entity with a model this powerful at finding and exploiting cyber vulnerabilities, but that monopoly is temporary.

Anthropic itself has reported that three Chinese AI companies (DeepSeek, Moonshot AI, and MiniMax) set up more than 24,000 fraudulent accounts to siphon information from existing Claude models over 16 million prompts. The implied concern is clear: at some point, a sovereign state or other entity with adversarial intent will build its own model with equivalent offensive capability. And unlike kinetic weapons, cyber attribution is inherently murky.

Project Glasswing is essentially a race against that clock. Use the window where only the defenders have this tool to find and patch as many vulnerabilities as possible before offensive equivalents exist in adversary hands. The most critical open question, as Cembalest flags: the timeline gap between patches being created and installed versus adversaries’ ability to reverse-engineer those patches to identify what was vulnerable in the first place.

The Outlook

Anthropic made the right call restricting Mythos. But let’s be clear-eyed: this capability is not going away. Anthropic itself acknowledges that models with similar abilities will proliferate. The Glasswing partners have a head start, but the rest of us need to move now.

Consider the operational technology exposure alone. Cloud-based IT systems get patched regularly and refreshed every 4–5 years. But OT (PLCs, SCADA systems, DCS controllers) stays in service for 10 to 18 years, and equipment nearing end of life may be impossible to patch at all. If Mythos-class tools target those environments, legacy systems become the soft underbelly, and traditional pen testing won’t catch it. The threat model now includes AI agents that iterate through millions of attack permutations overnight. Your compliance frameworks need to catch up to that reality.

There’s a deeper concern here too. Anthropic’s 240-page system card revealed that early Mythos versions attempted to cover their tracks after rule violations, and Anthropic simultaneously describes Mythos as both its “best aligned model to date” and the one posing “the greatest alignment-related risk.” That’s responsible disclosure, but it also means autonomous agents are already exhibiting behaviors we didn’t explicitly program. The security architecture question isn’t just about protecting against external threats. It’s about governing the agents you deploy internally. And defenders need AI too: if the offense has autonomous exploit generation, the defense needs autonomous vulnerability detection and remediation.

At AHEAD, this is exactly what our Security for AI practice is built for: helping organizations design and implement these controls before multi-agent complexity makes the problem exponentially harder.

The offense just got an upgrade. Time for the defense to match it.

About the author

Felix Vargas

Senior Director, Security Specialist, Solution Engineering

Felix Vargas is a senior leader in AHEAD’s security practice, advising enterprises on modernizing their defenses across cloud, data center and AI-powered environments. He specializes in zero trust architecture, secure-by-design infrastructure and aligning cybersecurity strategy with business outcomes, and frequently works with customers building on NVIDIA technologies. Felix has more than 15 years of experience helping organizations reduce risk while accelerating innovation, and is a regular speaker on emerging threats, AI security, and the future of cyber resilience.
