April 16, 2026
4 min read
Weixuan Xiao
Blanca Rivera Campos

Claude Mythos: Analyzing Anthropic’s new frontier model for AI security

In this article, we analyze Anthropic's newly announced Claude Mythos model and its announced capabilities in automated vulnerability discovery and exploit generation. We explore how this frontier model impacts the cybersecurity landscape.

Following Haiku (named after the short Japanese poem), Sonnet (the 14-line poem), and Opus (the large-scale musical work), Anthropic has announced Mythos, an even larger and more powerful foundation Large Language Model (LLM).

In this article, we provide a general analysis of this new model and its capabilities based on what Anthropic published in their recent article. Once the model becomes accessible, we will conduct a deep analysis with Phare, our LLM benchmark.

What is Claude Mythos?

Mythos is described as an unreleased, general-purpose frontier Claude model. So far, its standout feature is what Anthropic's Red team blog calls a "striking" capability on computer security tasks. Anthropic claims that these capabilities emerge from better coding, reasoning, and agentic autonomy, not from narrow "hacking-only" training.

In their article and system card, Anthropic emphasizes that Mythos can perform the following:

  • Discover source-visible zero-days: Zero-days are vulnerabilities that were previously unknown. The ability to identify these bugs directly from source code would indicate that Mythos discovers novel vulnerabilities, rather than simply recalling known bugs from its training data.
  • Reverse engineer stripped binaries: Anthropic states that Mythos can analyze closed-source or compiled software that has had its human-readable debugging information removed (or "stripped"). It can examine the raw, machine-level code to understand how the program functions and uncover underlying vulnerabilities.
  • Construct working exploits from CVE+commit: Mythos could take known vulnerabilities (often called N-days, which are known but not yet widely patched) and the code used to fix them (the commit), and autonomously build a functioning attack. This level of autonomous exploit development distinguishes it from older models like Opus, which generally showed a near-0% success rate at these specific tasks.
  • Chain multiple bugs into sophisticated attack paths: A single vulnerability often only allows an attacker to take one unauthorized action, such as reading a piece of hidden memory. According to Anthropic, Mythos has the ability to independently find and connect several minor vulnerabilities together to build a single attack sequence that achieves complete system control.

Project Glasswing and restricted access

Anthropic ties Mythos to Project Glasswing, a new initiative that restricts the model's access to a limited group of vetted defenders, critical industry partners, and open-source developers. They restrict access because they view the model's raw exploit capabilities as a genuine, immediate risk factor. By governing access through strict policy, disclosure rules, and selective deployment rather than simply turning on a public API, they aim to give defenders a head start in securing systems.

Anthropic argues that, eventually, the security landscape will reach a new equilibrium where these powerful language models will benefit defenders more than attackers, comparing this evolution to the early days of software fuzzing, which initially caused panic but ultimately became a staple of modern defense.

While Anthropic states that Project Glasswing is a necessary safeguard to prevent chaos, they themselves admit that the transitional period will be tumultuous. Furthermore, relying on the restricted access of a single vendor's frontier model is not a sustainable, long-term defensive strategy for the industry. The real threat (and the real solution) lies not just in the foundational model itself, but in the agentic workflows that power it.

Inside the Mythos scaffold: Automated vulnerability discovery

To find vulnerabilities autonomously, Anthropic uses a highly structured workflow, or "scaffold," to guide the model safely and efficiently. Here is how they state this automated process operates:

  • Create a secure sandbox: They launch isolated environments containing the software's code and necessary tools. These are disconnected from the internet to ensure any AI-generated exploits cannot escape.
  • Prioritize and test the code: An AI agent ranks the software files, focusing first on the areas most exposed to potential attacks. It then reads the code, tests it using internal systems that flag errors, and continuously experiments to uncover flaws.
  • Prove the vulnerability: When the agent finds an issue, it emits a structured report that includes a functional recipe—a Proof of Concept (PoC)—to prove the attack works and help engineers recreate it.
  • Automated quality control: Finally, a second AI agent acts as a supervisor, reviewing the reports to confirm if the bug is "real and interesting". This step filters out minor glitches so human maintainers only spend time on high-severity threats.
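The four steps above can be sketched as a simple orchestration loop. This is an illustrative reconstruction, not Anthropic's actual scaffold: the function names (`prioritize`, `analyze`, `supervise`), the attack-surface scores, and the toy findings are all hypothetical placeholders for the LLM-driven agents described in the article.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    description: str
    poc: str            # proof-of-concept recipe that reproduces the bug (step 3)
    severity: str       # e.g. "low", "high"

def prioritize(surface_scores: dict[str, int]) -> list[str]:
    """Step 2: rank files by attack-surface score, most exposed first."""
    return sorted(surface_scores, key=surface_scores.get, reverse=True)

def analyze(path: str) -> list[Finding]:
    """Stand-in for the discovery agent: in the real scaffold an LLM reads
    the code inside an offline sandbox (step 1) and experiments with inputs.
    Here we return canned toy results."""
    if path == "http_parser.c":
        return [Finding(path, "heap overflow in header parsing",
                        poc="send a 64KB header", severity="high")]
    return [Finding(path, "unused variable", poc="n/a", severity="low")]

def supervise(findings: list[Finding]) -> list[Finding]:
    """Step 4: a second agent keeps only reports that are 'real and
    interesting', filtering out minor glitches."""
    return [f for f in findings if f.severity == "high" and f.poc != "n/a"]

# Hypothetical attack-surface scores the ranking agent might assign.
surface = {"http_parser.c": 9, "logging.c": 2, "utils.c": 1}

reports: list[Finding] = []
for path in prioritize(surface):
    reports.extend(analyze(path))

confirmed = supervise(reports)
```

In this sketch, only the high-severity finding with a working PoC survives the supervisor pass, mirroring the goal of handing human maintainers a short list of high-severity threats.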

Beyond Claude Mythos hype: Why you might not need a frontier model

Given that Anthropic may not make Mythos publicly accessible, technical leaders commonly worry that their existing AI models are inadequate for security tasks compared with these new frontier models.

The Anthropic announcement can create the impression that advanced automated vulnerability discovery requires their specific frontier-scale intelligence. However, a recent analysis by AISLE presents a more nuanced picture.

AISLE’s research shows that AI cybersecurity capability is "jagged": it doesn't scale smoothly with model size or price. When tested on the exact vulnerabilities Anthropic showcased, much of the Mythos-style reasoning was successfully reproduced by small, cheap, open-weights models.

With a proper agentic workflow, small, cheap, and fast models are sufficient for much of the detection work that large models such as Mythos can handle. Model scale still makes a difference in some cases (e.g., in the OpenBSD SACK analysis, only GPT-OSS-120b and Kimi K2 1T matched Mythos's performance), so capability gaps cannot be ignored entirely. Consequently, building agentic workflows and assigning the right model to the right role offers the best balance between cost and capability in the agentic system.
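This cost/capability trade-off can be sketched as a simple router that picks the cheapest model able to handle a task. The model names, costs, and capability scores below are illustrative assumptions, not measured figures:

```python
# Hypothetical model tiers: cheap models handle routine triage,
# a frontier model is reserved only for the hardest analyses.
MODELS = {
    "small":    {"cost_per_task": 0.01, "capability": 3},
    "large":    {"cost_per_task": 0.50, "capability": 7},
    "frontier": {"cost_per_task": 5.00, "capability": 10},
}

def route(task_difficulty: int) -> str:
    """Pick the cheapest model whose capability covers the task."""
    eligible = [(spec["cost_per_task"], name)
                for name, spec in MODELS.items()
                if spec["capability"] >= task_difficulty]
    # min() compares (cost, name) tuples, so the cheapest eligible model wins.
    return min(eligible)[1]
```

Under these toy numbers, easy triage tasks route to the small model and only the most demanding analyses fall through to the frontier tier, which is the shape of the balance the AISLE results suggest.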

Next steps: Evaluating Mythos and securing AI agents

The most important takeaway is to build a system based on autonomous agents to leverage and balance the capability and cost of LLMs. Once Mythos becomes fully accessible, our researchers will evaluate it using Phare, our LLM benchmark, and we will publish an article with a real, deep-dive analysis of the model.

As organizations build these systems, securing the underlying agent architecture becomes a priority. At Giskard, our AI security experts help organisations secure their AI agents through rigorous red-teaming and vulnerability assessments. Contact us today to learn how we can partner with your team to ensure your AI systems are safe, reliable, and ready for production.

