ML6 • Blog

Anthropic’s Fable 5 is out. What to do now?

Written by Cristóbal Sendín | Jun 11, 2026 1:57:14 PM

Executive summary
Fable 5 delivers a clear capability jump over Opus 4.8 for complex, high-stakes work such as deep codebase reviews, security-sensitive implementation, and long-context reasoning, but it is slower, more expensive, and not justified for routine tasks. ML6’s internal test showed that Fable 5 found release-blocking issues that Opus 4.8 missed, though at roughly 3x the time and 3.4x the cost. The main adoption risk is Anthropic’s mandatory 30-day data retention policy, with no enterprise zero-retention option. Enterprise teams should therefore use Fable 5 selectively, keep cheaper models for standard work, and evaluate governance, safeguards, orchestration, and budget controls before adoption.

On June 9, 2026, Anthropic released Claude Fable 5 and Claude Mythos 5, both built on the same underlying model. They represent a new "Mythos-class" tier that sits above the Opus family in capability.

The release has sparked extensive debate among researchers and developers about how large a step forward it really is. We tested Fable 5 against Opus 4.8 on a real internal codebase and reviewed the broader community response. This post covers where the performance gain justifies the cost, where it doesn't, and what you need to evaluate before adopting it.

"Fable 5 approaches tasks with greater structure and depth than previous iterations, marking a significant leap forward for high-criticality, codebase-wide tasks."

What are Fable 5 and Mythos 5?

The key distinction between them is not the model itself but the safeguards layered on top:

  • Fable 5 is the general-availability release: production-ready and open for general use. It includes safety classifiers that monitor every request. When the model detects queries related to cybersecurity exploitation, biology, chemistry, or model distillation (the process of training a smaller model to replicate a larger one's capabilities), the response is automatically handled by Claude Opus 4.8 instead.
  • Mythos 5 is the same model with certain safeguards lifted. Access is limited to organizations participating in Project Glasswing, Anthropic's trusted-access program for cyber defense and infrastructure protection, and will soon be extended to select biology researchers.

Both models share a 1-million-token context window, meaning they can process up to roughly 1 million tokens of input and conversation history at once.
Each request can also generate up to 128,000 output tokens, and both models support vision, tool use, memory, compaction, and adaptive thinking. Pricing sits at $10 per million input tokens and $50 per million output tokens, roughly double the cost of Opus 4.8.
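To make the pricing concrete, here is a minimal sketch that estimates the list-price cost of a single request from its token counts, using the $10 / $50 per-million rates and the context and output limits quoted above. It assumes flat per-token pricing and ignores prompt caching, batch discounts, and any other pricing modifiers, so billed totals can differ. The function and constant names are our own.

```python
# List prices quoted above for Fable 5, in USD per million tokens.
INPUT_PRICE_PER_M = 10.0
OUTPUT_PRICE_PER_M = 50.0

# Limits quoted above: ~1M-token context window, 128,000 output tokens per request.
CONTEXT_WINDOW = 1_000_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost (USD) of one request.

    Assumes flat per-token pricing; ignores caching or batch discounts.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW:
        raise ValueError("input exceeds the ~1M-token context window")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 128,000-token per-request limit")
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: a large review prompt of 200k input tokens with a 20k-token answer.
cost = estimate_request_cost(200_000, 20_000)
print(f"${cost:.2f}")  # $3.00
```

Note how output tokens weigh five times as much as input tokens, so long generated reports dominate the bill faster than large prompts do.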

When does Fable 5 earn its cost?

Fable 5 is not a drop-in replacement for Opus 4.8 across every workflow. It is significantly more expensive, slower, and most differentiated on long, complex, high-stakes tasks. The practical question for teams is: where does the performance gain justify the price?

Where Fable 5 pulls ahead: The model excels at tasks that reward sustained focus and iteration: multi-file code migrations, deep code review, security-sensitive implementations, long-context analysis, and autonomous multi-step workflows. On SWE-Bench Pro, an agentic coding benchmark, Anthropic reports a score of 80.3% for Fable 5 compared to 69.2% for Opus 4.8.

Independent evaluations have corroborated these results. Every's Senior Engineer benchmark scored Fable 5 at 91/100, making it the first frontier model to reach 90+. Simon Willison reported a similarly strong experience on a cross-repo task for his Datasette Agent project: Fable first implemented workarounds, then, once given access to the underlying library, refactored them into supported features with tests and documentation. His summary: it felt like several days of work compressed into a few hours.


Where the difference is marginal: For quick edits, short prompts, routine code generation, and standard conversational tasks, the performance gap narrows. Multiple reviewers converge on the same practical advice: use Fable 5 for large, delegable jobs you can hand off and check later; keep a faster, cheaper model for rapid back-and-forth.

The latency tradeoff: Fable 5 takes meaningfully longer to complete tasks. In CodeRabbit's evaluation, many coding tasks ran long enough to hit their agent timeout. This matches what we observed in our own testing (see the internal tool case study below). The depth is real, but so is the time cost.

Where it can fall short: Fable 5 is not uniformly better across all task types. CodeRabbit found that on their hardest code review problems, Fable 5 actually passed fewer cases than both Opus 4.8 and their baseline. The model's strength is in sustained, structured investigation rather than rapid pattern-matching on isolated issues.

Our experience testing Fable 5

To get a hands-on sense of Fable 5 in a real-world context, we ran a head-to-head comparison as part of preparing an internal tool release. The goal was straightforward: “Can Fable 5 catch more release-blocking issues than Opus 4.8 on an actual codebase?”

Both models received the same deliberately vague prompt, asking them to review the codebase ahead of release, patch any vulnerabilities or bugs, and provide insights on a pending integration branch.

Fable 5 surfaced deeper, more structural risks, including a hang in the credential-prompt mechanism and a dependency regression, both of which were release-blocking. It explored the codebase more thoroughly and produced a longer, better-organized list of findings. At least one of its suggested changes addressed an issue that had been missed during development and was incorporated directly.

On the resource side, Fable 5 required approximately 30 minutes to complete the task, versus roughly 10 minutes for Opus 4.8. It consumed approximately 5.74 million tokens compared to 2.54 million for Opus 4.8. The Fable 5 run cost approximately $12.38, while the Opus 4.8 run cost approximately $3.65, making Fable 5 roughly 3.4x more expensive on this specific experiment.
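The ratios above follow directly from the reported figures. A quick check in Python, using only the numbers from our run:

```python
# Figures reported for the internal head-to-head run.
fable = {"minutes": 30, "tokens_m": 5.74, "cost_usd": 12.38}
opus = {"minutes": 10, "tokens_m": 2.54, "cost_usd": 3.65}

# How many times more of each resource Fable 5 used than Opus 4.8.
ratios = {key: fable[key] / opus[key] for key in fable}

for key, value in ratios.items():
    print(f"{key}: {value:.2f}x")
# minutes: 3.00x
# tokens_m: 2.26x
# cost_usd: 3.39x
```

Cost grew faster than token count (about 3.4x versus 2.3x), which means the Fable 5 run was also more expensive per token, not just larger.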

Those raw cost figures deserve context. Fable 5 completed the task largely in one shot. Reaching a comparable result with Opus 4.8 would likely have required additional prompting and guidance. Developer Fabio Jonathan on the AI Daily Brief Podcast captured this well, arguing that Fable 5 is "cheaper than Opus in practice" because it "one-shots way more often." For complex, codebase-wide reasoning tasks, the total effort to reach a satisfactory outcome matters more than the per-run cost.
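One way to make the "total effort" argument concrete is to compare expected cost per successful outcome rather than per run. The sketch below is a deliberately crude model: it assumes each attempt succeeds independently with a fixed probability p, so the expected number of attempts is 1/p, and that every retry costs as much as the first run. The success rates in the example are hypothetical placeholders, not measurements, and the model ignores the engineer time spent re-prompting.

```python
def expected_cost_per_success(cost_per_run: float, p_success: float) -> float:
    """Expected spend to get one acceptable result.

    Assumes independent attempts with a fixed success probability, so the
    number of attempts is geometric with mean 1 / p_success.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return cost_per_run / p_success


# Per-run costs from our internal test; success rates are hypothetical.
fable_cost = expected_cost_per_success(12.38, p_success=0.9)
opus_cost = expected_cost_per_success(3.65, p_success=0.25)

print(f"Fable 5:  ${fable_cost:.2f} per success")  # Fable 5:  $13.76 per success
print(f"Opus 4.8: ${opus_cost:.2f} per success")   # Opus 4.8: $14.60 per success
```

Under this toy model, Opus 4.8 remains cheaper per success as long as its success rate stays above roughly 0.29 times Fable 5's (the per-run cost ratio 3.65 / 12.38). Below that, the pricier model wins on total cost.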

That said, Fable's appetite for context grows quickly in open-ended sessions. In the Willison post mentioned earlier, he tracked a full day of usage and spent $110 on tokens, with a single extended coding session consuming 78 million tokens on its own. Budget controls and session scoping matter more with this model than with any previous Claude release.
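Budget controls can be as simple as a per-session guard that accumulates token usage and refuses further calls once a spending cap is reached. The following is a minimal sketch, assuming you can read input and output token counts from each response (how you obtain them depends on your client) and pricing at the list rates quoted earlier. The class and method names are hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a session has reached its configured spend cap."""


@dataclass
class SessionBudget:
    max_usd: float
    input_price_per_m: float = 10.0   # list price quoted above (USD / 1M tokens)
    output_price_per_m: float = 50.0
    spent_usd: float = 0.0
    tokens_used: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one call's reported usage to the running totals."""
        self.tokens_used += input_tokens + output_tokens
        self.spent_usd += (
            input_tokens * self.input_price_per_m
            + output_tokens * self.output_price_per_m
        ) / 1_000_000

    def check(self) -> None:
        """Call before each model request; raises once the cap is reached."""
        if self.spent_usd >= self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f} "
                f"({self.tokens_used:,} tokens)"
            )


budget = SessionBudget(max_usd=5.00)
for _ in range(10):
    try:
        budget.check()
    except BudgetExceeded as err:
        print("Stopping session:", err)
        break
    # ... make the model call here, then record its reported usage ...
    budget.record(input_tokens=150_000, output_tokens=8_000)
```

Because the check runs before each call, a session can overshoot the cap by at most one call's cost; estimating the next call's size before sending it would tighten that bound.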

The bottom line: match the model to the task. Fable 5 approaches tasks with greater structure and depth than previous iterations. However, given its longer processing time and higher cost, it is best reserved for high-criticality tasks. For tasks that still require strong reasoning but are more structured, other frontier models are the more practical choice. And for straightforward, well-scoped instructions, lighter models handle the job at a fraction of the cost and latency.
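One way to operationalize "match the model to the task" is an explicit routing rule. The sketch below is illustrative only: the tier labels are placeholders rather than real API model identifiers, and the task attributes and rules are hypothetical and should be tuned to your own workloads.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Placeholder labels, not real API model IDs.
    MYTHOS_CLASS = "fable-5"
    FRONTIER = "other-frontier-model"
    LIGHT = "lightweight-model"


@dataclass
class Task:
    high_stakes: bool       # e.g. pre-release security review
    codebase_wide: bool     # spans many files or repositories
    needs_reasoning: bool   # non-trivial analysis, but well structured
    interactive: bool       # rapid back-and-forth with a developer


def route(task: Task) -> Tier:
    """Pick a model tier following the guidance in this post (hypothetical rules)."""
    # Reserve the slow, expensive tier for delegable, high-criticality work.
    if task.high_stakes and task.codebase_wide and not task.interactive:
        return Tier.MYTHOS_CLASS
    # Strong reasoning on more structured tasks: another frontier model.
    if task.needs_reasoning or task.high_stakes:
        return Tier.FRONTIER
    # Straightforward, well-scoped instructions: a lighter, cheaper model.
    return Tier.LIGHT


print(route(Task(high_stakes=True, codebase_wide=True, needs_reasoning=True, interactive=False)))
# Tier.MYTHOS_CLASS
print(route(Task(high_stakes=False, codebase_wide=False, needs_reasoning=False, interactive=True)))
# Tier.LIGHT
```

Keeping the rule explicit makes the routing decision reviewable and easy to adjust as prices, latency, or your own evaluation results change.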

Data retention: a potential blocker

The most operationally significant change with Mythos-class models is Anthropic's new 30-day data retention policy. All prompts submitted to, and outputs generated by, Fable 5 and Mythos 5 are retained for 30 days for trust and safety purposes. If Anthropic's safety classifiers flag content as violating the usage policy, it can be retained for up to two years.

This policy applies across every platform where the models are offered. There is no configuration toggle, no platform exemption, and no enterprise carve-out. Enterprise customers typically had zero-retention guarantees for previous models, so this is a significant departure.

The real-world impact is already visible. Within 24 hours of launch, Microsoft restricted employee access to Fable 5 due to concerns that proprietary data was being stored on Anthropic's infrastructure. Their legal teams are evaluating whether the retention terms are compatible with internal data governance requirements.

Anthropic states that retained data will not be used for training and is scoped to trust and safety analysis only. Eligible organizations can add customer-managed encryption keys and access transparency audit logs. These controls are meaningful, but they are not equivalent to zero retention. Legal analysts have also raised concerns about implications for attorney-client privilege and work-product protection, since the policy introduces the possibility of human review.

Before adopting Fable 5, teams should evaluate whether the 30-day retention (and the potential 2-year retention for flagged content) aligns with their data governance requirements and client agreements. For projects involving sensitive or regulated data, this may be a blocking factor regardless of the model's capabilities.
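The evaluation itself is a legal and governance question, but writing the check down as an explicit rule helps apply it consistently across projects. A minimal sketch, assuming each project records the maximum retention its governance framework or client agreements allow and whether third-party human review is acceptable. The 30-day and two-year figures come from the policy described above (two years is approximated as 730 days); the class, field, and function names are our own.

```python
from dataclasses import dataclass

STANDARD_RETENTION_DAYS = 30      # all prompts and outputs
FLAGGED_RETENTION_DAYS = 2 * 365  # up to two years if flagged (approximation)


@dataclass
class ProjectPolicy:
    name: str
    max_retention_days: int       # 0 means zero retention is required
    human_review_allowed: bool    # whether third-party human review is acceptable


def retention_issues(policy: ProjectPolicy) -> list[str]:
    """List the points where the retention policy conflicts with a project's rules."""
    issues = []
    if policy.max_retention_days < STANDARD_RETENTION_DAYS:
        issues.append("standard 30-day retention exceeds the allowed retention")
    elif policy.max_retention_days < FLAGGED_RETENTION_DAYS:
        issues.append("flagged content may be retained for up to two years")
    if not policy.human_review_allowed:
        issues.append("retention introduces the possibility of human review")
    return issues


regulated = ProjectPolicy("client-regulated", max_retention_days=0, human_review_allowed=False)
internal = ProjectPolicy("internal-tooling", max_retention_days=90, human_review_allowed=True)
for project in (regulated, internal):
    print(project.name, retention_issues(project) or "no retention issues found")
```

The flagged-content case is reported as an issue rather than an automatic blocker, since it only applies when a classifier flags a request; how much weight to give it is a decision for your legal team.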

Frontier research restrictions: what changed?

On release, Anthropic included safeguards that block queries related to frontier LLM development research, cybersecurity exploitation, and biology/chemistry. Flagged requests were silently rerouted to Opus 4.8 without any indication to the user, leaving no way to know when a response came from a less capable model. For organizations whose work touches foundational AI research, this also introduced a practical limitation on certain lines of inquiry.

On June 11, Anthropic acknowledged the silent fallback was the wrong tradeoff and announced that flagged requests will now visibly fall back to Opus 4.8 across all restricted categories. On the API, flagged requests will return a reason for refusal, with server-side fallback coming shortly. The trade-off is that visible safeguards are easier to probe, so Anthropic expects more false positives in the short term while the classifiers are tuned. Users can report mistaken flags through Claude Code's /feedback command, the thumbs-down button in Claude.ai, or a safeguard appeal form for API requests.
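To monitor these fallbacks, teams can classify each response and count how often requests are refused or served by a different model. The sketch below operates on a plain dict shaped like an Anthropic Messages API response, which carries `model` and `stop_reason` fields. It assumes a fallback shows up as a `model` value different from the one requested and a refusal as `stop_reason == "refusal"`; the exact signal Anthropic exposes for Mythos-class fallbacks may differ. Model IDs are placeholders.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("fallback-monitor")

REQUESTED_MODEL = "fable-5"  # placeholder, not a real model ID

fallback_counts = Counter()


def classify_response(response: dict, requested_model: str = REQUESTED_MODEL) -> str:
    """Label a response as 'ok', 'refused', or 'fallback'.

    Assumes a Messages-API-shaped dict with `model` and `stop_reason` keys;
    how these fallbacks are actually signalled may differ.
    """
    if response.get("stop_reason") == "refusal":
        outcome = "refused"
    elif response.get("model") != requested_model:
        outcome = "fallback"
    else:
        outcome = "ok"
    if outcome != "ok":
        fallback_counts[outcome] += 1
        log.info("request %s: %s (served by %s)",
                 response.get("id"), outcome, response.get("model"))
    return outcome


print(classify_response({"id": "msg_1", "model": "fable-5", "stop_reason": "end_turn"}))   # ok
print(classify_response({"id": "msg_2", "model": "opus-4.8", "stop_reason": "end_turn"}))  # fallback
print(classify_response({"id": "msg_3", "model": "fable-5", "stop_reason": "refusal"}))    # refused
print(dict(fallback_counts))  # {'fallback': 1, 'refused': 1}
```

Compare against the exact model ID string you sent; a tracked fallback rate per workflow also gives concrete evidence when filing false-positive reports through the channels above.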

The cybersecurity angle

Fable 5 and Mythos 5 build on Mythos Preview, which Anthropic released in April 2026 through Project Glasswing. That model was tested against thousands of open-source projects and surfaced previously unknown vulnerabilities in Mozilla Firefox and Cloudflare, including one that had gone undetected for over 25 years.

Independent evaluations by the UK's AISI, XBOW, and others have since confirmed a broader trend: frontier models now exceed predicted baselines for expert-level cyber attacks, can sustain entire attack campaigns, and discover unanticipated attack paths, especially when given access to live systems and extended execution time.

However, raw model capability is only part of the picture. As Hadrian's research highlights, the real multiplier is the ability to specialize (by building repeatable workflows for tasks such as vulnerability discovery or credential analysis) and to parallelize (by running multiple agents simultaneously across thousands of targets). The competitive advantage is shifting from models to systems.
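To illustrate the "parallelize" half of that point on the defensive side, here is a minimal asyncio sketch that runs one repeatable review workflow across many repositories while capping the number of concurrent agents. `review_repository` is a hypothetical stand-in for whatever agent call your workflow makes, and the concurrency limit is an arbitrary example value.

```python
import asyncio


async def review_repository(repo: str) -> dict:
    """Hypothetical stand-in for one specialized agent run (e.g. a dependency audit)."""
    await asyncio.sleep(0.01)  # simulate the model call
    return {"repo": repo, "findings": []}


async def run_workflow(repos: list[str], max_concurrent: int = 8) -> list[dict]:
    """Run the same workflow over many targets with bounded parallelism."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(repo: str) -> dict:
        async with semaphore:  # at most `max_concurrent` agents run at once
            return await review_repository(repo)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(r) for r in repos))


if __name__ == "__main__":
    repos = [f"org/service-{i}" for i in range(20)]
    results = asyncio.run(run_workflow(repos, max_concurrent=5))
    print(len(results), "repositories reviewed")
```

The specialization lives in `review_repository`, the parallelism in `run_workflow`; keeping them separate is what lets the same workflow be reused across thousands of targets.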

For organizations, this cuts both ways. AI-assisted security testing is being commoditized, meaning tasks that once required specialist expertise can now be automated at scale. But the same capabilities also strengthen the defensive side: code review, vulnerability triage, threat hunting, and incident response all benefit from these model improvements.

Our 5 recommendations

  1. Use Fable 5 selectively for high-criticality work. Pre-release security reviews, complex code investigations, and long-context analysis are where the depth justifies the cost and latency. It is not a default replacement for other frontier models in day-to-day workflows.
  2. Assess the data retention policy before adoption. The 30-day mandatory retention with no enterprise exemption is a potential blocker for sensitive projects and regulated industries. Evaluate compatibility with your data governance framework and client agreements early.
  3. Account for safeguard fallbacks in sensitive domains. If your work touches cybersecurity, biology, chemistry, or AI research, expect some requests to be rerouted to Opus 4.8. Monitor for visible fallback indicators and use the reporting channels to flag false positives.
  4. Evaluate systems, not just models. Tooling, memory, execution time, and orchestration increasingly influence outcomes. A well-integrated system using a slightly less capable model may outperform a frontier model used in isolation.
  5. Expect AI-assisted security testing to scale. As capabilities, specialization, and parallelization improve, both internal security teams and external researchers will conduct assessments more efficiently and at a larger scale. This has implications for how organizations plan their defensive posture.

If you are evaluating Fable 5 for your team or building AI-assisted security workflows, we can help you design the right model routing strategy for your use case. Reach out to our team to discuss how ML6 approaches adopting frontier models for enterprise clients.


Sources