Claude Fable 5

GA · Latest Fable

by Anthropic · Claude 5 family · best for the hardest long-horizon work money can buy

Frontier · Reasoning · Coding · Multimodal · Long-Context
AI Panel Score 9.5 · Value 6.5/10

The first publicly available Mythos-class model, a tier Anthropic explicitly positions above Opus. Fable 5 is the same underlying model as the restricted Claude Mythos 5, wrapped in AI-classifier safeguards that route sensitive sessions (cybersecurity, biology/chemistry, distillation attempts) to Opus 4.8. The numbers are a step change: SWE-bench Verified 95.0% and SWE-bench Pro 80.3%, versus 88.6% and 69.2% for Opus 4.8, with state-of-the-art vision and sustained focus across millions of tokens. It is priced at $10 input / $50 output per Mtok, double Opus, and is included at no extra cost on paid Claude plans until June 22, 2026.

What's new

  • New "Mythos" model class above Opus; Fable 5 = Mythos 5 + safeguards
  • SWE-bench Verified 95.0%; SWE-bench Pro 80.3% (next best on SWE-bench Pro: Opus 4.8 at 69.2%, GPT-5.5 at 58.6%)
  • Terminal-Bench 2.1 at 88.0%; FrontierCode Diamond 29.3% (highest among frontier models); CursorBench 3.1 at 72.9% (max effort)
  • State-of-the-art vision: rebuilds working web apps from screenshots; completed Pokémon FireRed on a vision-only harness
  • Long-horizon autonomy: stays focused across millions of tokens, improves its own outputs using persistent notes (3x better Slay the Spire completion with file memory vs Opus 4.8)
  • Safety architecture is novel: classifier-triggered silent fallback to Opus 4.8 instead of refusals (<5% of sessions); 30-day data retention required, no zero-data-retention option

Benchmarks

Benchmark | Score | Source
Terminal-Bench | 88% | kingy.ai, 2026-06-09
SWE-bench Verified | 95% | truefoundry.com, 2026-06-09

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker · 9.5/10
The first model where 'just give it the whole project' is a serious plan — if your data terms can live with 30-day retention.

Fable 5 changes the build-vs-delegate calculus: Stripe's months-to-days migration report is the kind of result that reprices engineering roadmaps. Strategically you are buying into a two-tier Anthropic world — Fable for the hard slice, Opus 4.8 for everything else — and the fallback mechanism means you implicitly operate both. The hard governance gate is retention: no ZDR means regulated workloads may be structurally excluded, and the silent-fallback design needs to be disclosed in your own downstream commitments. Roadmap confidence is high; this is clearly where Anthropic is going.

Strategic Fit 9.5 · Vendor Risk 7 · Roadmap Confidence 9.5
Pros
  • Step-change capability
  • day-one multi-cloud
  • clear tiering strategy
Cons
  • Retention requirement
  • silent fallback complicates compliance narratives
Right for: orgs whose hardest work is the bottleneck
Avoid if: ZDR or full output provenance is contractual
Domain Strategist · 9.5/10
Anthropic just split the market into 'frontier' and 'Mythos' — and made every competitor's flagship look like the second tier.

The category creation is the story: by naming a tier above Opus and shipping it publicly days after Opus 4.8, Anthropic reframed the frontier race on its own terms. The SWE-bench Pro gap (80.3 vs GPT-5.5's 58.6) is large enough to be a positioning moat, not a leaderboard footnote. The "powerful enough to be dangerous, safeguarded enough to sell" framing also lands a regulatory message competitors must now answer. Risk: the safety-fallback UX generates exactly the kind of viral complaint posts (biology questions answered by the wrong model) that erode a premium narrative.

Competitive Positioning 10 · Differentiation 9.5 · Market Timing 9
Pros
  • Tier-defining launch
  • verifiable capability gap
  • strong enterprise distribution
Cons
  • Fallback friction is a narrative liability
Right for: bets on the agentic-autonomy wave
Avoid if: your positioning needs predictable, uniform model behavior
Finance Lead · 7/10
Double the rate card, a closing free window, and a fallback that bills premium for non-premium answers — model this carefully.

At $10/$50, Fable must beat Opus 4.8 by more than 2x on task economics to win a line item — which it plausibly does on long-horizon engineering (fewer retries, fewer human hours) and plausibly does not on routine completions. The June 22 cliff matters: free-window usage patterns will not predict post-cliff bills. Two contract subtleties: flagged sessions are answered by Opus 4.8 — confirm how those bill — and 30-day retention may carry its own compliance cost. The right portfolio is a strict router: Fable for designated hard jobs, Opus/Sonnet/Haiku for the volume.
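
The "more than 2x" threshold can be made concrete with a small expected-cost model. The sketch below is illustrative only: the per-attempt token costs, success rates, and human-review cost are hypothetical placeholders to replace with your own measurements, and it assumes attempts are independent and both models consume the same token counts.

```python
def expected_cost_per_success(token_cost: float,
                              human_cost: float,
                              success_rate: float) -> float:
    """Expected USD spent per completed task, retrying until success.

    Assumes independent attempts, so the expected number of attempts
    is 1 / success_rate (geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return (token_cost + human_cost) / success_rate


# Hypothetical task: Opus 4.8 spends $15 in tokens per attempt, so Fable 5
# spends $30 for the same token counts at double the rate.
OPUS_TOKENS, FABLE_TOKENS = 15.0, 30.0

# Tokens only: Fable wins only if its success rate is more than 2x Opus's.
opus_tokens_only = expected_cost_per_success(OPUS_TOKENS, 0.0, 0.5)
fable_tokens_only = expected_cost_per_success(FABLE_TOKENS, 0.0, 0.8)

# Add a hypothetical $40 of human review per attempt and the result flips.
opus_with_review = expected_cost_per_success(OPUS_TOKENS, 40.0, 0.5)
fable_with_review = expected_cost_per_success(FABLE_TOKENS, 40.0, 0.8)
```

With these placeholder inputs, Opus is cheaper on tokens alone (about $30 vs $37.50 per success), while Fable is cheaper once review time is counted (about $87.50 vs $110). That is the "fewer retries, fewer human hours" argument in numeric form.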

Cost Efficiency 6 · Pricing Transparency 7 · Value per Dollar 7
Pros
  • Genuine task-level ROI on hard work
  • batch/cache discounts apply
Cons
  • 2x rate card
  • free-window distortion
  • fallback billing ambiguity
Right for: budgets with a defined "hardest work" bucket
Avoid if: spend is dominated by high-volume routine calls
Domain Practitioner · 9.5/10
It closes issues I'd budget a week for, and the notes-to-itself trick actually works — this is a different kind of tool.

Hands-on, the difference from Opus 4.8 shows up on length: multi-hour agent runs hold coherence, the model genuinely uses its own scratch files to improve later steps, and SWE-bench Pro 80.3% translates to real multi-repo, underspecified-task wins. The API surface is unchanged (same SDKs, same scaffolds), so adoption is a model-id swap plus an effort-level decision. Practical frictions: sessions near security/bio topics silently downgrade — you learn to recognize the texture change — and the missing speed profile makes capacity planning guesswork for now.

API Ergonomics 9 · Tool/Agent Support 10 · Reliability 8.5
Pros
  • Long-horizon coherence
  • unchanged tooling
  • memory behavior is real
Cons
  • Silent fallback texture-shifts
  • no speed disclosures yet
Right for: builders running autonomous multi-hour agents
Avoid if: you need deterministic single-model behavior for evals
Power User · 9/10
Free until June 22 makes this the best two weeks in AI — ask it the hardest thing you have and watch.

For a daily heavy user the free window is the event: Mythos-class answers at no marginal cost on every paid plan. Quality on hard analytical, coding, and vision tasks is visibly above Opus 4.8 — it reads figures correctly, holds a whole project in its head, and self-corrects across a session. The trade-offs are pace (deliberate, not snappy) and the occasional silently-downgraded session on sensitive topics, which you notice as a sudden drop in depth. After June 22 the calculus changes: most users will reserve it for the questions that deserve it.

Output Quality 10 · Speed 6 · Everyday Usefulness 8.5
Pros
  • Best available answers
  • free window
  • vision is startling
Cons
  • Deliberate pace
  • credits after June 22
  • fallback surprises
Right for: power users with genuinely hard problems
Avoid if: you mostly need fast, light replies
Skeptic · 7.5/10
SOTA where Anthropic chose to measure, silent model-swapping where it didn't, and a benchmark suite with the comparisons missing.

Three structural cautions. First, disclosure is curated: SWE-bench and partner anecdotes (Stripe, Hebbia) are impressive but the academic suite — MMLU-Pro, AIME, GPQA — is absent, and several headline evals (FrontierCode Diamond, CursorBench 3.1, GDPval-AA) are too new to have independent baselines. Second, the silent fallback is a transparency problem dressed as a safety feature: users on flagged topics receive Opus 4.8 answers under the Fable label, and "less than 5% of sessions" is an average that concentrates in exactly the scientific domains Anthropic markets hardest. Third, the two-week Opus 4.8 → Fable 5 sequencing is a pricing ladder as much as a research milestone. The capability jump is real — external corroboration exists — but buy the measured model, not the launch copy.
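
The second caution, that an average can hide concentration, is easy to check with arithmetic. The sketch below uses made-up traffic shares, not Anthropic data: it computes the worst-case fallback rate inside a sensitive domain if the overall rate is 5% and every fallback lands in that domain.

```python
def in_domain_fallback_rate(overall_rate: float, domain_share: float) -> float:
    """Upper bound on the fallback rate inside one domain.

    overall_rate: fraction of all sessions that fall back (e.g. 0.05)
    domain_share: fraction of all sessions that belong to the domain
    Assumes every fallback occurs in that domain (worst case for it).
    The result is capped at 1.0 because a rate cannot exceed 100%.
    """
    if not 0 <= overall_rate <= 1:
        raise ValueError("overall_rate must be in [0, 1]")
    if not 0 < domain_share <= 1:
        raise ValueError("domain_share must be in (0, 1]")
    return min(1.0, overall_rate / domain_share)


# Hypothetical: sensitive science topics make up 10% of a team's sessions.
worst_case = in_domain_fallback_rate(0.05, 0.10)
```

Under that hypothetical split, up to half of the team's science sessions could be answered by Opus 4.8 while the headline figure still reads "under 5%".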

Claim Accuracy 7.5 · Weakness Severity 7 · Hype vs Reality 7.5
Pros
  • Engineering gains independently echoed
  • safety work is documented
Cons
  • Curated benchmarks
  • silent substitution
  • premium-ladder timing
Right for: skeptics willing to run their own evals in the free window
Avoid if: you need full disclosure before commitment

Strengths

  • Largest single-generation capability jump Anthropic has shipped — SWE-bench Pro +11 points over its own two-week-old flagship
  • Long-horizon task persistence with self-managed memory is a qualitatively new tier
  • State-of-the-art vision and knowledge-work performance with external corroboration (Stripe, Hebbia, IMC)
  • Multi-cloud and Copilot availability from day one

Limitations

  • 2x Opus pricing; the free window on paid plans closes 2026-06-22
  • Safety fallback is silent and conservative — biology/security workloads may get Opus 4.8 answers without asking for them
  • Mandatory 30-day retention excludes ZDR-bound organizations outright
  • No academic benchmark suite, speed profile, or knowledge cutoff disclosed at launch
  • Latency unsuited to snappy interactive UX

Best use cases

The hardest 5% of work: multi-day autonomous engineering tasks, codebase-wide migrations, deep research synthesis over million-token corpora, vision-heavy document/figure analysis, and any job where one model-quality tier saves human-days. Route routine traffic to Opus 4.8 or below — Fable's economics only make sense where capability is the binding constraint.
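
The routing advice above could be encoded as a small policy function. This is a sketch under stated assumptions, not an Anthropic API: the model labels are display names rather than confirmed API model ids, and the `Task` fields and rule order are hypothetical choices drawn from the constraints this review describes.

```python
from dataclasses import dataclass

# Display names only; these are not confirmed API model identifiers.
FABLE = "Fable 5"
OPUS = "Opus 4.8"


@dataclass(frozen=True)
class Task:
    long_horizon: bool       # multi-hour or multi-day autonomous work
    latency_sensitive: bool  # interactive UX that needs fast replies
    requires_zdr: bool       # workload is bound to zero data retention


def pick_model(task: Task) -> str:
    """Choose a model tier for a task using the review's constraints."""
    # Fable requires 30-day retention, so ZDR workloads cannot use it.
    if task.requires_zdr:
        return OPUS
    # The review describes Fable's pace as deliberate, not snappy.
    if task.latency_sensitive:
        return OPUS
    # Pay the 2x rate only where capability is the binding constraint.
    return FABLE if task.long_horizon else OPUS
```

For example, `pick_model(Task(long_horizon=True, latency_sensitive=False, requires_zdr=False))` returns `"Fable 5"`, while the same task with `requires_zdr=True` returns `"Opus 4.8"`. Checking the hard constraints first keeps a capability preference from overriding a compliance requirement.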

Buyer questions

What does "Mythos-class" actually mean?

A capability tier Anthropic places above Opus. Fable 5 and Mythos 5 are the same model: Fable adds AI-classifier safeguards, while the unrestricted Mythos is limited to vetted cyberdefense and research partners.

What happens on the safety fallback?

Sessions touching cybersecurity, biology/chemistry, or large-scale extraction patterns are answered by Claude Opus 4.8 instead, with a notice in supported surfaces. It averages under 5% of sessions but is concentrated in scientific domains and is acknowledged as overly broad at launch.

Is it really free right now?

On Pro, Max, Team, and seat-based Enterprise plans it is included at no extra cost until June 22, 2026; afterwards it requires usage credits. API usage bills at $10/$50 from day one.

Can my org use it under zero data retention?

No. Mythos-class models require 30-day retention for safety monitoring; ZDR organizations will find it absent or disabled in model pickers.

When should I use Opus 4.8 instead?

Routine and latency-sensitive work, regulated workloads needing ZDR, and anywhere 2x price isn't justified — note Opus Fast Mode costs the same as Fable's base rate, an odd but real comparison for interactive use.

Are the benchmarks trustworthy?

The engineering numbers (SWE-bench family) have external echoes and partner corroboration; the academic suite was undisclosed at launch, so run your own evals during the free window.

Comparable models

Claude Opus 4.8 · Anthropic

The mainstream flagship below it, at half the price — and literally the model that answers when Fable's classifiers fire. SWE-bench Pro 69.2 vs 80.3.

GPT-5.5 · OpenAI

The strongest non-Anthropic flagship; led general-intelligence indexes in the Opus 4.7 era but trails badly on SWE-bench Pro (58.6) and has no Mythos-class answer yet.

Gemini 3.1 Pro · Google

Competitive on long-context and multimodal breadth (2M window); SWE-bench Pro 54.2 puts it well behind on the agentic-engineering axis Fable defines.

Model specs

Input price: $10 / Mtok
Output price: $50 / Mtok
Cached input: $1 / Mtok
Batch (in/out): $5 / $25 per Mtok
Context window: 1M tokens
Max output: 128K tokens
Knowledge cutoff: Undisclosed
Released: 2026-06-08
Modalities: text, image, file → text
Output speed: Not profiled
License: Proprietary
Clouds: Bedrock, Vertex AI, Azure AI Foundry
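
As a rough aid for reading the rate card, the sketch below turns the listed prices into per-request estimates. It rests on assumptions the spec list does not confirm: cached tokens are billed at the cached rate in place of the input rate, any cache-write surcharge is ignored, and batch and cache discounts are kept separate because the source does not say how they combine.

```python
# Rates in USD per million tokens (Mtok), taken from the spec list above.
INPUT, OUTPUT, CACHED_INPUT = 10.0, 50.0, 1.0
BATCH_INPUT, BATCH_OUTPUT = 5.0, 25.0
MTOK = 1_000_000


def standard_cost(fresh_input: int, cached_input: int, output: int) -> float:
    """USD for one standard request.

    Assumption: cached input tokens bill at the cached rate only.
    """
    return (fresh_input * INPUT
            + cached_input * CACHED_INPUT
            + output * OUTPUT) / MTOK


def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """USD for one batch request at the listed batch rates."""
    return (input_tokens * BATCH_INPUT + output_tokens * BATCH_OUTPUT) / MTOK


# Hypothetical request: 1M input tokens and 50K output tokens.
no_cache = standard_cost(1_000_000, 0, 50_000)
mostly_cached = standard_cost(200_000, 800_000, 50_000)
batched = batch_cost(1_000_000, 50_000)
```

Under these assumptions the same request costs about $12.50 uncached, $5.30 with 80% of the input served from cache, and $6.25 in batch mode.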

Does not train on API inputs by default

Last verified 2026-06-09