by Anthropic · Claude 5 family · best for hardest long-horizon work money can buy
The first publicly available Mythos-class model — a tier Anthropic explicitly positions above Opus. Fable 5 is the same underlying model as the restricted Claude Mythos 5, wrapped in AI-classifier safeguards that route sensitive sessions (cybersecurity, biology/chemistry, distillation attempts) to Opus 4.8. The numbers are a step change: SWE-bench Verified 95.0% and SWE-bench Pro 80.3% versus 88.6/69.2 for Opus 4.8, with state-of-the-art vision and million-token focus. Priced at $10/$50 per Mtok — double Opus — and included free on paid Claude plans until June 22, 2026.
| Benchmark | Score | Source |
|---|---|---|
| Terminal-Bench | 88% | kingy.ai 2026-06-09T00:00:00.000Z |
| SWE-bench Verified | 95% | truefoundry.com 2026-06-09T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The first model where 'just give it the whole project' is a serious plan — if your data terms can live with 30-day retention.”
Fable 5 changes the build-vs-delegate calculus: Stripe's months-to-days migration report is the kind of result that reprices engineering roadmaps. Strategically you are buying into a two-tier Anthropic world — Fable for the hard slice, Opus 4.8 for everything else — and the fallback mechanism means you implicitly operate both. The hard governance gate is retention: no ZDR means regulated workloads may be structurally excluded, and the silent-fallback design needs to be disclosed in your own downstream commitments. Roadmap confidence is high; this is clearly where Anthropic is going.
“Anthropic just split the market into 'frontier' and 'Mythos' — and made every competitor's flagship look like the second tier.”
The category creation is the story: by naming a tier above Opus and shipping it publicly days after Opus 4.8, Anthropic reframed the frontier race on its own terms. The SWE-bench Pro gap (80.3 vs GPT-5.5's 58.6) is large enough to be a positioning moat, not a leaderboard footnote. The "powerful enough to be dangerous, safeguarded enough to sell" framing also lands a regulatory message competitors must now answer. Risk: the safety-fallback UX generates exactly the kind of viral complaint posts (biology questions answered by the wrong model) that erode a premium narrative.
“Double the rate card, a closing free window, and a fallback that bills premium for non-premium answers — model this carefully.”
At $10/$50, Fable must beat Opus 4.8 by more than 2x on task economics to win a line item — which it plausibly does on long-horizon engineering (fewer retries, fewer human hours) and plausibly does not on routine completions. The June 22 cliff matters: free-window usage patterns will not predict post-cliff bills. Two contract subtleties: flagged sessions are answered by Opus 4.8 — confirm how those bill — and 30-day retention may carry its own compliance cost. The right portfolio is a strict router: Fable for designated hard jobs, Opus/Sonnet/Haiku for the volume.
“It closes issues I'd budget a week for, and the notes-to-itself trick actually works — this is a different kind of tool.”
Hands-on, the difference from Opus 4.8 shows up on length: multi-hour agent runs hold coherence, the model genuinely uses its own scratch files to improve later steps, and SWE-bench Pro 80.3% translates to real multi-repo, underspecified-task wins. The API surface is unchanged (same SDKs, same scaffolds), so adoption is a model-id swap plus an effort-level decision. Practical frictions: sessions near security/bio topics silently downgrade — you learn to recognize the texture change — and the missing speed profile makes capacity planning guesswork for now.
“Free until June 22 makes this the best two weeks in AI — ask it the hardest thing you have and watch.”
For a daily heavy user the free window is the event: Mythos-class answers at no marginal cost on every paid plan. Quality on hard analytical, coding, and vision tasks is visibly above Opus 4.8 — it reads figures correctly, holds a whole project in its head, and self-corrects across a session. The trade-offs are pace (deliberate, not snappy) and the occasional silently-downgraded session on sensitive topics, which you notice as a sudden drop in depth. After June 22 the calculus changes: most users will reserve it for the questions that deserve it.
“SOTA where Anthropic chose to measure, silent model-swapping where it didn't, and a benchmark suite with the comparisons missing.”
Three structural cautions. First, disclosure is curated: SWE-bench and partner anecdotes (Stripe, Hebbia) are impressive but the academic suite — MMLU-Pro, AIME, GPQA — is absent, and several headline evals (FrontierCode Diamond, CursorBench 3.1, GDPval-AA) are too new to have independent baselines. Second, the silent fallback is a transparency problem dressed as a safety feature: users on flagged topics receive Opus 4.8 answers under the Fable label, and "less than 5% of sessions" is an average that concentrates in exactly the scientific domains Anthropic markets hardest. Third, the two-week Opus 4.8 → Fable 5 sequencing is a pricing ladder as much as a research milestone. The capability jump is real — external corroboration exists — but buy the measured model, not the launch copy.
The hardest 5% of work: multi-day autonomous engineering tasks, codebase-wide migrations, deep research synthesis over million-token corpora, vision-heavy document/figure analysis, and any job where one model-quality tier saves human-days. Route routine traffic to Opus 4.8 or below — Fable's economics only make sense where capability is the binding constraint.
A capability tier Anthropic places above Opus. Fable 5 and Mythos 5 are the same model; Fable adds AI-classifier safeguards, Mythos (unrestricted) is limited to vetted cyberdefense and research partners.
Sessions touching cybersecurity, biology/chemistry, or large-scale extraction patterns are answered by Claude Opus 4.8 instead, with a notice in supported surfaces. It averages under 5% of sessions but is concentrated in scientific domains and is acknowledged as overly broad at launch.
On Pro, Max, Team, and seat-based Enterprise plans it is included at no extra cost until June 22, 2026; afterwards it requires usage credits. API usage bills at $10/$50 from day one.
No. Mythos-class models require 30-day retention for safety monitoring; ZDR organizations will find it absent or disabled in model pickers.
Routine and latency-sensitive work, regulated workloads needing ZDR, and anywhere 2x price isn't justified — note Opus Fast Mode costs the same as Fable's base rate, an odd but real comparison for interactive use.
The engineering numbers (SWE-bench family) have external echoes and partner corroboration; the academic suite was undisclosed at launch, so run your own evals during the free window.
The mainstream flagship below it, at half the price — and literally the model that answers when Fable's classifiers fire. SWE-bench Pro 69.2 vs 80.3.
The strongest non-Anthropic flagship; led general-intelligence indexes in the Opus 4.7 era but trails badly on SWE-bench Pro (58.6) and has no Mythos-class answer yet.
Competitive on long-context and multimodal breadth (2M window); SWE-bench Pro 54.2 puts it well behind on the agentic-engineering axis Fable defines.
Does not train on API inputs by default
Last verified 2026-06-09