by Anthropic · Claude 4 family · best for: production agentic-coding workhorse
The last and best of the Opus 4 line: an incremental but real step over Opus 4.7 in software engineering (SWE-bench Verified 88.6%, SWE-bench Pro 69.2%), agentic tool use, and knowledge work, at an unchanged $5/$25 rate card with the same 1M-token context. Released 2026-05-28, twelve days before Fable 5 arrived above it, Opus 4.8 is now both Anthropic's mainstream flagship and the model Fable 5 silently falls back to when its safety classifiers fire. An optional Fast Mode delivers roughly 2.5x the speed at 2x the price ($10/$50).
| Benchmark | Score | Source |
|---|---|---|
| GPQA Diamond | 93.6% | llm-stats.com, 2026-05-28 |
| Terminal-Bench | 74.6% | llm-stats.com, 2026-05-28 |
| SWE-bench Verified | 88.6% | llm-stats.com, 2026-05-28 |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“Opus 4.8 is the safe upgrade: same price, same API, better numbers — sign-off takes five minutes, not a committee.”
As a platform decision this is the lowest-friction upgrade Anthropic has ever shipped: identical rate card, identical tokenizer, identical clouds, measurably better engineering output. The strategic wrinkle is Fable 5 sitting above it — Opus 4.8 is no longer the ceiling, so commitments here are really commitments to the Opus operational envelope (price, retention terms, latency) rather than to maximum capability. That is a reasonable place to standardize: Fable's 2x price and 30-day-retention requirement will be disqualifying for a slice of enterprises for some time. Vendor risk is unchanged and low.
“The quiet release before the loud one — 4.8's job is to be the dependable floor under the Mythos story.”
Positioned twelve days before Fable 5, Opus 4.8 reads as deliberate sequencing: lock the mainstream tier at a strong baseline, then introduce the premium class. Its market role is now "the model you actually run in production" while Fable absorbs the headlines, and the fallback architecture makes that literal, since flagged Fable sessions are answered by 4.8. Against GPT-5.5 and Gemini 3.1 Pro it holds the Opus line's agentic-coding lead at unchanged unit economics, which keeps the Cursor/Copilot ecosystem anchored on Anthropic.
“Better output at the exact same line item — and no tokenizer surprise this time. That's a clean TCO win.”
Four consecutive Opus releases at $5/$25 make budgeting boring in the best way. Unlike the 4.6→4.7 transition there is no tokenizer change, so per-task cost falls wherever quality gains reduce retries. Cache and batch discounts are unchanged and deep. Fast Mode at $10/$50 needs ring-fencing exactly like 4.7's 6x mode did. Note that it now matches Fable 5's base price, which makes "Fast Opus vs standard Fable" a genuine procurement comparison for interactive workloads.
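The retry argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the $5/$25 and $10/$50 figures are USD per million input/output tokens, and the token counts and success rates are hypothetical placeholders, not measured values for any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Prices in USD per one million tokens (assumed unit)."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_attempt(rates: RateCard, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single model call at the given rate card."""
    return (input_tokens * rates.input_per_mtok
            + output_tokens * rates.output_per_mtok) / 1_000_000


def expected_cost_per_task(rates: RateCard, input_tokens: int,
                           output_tokens: int, success_rate: float) -> float:
    """Expected cost when a failed attempt is retried until one succeeds.

    Assuming independent attempts, the expected number of attempts
    is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(rates, input_tokens, output_tokens) / success_rate


STANDARD = RateCard(input_per_mtok=5.0, output_per_mtok=25.0)   # $5/$25
FAST = RateCard(input_per_mtok=10.0, output_per_mtok=50.0)      # $10/$50

# Hypothetical workload: 40k input tokens and 4k output tokens per attempt.
INPUT_TOKENS, OUTPUT_TOKENS = 40_000, 4_000

# Hypothetical first-try success rates before and after a quality gain.
before = expected_cost_per_task(STANDARD, INPUT_TOKENS, OUTPUT_TOKENS, 0.80)
after = expected_cost_per_task(STANDARD, INPUT_TOKENS, OUTPUT_TOKENS, 0.90)
fast = expected_cost_per_task(FAST, INPUT_TOKENS, OUTPUT_TOKENS, 0.90)

print(f"standard, 80% success: ${before:.3f} per task")
print(f"standard, 90% success: ${after:.3f} per task")
print(f"fast mode, 90% success: ${fast:.3f} per task")
```

With an unchanged rate card, any rise in first-try success lowers expected per-task cost, while Fast Mode doubles it at equal success rates, which is why it is worth budgeting separately.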
“Drop-in replacement, fewer broken diffs, faster first token — the upgrade PR is one line and it pays for itself.”
For working engineers this is the ideal release shape: the model id changes, nothing else does. The +4.9-point gain on SWE-bench Pro shows up in practice as fewer almost-right patches on multi-file tasks, and Terminal-Bench 74.6% tracks with sturdier long terminal sessions. The latency improvement at standard effort makes 4.8 usable in tighter loops where 4.7 forced Batch or Fast Mode. All the 4.x scaffolds (bash, text editor, computer use, Agent SDK) work unmodified. The missing cutoff disclosure is a minor annoyance when reasoning about library knowledge.
“It finally feels responsive at standard effort — Opus quality without scheduling your questions around the thinking pause.”
The headline for heavy daily users is latency: sub-second first token at standard effort changes how often you reach for Opus instead of Sonnet. Output quality is the best of the 4.x line on hard analytical and coding questions, vision remains strong on screenshots and PDFs, and refusal calibration carries over. Long sessions in the 1M window stay coherent. Fable 5 is better still — but costs extra credits after June 22 and can silently hand your session to... this model, which rather proves 4.8 is the dependable choice.
“A genuinely better 4.7 — but the two-week shelf life as 'flagship' tells you exactly how Anthropic sees it.”
The gains are real and the sourcing is independent (llm-stats corroborates the SWE numbers), so this is not a paper release. The skeptical reads: first, disclosure remains selective — no AIME, no MMLU-Pro, no arena Elo at launch, so "improved reasoning" rests partly on vendor framing. Second, the release timing makes 4.8 look like infrastructure for the Fable launch — the fallback model needed to be strong enough that safety-routed sessions don't feel like punishment — rather than a destination in itself. Third, GDPval-AA and Terminal-Bench 2.1 are young benchmarks with thin comparison sets. None of this undermines the practical upgrade; it does mean "frontier" now belongs to a different price tier.
Production agentic coding and long-horizon tool-use pipelines that want frontier-class results at the established Opus price, teams not ready for Fable 5's 2x cost or 30-day-retention requirement, and any workload that needs the proven 4.x operational envelope (caching, batch, multi-cloud).
Yes — same price, same tokenizer, same API, better results. This is the rare upgrade with no modeled downside; re-run task-level evals only if you depend on exact output formats.
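For teams that do depend on exact output formats, one possible shape for that re-run is a format-regression check over stored outputs. This is a minimal sketch under stated assumptions: the outputs are already collected as strings, the expected format is a JSON object with known keys, and the key names and sample data are hypothetical. It makes no API calls.

```python
import json
from typing import Callable, Sequence


def is_json_object_with_keys(text: str, required_keys: Sequence[str]) -> bool:
    """True if text parses as a JSON object containing every required key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in required_keys)


def format_regressions(old_outputs: Sequence[str],
                       new_outputs: Sequence[str],
                       check: Callable[[str], bool]) -> list[int]:
    """Indices of tasks whose old output passed the check but whose new one fails."""
    return [
        index
        for index, (old, new) in enumerate(zip(old_outputs, new_outputs))
        if check(old) and not check(new)
    ]


# Hypothetical stored outputs from the same three tasks on two model versions.
old_outputs = [
    '{"file": "a.py", "patch": "..."}',
    '{"file": "b.py", "patch": "..."}',
    '{"file": "c.py", "patch": "..."}',
]
new_outputs = [
    '{"file": "a.py", "patch": "..."}',
    'Here is the patch: {"file": "b.py", "patch": "..."}',  # prose wrapper breaks parsing
    '{"file": "c.py", "patch": "..."}',
]


def check(text: str) -> bool:
    return is_json_object_with_keys(text, ("file", "patch"))


regressed = format_regressions(old_outputs, new_outputs, check)
print("tasks with format regressions:", regressed)
```

Only cases that passed before and fail now are reported, so pre-existing format failures do not get attributed to the upgrade.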
Fable is the new top tier at $10/$50; Opus 4.8 is the mainstream flagship and also the model Fable falls back to when its safety classifiers trigger (under 5% of sessions).
At $10/$50 it matches Fable 5's base price — for interactive workloads compare "Fast Opus 4.8" against "standard Fable 5" directly before choosing.
Anthropic hadn't disclosed it at launch; assume early-2026 and verify recency-sensitive outputs with web search enabled.
First-party API, Amazon Bedrock, Google Vertex AI, and Azure AI Foundry, with regional data-residency options.
No — API inputs are not used for training by default; standard Anthropic enterprise terms apply.
The tier above — SWE-bench Verified 95.0 vs 88.6 and Pro 80.3 vs 69.2, at 2x the price with 30-day retention required and classifier fallback (to Opus 4.8) on sensitive topics.
Stronger on general-intelligence indexes; behind on SWE-bench Pro (58.6 vs 69.2) and the agentic-coding ecosystem.
Does not train on API inputs by default
Last verified 2026-06-09