Claude Opus 4.8

GALatest Opus

by Anthropic · Claude 4 family · best for production agentic coding workhorse

FrontierReasoningCodingMultimodalLong-Context
9.2
AI Panel Score
Value 8.0/10

The last and best of the Opus 4 line: an incremental but real step over Opus 4.7 in software engineering (SWE-bench Verified 88.6%, SWE-bench Pro 69.2%), agentic tool use, and knowledge work, at an unchanged $5/$25 rate card with the same 1M-token context. Released 2026-05-28, twelve days before Fable 5 arrived above it — Opus 4.8 is now both Anthropic's mainstream flagship and the model Fable 5 silently falls back to when its safety classifiers fire. An optional Fast Mode trades ~2.5x speed for 2x price ($10/$50).

What's new

  • SWE-bench Verified 88.6% (up from 87.6% on 4.7); SWE-bench Pro 69.2% (up from 64.3%)
  • Terminal-Bench 2.1 at 74.6%; GPQA Diamond 93.6%; GDPval-AA Elo 1890 on knowledge work
  • Latency materially improved at standard effort (sub-second first token reported by trackers, vs the deliberate multi-second profile 4.7 showed at max effort)
  • Fast Mode: ~2.5x throughput at $10/$50 per Mtok, opt-in per request
  • Designated safety-fallback target for Claude Fable 5 (flagged cyber/bio/distillation sessions are answered by Opus 4.8)

Benchmarks

BenchmarkScoreSource
GPQA Diamond93.6%llm-stats.com 2026-05-28T00:00:00.000Z
Terminal-Bench74.6%llm-stats.com 2026-05-28T00:00:00.000Z
SWE-bench Verified88.6%llm-stats.com 2026-05-28T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker9/10
Opus 4.8 is the safe upgrade: same price, same API, better numbers — sign-off takes five minutes, not a committee.

As a platform decision this is the lowest-friction upgrade Anthropic has ever shipped: identical rate card, identical tokenizer, identical clouds, measurably better engineering output. The strategic wrinkle is Fable 5 sitting above it — Opus 4.8 is no longer the ceiling, so commitments here are really commitments to the Opus operational envelope (price, retention terms, latency) rather than to maximum capability. That is a reasonable place to standardize: Fable's 2x price and 30-day-retention requirement will be disqualifying for a slice of enterprises for some time. Vendor risk is unchanged and low.

Strategic Fit 9Vendor Risk 8Roadmap Confidence 9
Pros
  • Frictionless upgrade from 4.7
  • flat price
  • multi-cloud GA
  • clear role even after Fable
Cons
  • No longer the top tier
  • cutoff undisclosed
Right for: orgs standardizing on the proven Opus envelope
Avoid if: you need the absolute frontier and can absorb Fable's terms
Domain Strategist9/10
The quiet release before the loud one — 4.8's job is to be the dependable floor under the Mythos story.

Positioned twelve days before Fable 5, Opus 4.8 reads as deliberate sequencing: lock the mainstream tier at a strong baseline, then introduce the premium class. Its market role is now "the model you actually run in production" while Fable absorbs the headlines — and the fallback architecture makes that literal, since flagged Fable sessions are answered by 4.8. Against GPT-5.5 and Gemini 3.1 Pro it holds the agentic-coding lead of the line at unchanged unit economics, which keeps the Cursor/Copilot ecosystem anchored on Anthropic.

Competitive Positioning 9Differentiation 8Market Timing 9
Pros
  • Anchors the ecosystem under Fable
  • coding lead at mainstream price
Cons
  • Headline space ceded to Fable within days
Right for: strategies built on the dominant agentic-coding ecosystem
Avoid if: your wedge needs the Mythos-class capability story itself
Finance Lead8.5/10
Better output at the exact same line item — and no tokenizer surprise this time. That's a clean TCO win.

Four consecutive Opus releases at $5/$25 makes budgeting boring in the best way. Unlike the 4.6→4.7 transition there is no tokenizer change, so per-task cost falls wherever quality gains reduce retries. Cache and batch discounts are unchanged and deep. Fast Mode at $10/$50 needs ring-fencing exactly like 4.7's 6x mode did — note it now matches Fable 5's base price, which makes "Fast Opus vs standard Fable" a genuine procurement comparison for interactive workloads.

Cost Efficiency 8.5Pricing Transparency 9Value per Dollar 8.5
Pros
  • Flat rate card, real quality-per-dollar gain, no hidden cost shifts
Cons
  • Fast Mode pricing overlaps Fable base price
Right for: budget owners who want frontier output without renegotiating
Avoid if: your interactive traffic would push everything through Fast Mode anyway
Domain Practitioner9.5/10
Drop-in replacement, fewer broken diffs, faster first token — the upgrade PR is one line and it pays for itself.

For working engineers this is the ideal release shape: the model id changes, nothing else does. SWE-bench Pro +4.9 shows up in practice as fewer almost-right patches on multi-file tasks, and Terminal-Bench 74.6% tracks with sturdier long terminal sessions. The latency improvement at standard effort makes 4.8 usable in tighter loops where 4.7 forced Batch or Fast Mode. All the 4.x scaffolds — bash, text editor, computer use, Agent SDK — work unmodified. The missing cutoff disclosure is a minor annoyance when reasoning about library knowledge.

API Ergonomics 9.5Tool/Agent Support 10Reliability 9
Pros
  • One-line migration
  • real-world diff quality up
  • better interactive latency
Cons
  • Cutoff undisclosed
  • benchmark gaps complicate eval planning
Right for: teams shipping coding agents today
Avoid if: you need disclosed evals across the full academic suite
Power User9/10
It finally feels responsive at standard effort — Opus quality without scheduling your questions around the thinking pause.

The headline for heavy daily users is latency: sub-second first token at standard effort changes how often you reach for Opus instead of Sonnet. Output quality is the best of the 4.x line on hard analytical and coding questions, vision remains strong on screenshots and PDFs, and refusal calibration carries over. Long sessions in the 1M window stay coherent. Fable 5 is better still — but costs extra credits after June 22 and can silently hand your session to... this model, which rather proves 4.8 is the dependable choice.

Output Quality 9Speed 8Everyday Usefulness 9
Pros
  • Responsive at standard effort
  • top-tier answers
  • strong vision
Cons
  • Still not instant
  • Fable exists if you must have the best
Right for: daily drivers who want frontier quality without premium pricing
Avoid if: sub-200ms chat snappiness is non-negotiable
Skeptic8/10
A genuinely better 4.7 — but the two-week shelf life as 'flagship' tells you exactly how Anthropic sees it.

The gains are real and the sourcing is independent (llm-stats corroborates the SWE numbers), so this is not a paper release. The skeptical reads: first, disclosure remains selective — no AIME, no MMLU-Pro, no arena Elo at launch, so "improved reasoning" rests partly on vendor framing. Second, the release timing makes 4.8 look like infrastructure for the Fable launch — the fallback model needed to be strong enough that safety-routed sessions don't feel like punishment — rather than a destination in itself. Third, GDPval-AA and Terminal-Bench 2.1 are young benchmarks with thin comparison sets. None of this undermines the practical upgrade; it does mean "frontier" now belongs to a different price tier.

Claim Accuracy 8Weakness Severity 7.5Hype vs Reality 8
Pros
  • Verifiable engineering gains
  • honest pricing continuity
Cons
  • Selective disclosure
  • flagship status lasted twelve days
Right for: skeptics who want the proven tier, not the story
Avoid if: you mistake it for Anthropic's best — that's Fable now

Strengths

  • Best-in-class production coding/agentic numbers below the Mythos tier
  • Unchanged price with improved capability — straight capability-per-dollar win over 4.7
  • Same tokenizer and API surface as 4.7: zero-friction migration
  • Multi-cloud GA from day one; mature compliance story

Limitations

  • Overshadowed within two weeks by Fable 5, which beats it decisively on SWE-bench Pro (80.3 vs 69.2)
  • Benchmark disclosure remains selective (no AIME/MMLU-Pro/LMArena at launch)
  • Knowledge cutoff undisclosed at launch
  • Still overkill for routine traffic better served by Sonnet 4.6 or Haiku 4.5

Best use cases

Production agentic coding and long-horizon tool-use pipelines that want frontier-class results at the established Opus price, teams not ready for Fable 5's 2x cost or 30-day-retention requirement, and any workload that needs the proven 4.x operational envelope (caching, batch, multi-cloud).

Buyer questions

Should I upgrade from Opus 4.7?

Yes — same price, same tokenizer, same API, better results. This is the rare upgrade with no modeled downside; re-run task-level evals only if you depend on exact output formats.

How does it relate to Fable 5?

Fable is the new top tier at $10/$50; Opus 4.8 is the mainstream flagship and also the model Fable falls back to when its safety classifiers trigger (under 5% of sessions).

Is Fast Mode worth it?

At $10/$50 it matches Fable 5's base price — for interactive workloads compare "Fast Opus 4.8" against "standard Fable 5" directly before choosing.

What's the knowledge cutoff?

Anthropic hadn't disclosed it at launch; assume early-2026 and verify recency-sensitive outputs with web search enabled.

Which clouds serve it?

First-party API, Amazon Bedrock, Google Vertex AI, and Azure AI Foundry, with regional data-residency options.

Does it train on my data?

No — API inputs are not used for training by default; standard Anthropic enterprise terms apply.

Comparable models

Claude Fable 5Anthropic

The tier above — SWE-bench Verified 95.0 vs 88.6 and Pro 80.3 vs 69.2, at 2x the price with 30-day retention required and classifier fallback (to Opus 4.8) on sensitive topics.

Claude Opus 4.7: The direct predecessor; 4.8 is strictly better at the same price with the same tokenizer — there is no reason to start new work on 4.7.
GPT-5.5OpenAI

Stronger on general-intelligence indexes; behind on SWE-bench Pro (58.6 vs 69.2) and the agentic-coding ecosystem.

Model specs

Input price
$5 / Mtok
Output price
$25 / Mtok
Cached input
$0.50 / Mtok
Batch (in/out)
$2.50 / $12.50
Context window
1M tokens
Max output
128K tokens
Knowledge cutoff
Undisclosed
Released
2026-05-27
Modalities
text, image → text
Output speed
~60 tok/s
License
Proprietary
Clouds
Bedrock, Vertex AI, Azure AI Foundry

Does not train on API inputs by default

Last verified 2026-06-09