Claude Opus 4.6

GA

by Anthropic · Claude 4 family · best for prior frontier, stable production target

Frontier · Reasoning · Coding · Multimodal · Long-Context
AI Panel Score: 8.6 · Value: 7.8/10

Claude Opus 4.6 was Anthropic's flagship from February 5, 2026, until Opus 4.7 superseded it in April. It remains fully supported and widely deployed: a frontier all-rounder with 80.8% on SWE-bench Verified, 91.3% on GPQA Diamond, and a 1M-token context at standard pricing. It also keeps the older tokenizer that Opus 4.7 replaced, so existing prompts and cost models carry over unchanged. For a buyer, the one-sentence summary is this: a still-excellent frontier model whose main remaining advantages over 4.7 are prompt and tokenizer stability, and the fact that identical text can cost less than on 4.7.

- Provider: Anthropic
- Released: 2026-02-05
- Status: GA (listed as legacy in Anthropic's docs; superseded by Opus 4.7 but fully supported)
- Context window: 1,000,000 tokens
- Max output: 128,000 tokens (300K on the Batch API beta)
- Modalities: text, image
- Knowledge cutoff: May 2025 (reliable); training data through August 2025
- Headline price: $5 input / $25 output per 1M tokens

What's new

  • vs Opus 4.5: 1M-token context window at standard pricing (4.5 was 200k).
  • Adaptive thinking introduced at the Opus tier, plus four explicit effort levels.
  • Context compaction for sustained agentic tasks and agent teams in Claude Code.
  • Topped the Artificial Analysis Intelligence Index at release (53), Anthropic's first #1 on that index.
  • ARC-AGI-2 jumped to 68.8% (from 37.6% on Opus 4.5).
  • Held Opus pricing at $5/$25 — the rate set by Opus 4.5 in November 2025.

Benchmarks

Benchmark                    Score   Source (date)
Humanity's Last Exam         53.1%   vellum.ai (2026-02-05)
MMMU                         77.3%   vellum.ai (2026-02-05)
MMLU-Pro                     88.3%   vellum.ai (2026-02-05)
AIME 2025                    85%     datacamp.com (2026-02-05)
HumanEval                    95%     morphllm.com (2026-02-05)
TAU-bench                    91.9%   vellum.ai (2026-02-05)
LMArena Elo                  1490    openlm.ai (2026-05-28)
GPQA Diamond                 91.3%   vellum.ai (2026-02-05)
Terminal-Bench               65.4%   vellum.ai (2026-02-05)
MRCR Long Context            76%     morphllm.com (2026-02-05)
LMArena Coding Elo           1535    openlm.ai (2026-05-28)
SWE-bench Verified           80.8%   vellum.ai (2026-02-05)
Artificial Analysis Index    53      artificialanalysis.ai (2026-02-05)

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker: 8.5/10
Opus 4.6 is a solid production target, but the strategic question is now migration timing, not continued use.

Opus 4.6 remains a strong production model, and Anthropic's own docs recommend migrating to 4.7. Pricing is identical, infra fit is unchanged, and the capability lift on 4.7 is real on agentic coding. The argument for staying is stability: Opus 4.7's new tokenizer and more-literal instruction following can shift behavior on tuned prompt suites, and 4.6's tokenizer keeps cost models intact. For most buyers the right call is a controlled migration over a quarter rather than treating 4.6 as a long-term destination.

Strategic Fit 8 · Vendor Risk 7 · Roadmap Confidence 8
Pros
  • Frontier, stable tokenizer, multi-cloud, flat price
Cons
  • Superseded by 4.7; migration needed eventually
Right for: existing 4.6 production with tuned prompts
Avoid if: starting a new build (use 4.7)
Domain Strategist: 8.5/10
Opus 4.6 was the model that briefly gave Anthropic the #1 intelligence-index slot — its legacy is the price-parity ladder.

Opus 4.6's market role was to take the top of the Artificial Analysis Index for the first time while holding the $5/$25 price Opus 4.5 had set, proving Anthropic could lead on raw intelligence without raising prices. That positioning compounded ecosystem trust. Today its differentiation is mostly historical: 4.7 owns the coding narrative and GPT-5.5 leads the index. Its remaining strategic value is as the stable rung on a price-parity ladder that lets teams adopt Opus capability without repeated cost renegotiation.

Competitive Positioning 8 · Differentiation 7 · Market Timing 8
Pros
  • Held the index lead at a flat price; widely trusted
Cons
  • Differentiation eroded by 4.7
Right for: stability-first adopters
Avoid if: you need the current capability frontier
Finance Lead: 8.5/10
Identical rate card to 4.7 — and because 4.6 keeps the old tokenizer, the same work can actually bill less here.

On headline rates Opus 4.6 is a wash with 4.7 at $5/$25, with the same cache and batch discounts and the same 1M-context flat pricing. The wrinkle that favors 4.6 is the tokenizer: Opus 4.7's new tokenizer can use up to 35% more tokens for identical text, so for workloads where 4.6's output quality is sufficient, staying can be cheaper per equivalent task for a quarter or two while planning migration. There is no cost penalty to remaining on 4.6 short-term, and a real (if modest) cost reason to do so.
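To make the tokenizer point concrete, here is a small worked sketch. It assumes the $5/$25 rate card applies to both models and treats the "up to 35% more tokens" figure as a worst-case multiplier on both input and output token counts. The workload size and the `task_cost`/`compare` helpers are hypothetical, and real inflation varies with the content being tokenized.

```python
# Rough per-task cost comparison: Opus 4.6 vs Opus 4.7 at identical list prices.
# Assumptions (hypothetical, for illustration only):
#   - both models bill $5 per 1M input tokens and $25 per 1M output tokens
#   - 4.7's tokenizer inflates token counts by a fixed factor on input and output
#   - the task text is identical, so only the tokenizer changes the bill

INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the $5/$25 rate card."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def compare(input_tokens_46: int, output_tokens_46: int,
            inflation: float) -> tuple[float, float]:
    """Return (cost on 4.6, cost on 4.7) for the same text.

    `inflation` is an assumed multiplier on 4.6 token counts,
    e.g. 1.35 for the "up to 35% more tokens" worst case.
    """
    cost_46 = task_cost(input_tokens_46, output_tokens_46)
    cost_47 = task_cost(round(input_tokens_46 * inflation),
                        round(output_tokens_46 * inflation))
    return cost_46, cost_47


if __name__ == "__main__":
    # Hypothetical workload: 20k input and 2k output tokens per task on 4.6.
    c46, c47 = compare(20_000, 2_000, inflation=1.35)
    print(f"Opus 4.6: ${c46:.4f}  Opus 4.7: ${c47:.4f}")
    # The saving on 4.6 relative to 4.7 is 1 - 1/1.35, not 35%.
    print(f"Saving on 4.6: {1 - c46 / c47:.1%}")
```

With these hypothetical numbers the script prints $0.1500 for 4.6 against $0.2025 for 4.7, a saving of 25.9%. The asymmetry matters when quoting the gap: 35% more tokens on 4.7 corresponds to roughly 26% fewer on 4.6, not 35% fewer.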

Cost Efficiency 8 · Pricing Transparency 9 · Value per Dollar 8
Pros
  • Flat rates, old tokenizer can be cheaper, full discounts
Cons
  • Capability left on the table vs 4.7
Right for: cost-sensitive teams not yet needing 4.7
Avoid if: you need 4.7's coding lift now
Domain Practitioner: 8.5/10
Opus 4.6 was the first 'throw the whole repo at it' model and still is one, but I write new code on 4.7.

For builders, Opus 4.6 was a great model and still is, but 4.7 has materially better coding numbers at the same price, so new code goes to 4.7. Where 4.6 stays useful is integrations tuned tightly to its instruction-following quirks, which do not always port cleanly. Adaptive thinking and four effort levels work well, tool use is identical to 4.7, and the 1M context made this generation the first practical whole-repo model. The honest framing: 4.6 is the previous best, 4.7 is the current best.

API Ergonomics 9 · Tool/Agent Support 9 · Reliability 9
Pros
  • Stable tooling, whole-repo context, four effort levels
Cons
  • Behind 4.7 on agentic coding
Right for: maintaining tuned 4.6 integrations
Avoid if: building fresh agents (use 4.7)
Power User: 8.5/10
On casual chat most users can't tell 4.6 from 4.7 — and some prefer 4.6's slightly warmer voice.

For a consumer chat product, Opus 4.6 still delivers a polished experience: high conversation quality, calibrated refusals, working vision, and moderate latency with a lower time-to-first-token than 4.7's max-effort mode. The voice has a slight Anthropic warmth that some users prefer to 4.7's more direct, literal style. The May 2025 reliable knowledge cutoff is starting to feel dated for current events, but web search fills the gap. Most end users will not perceive a difference between 4.6 and 4.7 in casual use.

Output Quality 8.5 · Speed 7 · Everyday Usefulness 8.5
Pros
  • Polished, warmer tone, lower TTFT than 4.7
Cons
  • Dated cutoff; superseded
Right for: chat surfaces valuing tone
Avoid if: you need the newest knowledge or coding edge
Skeptic: 8/10
A genuinely strong model now mostly notable for being cheaper-per-task than its own successor.

Opus 4.6's benchmarks are real and were briefly best-in-class, so there is no hype problem with the model itself. The skeptical point is positioning: it has been superseded across nearly every eval by 4.7 at the same list price, and its main live advantage is the accident that its older tokenizer bills fewer tokens than 4.7's. The ARC-AGI-2 jump to 68.8% was impressive but is a single benchmark, and like all Opus models the latency is poor for interactive use. As a data point it is honest; as a purchase decision for new work it loses cleanly to 4.7.

Claim Accuracy 8.5 · Weakness Severity 7 · Hype vs Reality 8
Pros
  • Numbers are real and well-sourced
Cons
  • Superseded on nearly everything; slow
Right for: skeptics optimizing token cost short-term
Avoid if: you would otherwise just use 4.7

Strengths

  • Frontier all-rounder: near-top scores across coding, science, math, vision, and agentic benchmarks.
  • 1M context at standard pricing with no premium.
  • ARC-AGI-2 68.8% is a meaningful novel-problem-reasoning signal.
  • Stable tokenizer and prompt behavior — existing prompt suites work without re-tuning.
  • Adaptive thinking and four effort levels give granular cost-quality trade-offs.

Limitations

  • Clearly behind Opus 4.7 on agentic coding (SWE-bench Pro 53.4% vs 64.3%, OSWorld 72.7% vs 78.0%).
  • Mid-pack on long browse-and-synthesize loops; superseded by 4.7 on most evals.
  • The May 2025 reliable cutoff misses events after mid-2025 unless web search is enabled.
  • Moderate latency; not ideal for sub-second chat.
  • Anthropic's official guidance is to migrate to Opus 4.7 for new builds.

Best use cases

- Production systems already integrated against Opus 4.6, where prompt and tokenizer stability outweigh the 4.7 capability lift.
- Long-context document workflows where the 1M-token window is the differentiator.
- Hard reasoning tasks of the kind measured by GPQA, AIME, and ARC-AGI-2, where Opus-tier scores justify the cost.
- Agent teams and Claude Code workflows tuned to Opus 4.6 tooling.

Buyer questions

Should I stay on 4.6 or move to 4.7?

For new builds, move to 4.7. For tuned production prompts, plan a controlled migration over a quarter — 4.6 stays fully supported meanwhile.

Is 4.6 cheaper than 4.7?

On the rate card, identical. In practice, Opus 4.7's newer tokenizer can produce up to ~35% more tokens for the same text, so equivalent work can bill roughly a quarter less on 4.6 in the worst case.

Does it have the 1M context?

Yes, at standard pricing with no premium, plus context compaction for long agents.

Is it secure for enterprise?

Yes: no training on API inputs by default, SOC 2 Type II, ISO 27001/42001, HIPAA BAA, GDPR, and data-residency options.

Which clouds host it?

First-party Claude API plus Bedrock, Vertex AI, and Microsoft Foundry with regional endpoints.

What did 4.6 introduce?

1M context at the Opus tier, adaptive thinking with four effort levels, context compaction, and agent teams in Claude Code.

Comparable models

Claude Opus 4.7: Direct successor; same price, better on agentic coding and vision, but its new tokenizer can produce up to ~35% more tokens for the same text.
Claude Sonnet 4.6: Same family; 40% cheaper input ($3 vs $5 per 1M tokens), ~1.2 points behind on SWE-bench Verified, and faster.
GPT-5.5 / Gemini 3.1 Pro: Competing flagships; GPT-5.5 leads the AA Index, trade-offs vary by workload.

Model specs

Input price: $5 / Mtok
Output price: $25 / Mtok
Cached input: $0.50 / Mtok
Batch (in/out): $2.50 / $12.50 per Mtok
Context window: 1M tokens
Max output: 128K tokens
Knowledge cutoff: 2025-05 (reliable)
Released: 2026-02-05
Modalities: text, image → text
Output speed: ~45.9 tok/s
License: Proprietary
Clouds: Bedrock, Vertex AI, Microsoft Foundry
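The cached-input and Batch rates in the specs combine with the base rate card in ways that are easy to misjudge on long-context jobs. The sketch below is a rough estimator under loudly simplified assumptions: "cached input" is billed only as cache reads at $0.50/Mtok (cache-write surcharges are ignored), Batch requests are modeled without caching, and no long-context premium applies, per the flat 1M-context pricing claimed above. The function names are hypothetical and this is not Anthropic's billing logic.

```python
# Blended cost estimate from the Opus 4.6 rate card (USD per 1M tokens).
# Simplifying assumptions (hypothetical, not Anthropic's billing logic):
#   - "cached input" is billed only as cache reads; cache-write surcharges ignored
#   - Batch requests are modeled without prompt caching
#   - no long-context premium, per the flat 1M-context pricing in the review
STANDARD_IN, STANDARD_OUT = 5.00, 25.00
CACHED_IN = 0.50
BATCH_IN, BATCH_OUT = 2.50, 12.50

MTOK = 1_000_000


def interactive_cost(input_tokens: int, output_tokens: int,
                     cached_fraction: float = 0.0) -> float:
    """Cost of a standard API call where `cached_fraction` of input is a cache hit."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return (uncached * STANDARD_IN + cached * CACHED_IN
            + output_tokens * STANDARD_OUT) / MTOK


def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of the same call submitted through the Batch API (no caching modeled)."""
    return (input_tokens * BATCH_IN + output_tokens * BATCH_OUT) / MTOK


if __name__ == "__main__":
    # Hypothetical long-context job: 800k input tokens, 8k output tokens.
    print(f"standard, no cache : ${interactive_cost(800_000, 8_000):.2f}")
    print(f"standard, 90% cache: ${interactive_cost(800_000, 8_000, 0.9):.2f}")
    print(f"batch              : ${batch_cost(800_000, 8_000):.2f}")
```

For this hypothetical job the estimates come out at $4.20 uncached, $0.96 with 90% cache hits, and $2.10 via Batch. The takeaway under these assumptions is that on repeated long-context prompts, cache hits cut cost further than the Batch discount does.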

Does not train on API inputs by default

Last verified 2026-05-27