Why pick Sonnet 4.6 over Opus 4.7?

Cost and speed. You get ~90% of the coding capability at 60% of input price and noticeably faster latency; reserve Opus for frontier-hard jobs.

Is the 1M context really free?

Yes — served at standard per-token pricing with no long-context premium; caching and batch apply across the full window.

How do I control thinking cost?

Use the explicit extended-thinking budget or set effort deliberately; max-effort can use ~3x the output tokens.

Is it secure enough for enterprise?

Yes — no training on inputs, SOC 2 Type II, ISO 27001/42001, HIPAA BAA, GDPR, plus data-residency options.

Which clouds host it?

First-party Claude API plus Bedrock, Vertex AI, and Microsoft Foundry with regional endpoints.

When should I move off Sonnet 4.5?

Now for new builds — 4.6 is the same price with 5x context and large OSWorld/ARC gains.

Claude Sonnet 4.6 Review — Benchmarks, Pricing & AI Panel Verdict

Benchmark	Score	Source
MMLU	89.1%	benchlm.ai 2026-02-17T00:00:00.000Z
MMMU	83.6%	morphllm.com 2026-02-17T00:00:00.000Z
MATH-500	89%	benchlm.ai 2026-02-17T00:00:00.000Z
MMLU-Pro	87.3%	benchlm.ai 2026-02-17T00:00:00.000Z
AIME 2025	94%	benchlm.ai 2026-02-17T00:00:00.000Z
HumanEval	98%	nxcode.io 2026-02-17T00:00:00.000Z
LMArena Elo	1460	openlm.ai 2026-05-28T00:00:00.000Z
GPQA Diamond	74.1%	morphllm.com 2026-02-17T00:00:00.000Z
LiveCodeBench	79.7%	rootly.com 2026-02-17T00:00:00.000Z
LMArena Coding Elo	1500	openlm.ai 2026-05-28T00:00:00.000Z
SWE-bench Verified	79.6%	anthropic.com 2026-02-17T00:00:00.000Z
Artificial Analysis Index	51	artificialanalysis.ai 2026-02-17T00:00:00.000Z

Architecture

Anthropic discloses no parameter count, layer count, or attention mechanism, so those fields are null/unknown. Disclosed: a 1M-token context window served at standard pricing; 64k synchronous max output (300k via batch beta); and support for both an explicit extended-thinking control (with token budgets) and adaptive thinking. Sonnet 4.6 retains the prior Claude tokenizer (Opus 4.7's new tokenizer is the exception), so token budgets and cost models from Sonnet 4.5 carry over cleanly.

Capabilities

Coding (9.0): SWE-bench Verified 79.6%, LiveCodeBench 79.7%, HumanEval ~98%, LMArena coding Elo 1500 — close to Opus on routine work at far lower cost. Reasoning (8.3): GPQA Diamond 74.1% (well below Opus tier), ARC-AGI-2 58.3%, AA Index 51 (second only to Opus 4.6 at the time of release). Math (8.5): AIME 2025 ~94% with tools, MATH ~89%. Agentic/tool use (8.8): OSWorld 72.5% (tied with Opus 4.6), full first-party tool suite, leads Terminal-Bench in AA testing. Long-context (9.0): 1M tokens at standard pricing, strong retrieval across the window. Multilingual (9.0): broad coverage, Arena multilingual 91.3. Vision (8.5) and document/OCR (8.3): solid for charts, screenshots, and documents though below Opus 4.7's high-res pipeline. Instruction-following (8.8): improved and more consistent than Sonnet 4.5. Function-calling (9.0): robust structured output and parallel calls. Safety calibration (9.0): ASL-3, balanced refusals. Realtime-data (7.0): no native post-August-2025 knowledge, but web search and web fetch close the gap.

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
SWE-bench Verified	79.6%	+2.4 vs Sonnet 4.5 (77.2%)	within 1.2 pts of Opus 4.6 (80.8%)	Anthropic
GPQA Diamond	74.1%	improved	behind Opus 4.6 (91.3%)	Morph
AIME 2025	94.0%	improved	frontier (with tools)	BenchLM
MMLU-Pro	87.3%	improved	near-frontier	BenchLM
LiveCodeBench	79.7%	improved	competitive frontier coder	Rootly
HumanEval	98%	improved	near-saturated	NxCode
MMMU	83.6%	improved	frontier vision tier	Morph
OSWorld-Verified	72.5%	+11.1 vs Sonnet 4.5 (61.4%)	~tied Opus 4.6 (72.7%)	Morph
ARC-AGI-2	58.3%	+44.7 vs Sonnet 4.5 (13.6%)	behind Opus 4.6 (68.8%)	Morph
LMArena Elo	1460	+40 vs Sonnet 4.5 (1420)	#6 tier	OpenLM
LMArena Coding Elo	1500	+36 vs Sonnet 4.5 (1464)	top-tier	OpenLM
Artificial Analysis Index	51	+8 vs Sonnet 4.5 (43)	behind Opus 4.6 (53)	AA

(GPQA Diamond varies by methodology: the 74.1% no-tools figure is used here; some aggregators report ~89.9% with extended thinking. Aider Polyglot, BBH, Tau-bench, Terminal-Bench, MRCR, SimpleQA, HLE carry no clean published Sonnet-4.6 figure and are null.)

Speed & latency

Sonnet 4.6 sits firmly in the fast latency tier — Anthropic labels it "fast," with output around 75 tokens/sec and time-to-first-token under ~2 seconds in typical use. This is the reason it is the default Claude Code model: it feels responsive in interactive IDE loops and chat while still clearing the frontier-enough bar for coding. Extended thinking adds latency only when explicitly enabled with a budget, giving developers a clean speed/quality knob.

Pricing analysis

Surface	Cost	Notes
API input	$3 / 1M tok	Standard rate
API output	$15 / 1M tok	Standard rate
Cached input (read/hit)	$0.30 / 1M tok	0.1x base
Cache write (5m / 1h)	$3.75 / $6 per 1M tok	1.25x / 2x base
Batch (in/out)	$1.50 / $7.50 per 1M tok	50% off both
Web search tool	$10 / 1,000 searches	plus token costs
Direct UI	$20/mo Pro · $100/mo Max 5x · $200/mo Max 20x	claude.ai; also on Free plan
Free tier	claude.ai Free plan	daily message caps
Rate limits	Tiered (Tier 1–4 + Enterprise)	Priority Tier supported

Deployment & access

Proprietary, no open weights or self-hosting. Available first-party via the Claude API and Claude Platform on AWS, plus Amazon Bedrock (global and regional endpoints), Google Vertex AI (global, multi-region, regional), and Microsoft Foundry. Regional/multi-region endpoints carry a 10% premium; first-party US-only routing via inference_geo: "us" adds 1.1x. Data residency options include US and global.

Safety & privacy

Governed by Anthropic's RSP v3.0 and deployed under ASL-3 protections. No training on API inputs by default; opt-out and zero-retention available. Compliance: SOC 2 Type II, ISO 27001:2022, ISO/IEC 42001:2023, HIPAA (BAA available), GDPR. No forced content-moderation classifier on API output; refusal calibration is mature and slightly more direct than Sonnet 4.5.

Ecosystem & tooling

SDKs in Python, TypeScript, Java, Go, Ruby, and C#. First-class in the Claude Agent SDK and Claude Code (its default model), plus LangChain, LlamaIndex, Vercel AI SDK, and Pydantic AI. Selectable in Cursor, GitHub Copilot, Windsurf, Replit, and Sourcegraph. Popularity is dominant — it is the highest-volume model in most Anthropic-centric production stacks.

Claude Sonnet 4.6

What's new

Benchmarks

AI Panel Review

Strengths

Limitations

Best use cases

Deep dive

Architecture

Capabilities

Benchmark analysis

Speed & latency

Pricing analysis

Deployment & access

Safety & privacy

Ecosystem & tooling

Buyer questions

Comparable models

Sources

Model specs

Other Claude 4 versions