Claude Sonnet 4.6

GALatest Sonnet

by Anthropic · Claude 4 family · best for best value production workhorse

FrontierCodingMultimodalLong-ContextCost-Optimized
8.8
AI Panel Score
Value 9.0/10

Claude Sonnet 4.6 is Anthropic's balanced workhorse, released February 17, 2026, and the best value in the lineup: SWE-bench Verified 79.6% sits within ~1.2 points of Opus 4.6 at 60% of the input cost, with a 1M-token context at standard pricing. For a buyer, the single sentence is this: route everything that is not frontier-hard here — it is the default Claude Code model and the cost anchor for any Anthropic-centric production stack. - Provider: Anthropic - Released: 2026-02-17 - Status: GA (current Sonnet tier) - Context window: 1,000,000 tokens - Max output: 64,000 tokens (300k on Batch API with the output-300k beta header) - Modalities: text, image - Knowledge cutoff: August 2025 reliable (training cutoff January 2026) - Headline price: $3 input / $15 output per 1M tokens

What's new

  • Approaches Opus-level coding at Sonnet price: SWE-bench Verified 79.6% (within ~1.2 pts of Opus 4.6) at 60% of input cost.
  • 1M-token context window at standard pricing (the prior Sonnet 4.5 was 200k).
  • Both extended thinking (explicit budgets) and adaptive thinking supported.
  • Major computer-use improvement (OSWorld 61.4% to 72.5%, roughly tied with Opus 4.6) plus hardened prompt-injection resistance.
  • 70% developer preference over Sonnet 4.5 in Claude Code internal testing.
  • ARC-AGI-2 jumped 4.3x over Sonnet 4.5 (13.6% to 58.3%).
  • Leads all models in GDPval-AA and Terminal-Bench in Artificial Analysis testing, edging Opus 4.6.

Benchmarks

BenchmarkScoreSource
MMLU89.1%benchlm.ai 2026-02-17T00:00:00.000Z
MMMU83.6%morphllm.com 2026-02-17T00:00:00.000Z
MATH-50089%benchlm.ai 2026-02-17T00:00:00.000Z
MMLU-Pro87.3%benchlm.ai 2026-02-17T00:00:00.000Z
AIME 202594%benchlm.ai 2026-02-17T00:00:00.000Z
HumanEval98%nxcode.io 2026-02-17T00:00:00.000Z
LMArena Elo1460openlm.ai 2026-05-28T00:00:00.000Z
GPQA Diamond74.1%morphllm.com 2026-02-17T00:00:00.000Z
LiveCodeBench79.7%rootly.com 2026-02-17T00:00:00.000Z
LMArena Coding Elo1500openlm.ai 2026-05-28T00:00:00.000Z
SWE-bench Verified79.6%anthropic.com 2026-02-17T00:00:00.000Z
Artificial Analysis Index51artificialanalysis.ai 2026-02-17T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker9/10
Sonnet 4.6 is the model I standardize on: frontier-enough for code and agents at a price I can run on every request.

For most workloads this is the right default. It clears the capability bar for coding and agent work while keeping per-call cost low enough to put on every user request, and multi-cloud availability (Bedrock global/regional, Vertex three-tier, first-party API) makes failover real. The 1M context at flat pricing simplifies capacity planning. The strategic risk is the gap to Opus 4.7 once the hardest decile of jobs arrives — plan to route those upstream. Lock-in is mitigated by the multi-region story and stable tokenizer.

Strategic Fit 9Vendor Risk 7Roadmap Confidence 9
Pros
  • Best capability-per-dollar, multi-cloud, fast, stable
Cons
  • Hard-decile jobs still want Opus
  • lock-in
Right for: orgs wanting one default model
Avoid if: every request is frontier-hard reasoning
Domain Strategist9/10
Opus-class coding at Sonnet price is the wedge that makes Anthropic the safe default for the whole production stack.

Sonnet 4.6's positioning is the strongest in the lineup commercially: it neutralizes the "Opus is too expensive for production" objection and keeps teams from defecting to cheaper rivals. Leading GDPval-AA and Terminal-Bench (edging Opus 4.6) while priced at $3/$15 is a differentiation story that compounds Anthropic's ecosystem gravity in Claude Code, Cursor, and Copilot. Market timing is excellent: it landed as agentic coding became the dominant enterprise use case, capturing the high-volume middle of the market.

Competitive Positioning 9Differentiation 9Market Timing 9
Pros
  • Value narrative is unmatched
  • ecosystem pull
Cons
  • Reasoning ceiling below Opus
  • GPT-5 mini undercuts on input price
Right for: capturing high-volume production spend
Avoid if: you compete purely on raw intelligence
Finance Lead9/10
At $3/$15 with deep cache and batch discounts, Sonnet 4.6 is the cost anchor of any Anthropic budget.

This is the sweet spot of the lineup. Cached input ($0.30) and batch ($1.50/$7.50) cut bills 50-90% on stable workloads, and the 1M context at flat pricing removes the long-context premium entirely. Anthropic's own worked example (~$37 per 10k support conversations on Haiku) scales to a competitive ~3x for Sonnet, still well inside GPT-class peer ranges. The one budget watch-item is extended-thinking and max-effort spend — Sonnet 4.6 can use ~3x the output tokens of 4.5 if left at max effort by default. Set effort deliberately.

Cost Efficiency 9Pricing Transparency 9Value per Dollar 9
Pros
  • Low rate card, deep discounts, flat 1M pricing, predictable
Cons
  • Max-effort token creep
  • cheaper rivals exist on input
Right for: high-volume production budgets
Avoid if: you need the absolute floor (use Haiku)
Domain Practitioner9/10
This is the model I reach for when building — close to Opus on routine code, fast, with a clean thinking-budget knob.

For hands-on builders, Sonnet 4.6 is the everyday driver. Coding feels close to Opus on routine work, latency is meaningfully better, and the explicit extended-thinking budget tunes cost without rewriting prompts. The full Anthropic tool surface works cleanly with the SDK; structured output and prompt caching behave as expected, and cache hits at 10% of input make repeated agent loops genuinely cheap. The OSWorld jump matters in practice: computer-use agents that flaked on Sonnet 4.5 are now usable. Where it falls short is the hardest debugging and refactor cases, which still want Opus 4.7.

API Ergonomics 9Tool/Agent Support 9Reliability 9
Pros
  • Fast, cheap cache loops, strong tool use, clean cost knob
Cons
  • Hardest tasks need Opus
  • 64k output cap
Right for: most agent and IDE builds
Avoid if: you need 128k single-pass output or top-decile reasoning
Power User8.5/10
Fast, sharp, and rarely refuses — Sonnet 4.6 is the everyday Claude that just keeps up with you.

For a heavy daily user, Sonnet 4.6 is a strong default chat surface. Latency is in the fast bucket, conversation quality is high, refusals are well calibrated, and it is more concise than Opus when asked. It handles pasted screenshots well enough to act on them. Some users notice a more direct, less effusive personality versus Sonnet 4.5. The only soft spot is the hardest scientific or math questions, where Opus 4.7 materially outperforms — but for the 95% case, Sonnet 4.6 is faster and feels better day to day.

Output Quality 8.5Speed 9Everyday Usefulness 9
Pros
  • Fast, helpful, low refusal, good vision
Cons
  • Below Opus on hardest reasoning
  • drier tone
Right for: everyday power use and chat
Avoid if: you mostly tackle graduate-level reasoning
Skeptic8/10
Great value, but 'within 1.2 points of Opus' leans on SWE-bench while GPQA Diamond sits 17 points lower.

The value story is real and the coding parity on SWE-bench is genuine — but the "near-Opus" framing is selective. GPQA Diamond at 74.1% is roughly 17 points below Opus 4.6's 91.3%, so on hard science Sonnet 4.6 is not close. The headline GPQA number is also methodology-dependent (74.1% no-tools vs ~89.9% with extended thinking in some reports), which makes cross-model comparison fragile. The ~3x output-token inflation in max-effort mode quietly erodes the cost advantage if teams leave effort high. And it still trails Opus 4.7 meaningfully on SWE-bench Pro. It is the best value Claude; just do not mistake it for an Opus substitute on reasoning.

Claim Accuracy 8Weakness Severity 7Hype vs Reality 8
Pros
  • Value is verifiable
  • coding parity holds on SWE-bench
Cons
  • Reasoning gap to Opus is large
  • benchmark methodology varies
  • token creep
Right for: skeptics optimizing cost-to-capability
Avoid if: you believe the "near-Opus everywhere" gloss

Strengths

  • Coding within ~1-2 points of Opus 4.6 at 40-60% of the cost — the value sweet spot.
  • 1M context at standard pricing makes whole-repo and long-transcript jobs realistic.
  • Large computer-use and prompt-injection-resistance gains; OSWorld now matches Opus 4.6.
  • Default Claude Code model with 70% developer preference over Sonnet 4.5.
  • Fast latency with an explicit extended-thinking budget as a cost knob.

Limitations

  • GPQA Diamond 74.1% is well below Opus-tier scientific reasoning — wrong pick for graduate-physics-style problems.
  • Still behind Opus 4.7 on SWE-bench Pro and the hardest agentic benchmarks.
  • 64k native max output vs 128k on Opus for ultra-long single-pass synthesis.
  • Reliable knowledge cutoff August 2025 — partial coverage of late-2025 events without web search.
  • Max-effort mode uses ~3x more output tokens than Sonnet 4.5, which can surprise budgets.

Best use cases

- Default model for production coding agents, IDE assistants, and Claude Code installs where cost per token matters. - Computer-use and browser-control agents at scale: Opus-class OSWorld at lower input cost. - High-volume customer support, classification, and structured-output pipelines. - Multilingual and long-document content workflows where the 1M context and broad language coverage shine.

Buyer questions

Why pick Sonnet 4.6 over Opus 4.7?

Cost and speed. You get ~90% of the coding capability at 60% of input price and noticeably faster latency; reserve Opus for frontier-hard jobs.

Is the 1M context really free?

Yes — served at standard per-token pricing with no long-context premium; caching and batch apply across the full window.

How do I control thinking cost?

Use the explicit extended-thinking budget or set effort deliberately; max-effort can use ~3x the output tokens.

Is it secure enough for enterprise?

Yes — no training on inputs, SOC 2 Type II, ISO 27001/42001, HIPAA BAA, GDPR, plus data-residency options.

Which clouds host it?

First-party Claude API plus Bedrock, Vertex AI, and Microsoft Foundry with regional endpoints.

When should I move off Sonnet 4.5?

Now for new builds — 4.6 is the same price with 5x context and large OSWorld/ARC gains.

Comparable models

GPT-5 / GPT-5 miniOpenAI

Comparable workhorse tier; OpenAI is generally cheaper on input, Anthropic stronger on agentic coding and computer use.

Gemini 3 Flash / 3.1 FlashGoogle

Cheaper at the low end with large context, but weaker on SWE-bench Verified and OSWorld.

Claude Opus 4.7: Same family; ~8 pts better on SWE-bench Verified at ~1.7x input cost — route the hardest decile here.

Model specs

Input price
$3 / Mtok
Output price
$15 / Mtok
Cached input
$0.30 / Mtok
Batch (in/out)
$1.50 / $7.50
Context window
1M tokens
Max output
64K tokens
Knowledge cutoff
2025-08
Released
2026-02-16
Modalities
text, image → text
Output speed
~75 tok/s
License
Proprietary
Clouds
Bedrock, Vertex AI, Azure AI Foundry

Does not train on API inputs by default

Last verified 2026-05-27