by Anthropic · Claude 4 family · best for best value production workhorse
Claude Sonnet 4.6 is Anthropic's balanced workhorse, released February 17, 2026, and the best value in the lineup: SWE-bench Verified 79.6% sits within ~1.2 points of Opus 4.6 at 60% of the input cost, with a 1M-token context at standard pricing. For a buyer, the single sentence is this: route everything that is not frontier-hard here — it is the default Claude Code model and the cost anchor for any Anthropic-centric production stack. - Provider: Anthropic - Released: 2026-02-17 - Status: GA (current Sonnet tier) - Context window: 1,000,000 tokens - Max output: 64,000 tokens (300k on Batch API with the output-300k beta header) - Modalities: text, image - Knowledge cutoff: August 2025 reliable (training cutoff January 2026) - Headline price: $3 input / $15 output per 1M tokens
| Benchmark | Score | Source |
|---|---|---|
| MMLU | 89.1% | benchlm.ai 2026-02-17T00:00:00.000Z |
| MMMU | 83.6% | morphllm.com 2026-02-17T00:00:00.000Z |
| MATH-500 | 89% | benchlm.ai 2026-02-17T00:00:00.000Z |
| MMLU-Pro | 87.3% | benchlm.ai 2026-02-17T00:00:00.000Z |
| AIME 2025 | 94% | benchlm.ai 2026-02-17T00:00:00.000Z |
| HumanEval | 98% | nxcode.io 2026-02-17T00:00:00.000Z |
| LMArena Elo | 1460 | openlm.ai 2026-05-28T00:00:00.000Z |
| GPQA Diamond | 74.1% | morphllm.com 2026-02-17T00:00:00.000Z |
| LiveCodeBench | 79.7% | rootly.com 2026-02-17T00:00:00.000Z |
| LMArena Coding Elo | 1500 | openlm.ai 2026-05-28T00:00:00.000Z |
| SWE-bench Verified | 79.6% | anthropic.com 2026-02-17T00:00:00.000Z |
| Artificial Analysis Index | 51 | artificialanalysis.ai 2026-02-17T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“Sonnet 4.6 is the model I standardize on: frontier-enough for code and agents at a price I can run on every request.”
For most workloads this is the right default. It clears the capability bar for coding and agent work while keeping per-call cost low enough to put on every user request, and multi-cloud availability (Bedrock global/regional, Vertex three-tier, first-party API) makes failover real. The 1M context at flat pricing simplifies capacity planning. The strategic risk is the gap to Opus 4.7 once the hardest decile of jobs arrives — plan to route those upstream. Lock-in is mitigated by the multi-region story and stable tokenizer.
“Opus-class coding at Sonnet price is the wedge that makes Anthropic the safe default for the whole production stack.”
Sonnet 4.6's positioning is the strongest in the lineup commercially: it neutralizes the "Opus is too expensive for production" objection and keeps teams from defecting to cheaper rivals. Leading GDPval-AA and Terminal-Bench (edging Opus 4.6) while priced at $3/$15 is a differentiation story that compounds Anthropic's ecosystem gravity in Claude Code, Cursor, and Copilot. Market timing is excellent: it landed as agentic coding became the dominant enterprise use case, capturing the high-volume middle of the market.
“At $3/$15 with deep cache and batch discounts, Sonnet 4.6 is the cost anchor of any Anthropic budget.”
This is the sweet spot of the lineup. Cached input ($0.30) and batch ($1.50/$7.50) cut bills 50-90% on stable workloads, and the 1M context at flat pricing removes the long-context premium entirely. Anthropic's own worked example (~$37 per 10k support conversations on Haiku) scales to a competitive ~3x for Sonnet, still well inside GPT-class peer ranges. The one budget watch-item is extended-thinking and max-effort spend — Sonnet 4.6 can use ~3x the output tokens of 4.5 if left at max effort by default. Set effort deliberately.
“This is the model I reach for when building — close to Opus on routine code, fast, with a clean thinking-budget knob.”
For hands-on builders, Sonnet 4.6 is the everyday driver. Coding feels close to Opus on routine work, latency is meaningfully better, and the explicit extended-thinking budget tunes cost without rewriting prompts. The full Anthropic tool surface works cleanly with the SDK; structured output and prompt caching behave as expected, and cache hits at 10% of input make repeated agent loops genuinely cheap. The OSWorld jump matters in practice: computer-use agents that flaked on Sonnet 4.5 are now usable. Where it falls short is the hardest debugging and refactor cases, which still want Opus 4.7.
“Fast, sharp, and rarely refuses — Sonnet 4.6 is the everyday Claude that just keeps up with you.”
For a heavy daily user, Sonnet 4.6 is a strong default chat surface. Latency is in the fast bucket, conversation quality is high, refusals are well calibrated, and it is more concise than Opus when asked. It handles pasted screenshots well enough to act on them. Some users notice a more direct, less effusive personality versus Sonnet 4.5. The only soft spot is the hardest scientific or math questions, where Opus 4.7 materially outperforms — but for the 95% case, Sonnet 4.6 is faster and feels better day to day.
“Great value, but 'within 1.2 points of Opus' leans on SWE-bench while GPQA Diamond sits 17 points lower.”
The value story is real and the coding parity on SWE-bench is genuine — but the "near-Opus" framing is selective. GPQA Diamond at 74.1% is roughly 17 points below Opus 4.6's 91.3%, so on hard science Sonnet 4.6 is not close. The headline GPQA number is also methodology-dependent (74.1% no-tools vs ~89.9% with extended thinking in some reports), which makes cross-model comparison fragile. The ~3x output-token inflation in max-effort mode quietly erodes the cost advantage if teams leave effort high. And it still trails Opus 4.7 meaningfully on SWE-bench Pro. It is the best value Claude; just do not mistake it for an Opus substitute on reasoning.
- Default model for production coding agents, IDE assistants, and Claude Code installs where cost per token matters. - Computer-use and browser-control agents at scale: Opus-class OSWorld at lower input cost. - High-volume customer support, classification, and structured-output pipelines. - Multilingual and long-document content workflows where the 1M context and broad language coverage shine.
Cost and speed. You get ~90% of the coding capability at 60% of input price and noticeably faster latency; reserve Opus for frontier-hard jobs.
Yes — served at standard per-token pricing with no long-context premium; caching and batch apply across the full window.
Use the explicit extended-thinking budget or set effort deliberately; max-effort can use ~3x the output tokens.
Yes — no training on inputs, SOC 2 Type II, ISO 27001/42001, HIPAA BAA, GDPR, plus data-residency options.
First-party Claude API plus Bedrock, Vertex AI, and Microsoft Foundry with regional endpoints.
Now for new builds — 4.6 is the same price with 5x context and large OSWorld/ARC gains.
Comparable workhorse tier; OpenAI is generally cheaper on input, Anthropic stronger on agentic coding and computer use.
Cheaper at the low end with large context, but weaker on SWE-bench Verified and OSWorld.
Does not train on API inputs by default
Last verified 2026-05-27