by Anthropic · Claude 4 family · best for stable legacy workhorse
Claude Sonnet 4.5, released September 29, 2025, was Anthropic's workhorse until Sonnet 4.6 superseded it in February 2026. It introduced state-of-the-art coding at its tier and long-horizon focus (30+ hours on task in Anthropic testing), and it remains fully supported at the same $3/$15 price as 4.6. For a buyer, the one-sentence summary is this: it is a still-capable, well-hardened model whose only real gaps versus 4.6 are the 200k context window (vs 1M) and weaker computer use. It is fine to keep during a controlled migration window, but new builds should target 4.6.

- Provider: Anthropic
- Released: 2025-09-29
- Status: GA (legacy; superseded by Sonnet 4.6, still actively supported)
- Context window: 200,000 tokens
- Max output: 64,000 tokens
- Modalities: text, image
- Knowledge cutoff: January 2025 (reliable); training cutoff July 2025
- Headline price: $3 input / $15 output per 1M tokens
| Benchmark | Score | Source |
|---|---|---|
| MMLU | 89.1% | leanware.co, 2025-09-29 |
| AIME 2025 | 87% | leanware.co, 2025-09-29 |
| TAU-bench | 86.2% | leanware.co, 2025-09-29 |
| LMArena Elo | 1420 | openlm.ai, 2026-05-28 |
| GPQA Diamond | 83.4% | leanware.co, 2025-09-29 |
| LMArena Coding Elo | 1464 | openlm.ai, 2026-05-28 |
| SWE-bench Verified | 77.2% | morphllm.com, 2025-09-29 |
| Artificial Analysis Index | 43 | artificialanalysis.ai, 2025-09-29 |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“Sonnet 4.5 was the right pick for two quarters; now the only strategic question is migration timing to 4.6.”
From October 2025 through February 2026 this was the default workhorse. With Sonnet 4.6 now available at the same price with 1M context and large OSWorld/ARC gains, the strategic question is migration timing, not continued use. Anthropic keeps 4.5 fully supported, but new deployments should standardize on 4.6. The risk in staying is incrementally dated training data and the eventual deprecation cycle. For production traffic, plan a controlled rollout to 4.6 over a quarter and treat 4.5 as exit-only.
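One way to picture the controlled rollout described above is a deterministic percentage split, where each user is hashed into a stable bucket and the share routed to 4.6 is raised step by step over the quarter. The sketch below is illustrative only: `pick_model`, `rollout_percent`, and the two model labels are hypothetical names introduced here, not identifiers from Anthropic's API or from the review.

```python
import hashlib

# Hypothetical labels for illustration only; substitute the model
# identifiers documented by your provider.
LEGACY_MODEL = "sonnet-4.5"
TARGET_MODEL = "sonnet-4.6"


def pick_model(user_id: str, rollout_percent: int) -> str:
    """Deterministically route a stable share of users to the target model."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Hash the user ID so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return TARGET_MODEL if bucket < rollout_percent else LEGACY_MODEL
```

Because the bucket depends only on the user ID, raising `rollout_percent` moves additional users to the target without flipping anyone back, which keeps per-user behavior stable while prompts are re-validated.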
“Sonnet 4.5's legacy is the agent tooling — checkpoints, the Agent SDK, VS Code — that still anchors the ecosystem.”
Sonnet 4.5's market significance is less about benchmarks than about the agent infrastructure it launched: the Claude Code checkpoint system, the native VS Code extension, context editing, memory tools, and the Claude Agent SDK all shipped with it. That tooling built the ecosystem moat that 4.6 and 4.7 now inherit. As a standalone product its differentiation has faded, but its strategic footprint as the model that operationalized Claude agents remains large.
“Same $3/$15 as 4.6 with no cost downside to migrating — and a soft cost reason to move on document-heavy work.”
Pricing is identical to Sonnet 4.6 at $3/$15, so the cost case for staying or moving is a wash on rate card, and cache/batch discounts match. The one budget consideration is the 200k context, which forces chunking workflows that can use more total tokens than equivalent work on 4.6's 1M window. Net: no financial reason to delay migration, and a soft financial reason to accelerate it on document-heavy pipelines.
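The chunking overhead can be made concrete with some rate-card arithmetic. The sketch below uses the $3/$15 per 1M token rates quoted in this review and assumes, hypothetically, that every chunked call resends a shared instruction prompt and produces its own output. The token counts, the function names, and the assumption that flat list rates apply to the whole long-context request (with no surcharge and no cache or batch discount) are illustrative, not figures from Anthropic.

```python
import math

# Rates quoted in the review: $3 input / $15 output per 1M tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """List-price cost of one request, ignoring cache and batch discounts."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def chunked_cost_usd(doc_tokens: int, chunk_tokens: int,
                     shared_prompt_tokens: int,
                     output_tokens_per_call: int) -> tuple[float, int]:
    """Cost and call count when a document is split into chunks.

    Assumes the shared prompt is resent with every chunk and each call
    produces its own output (a hypothetical workflow).
    """
    calls = math.ceil(doc_tokens / chunk_tokens)
    total_input = doc_tokens + calls * shared_prompt_tokens
    total_output = calls * output_tokens_per_call
    return cost_usd(total_input, total_output), calls
```

Under these assumptions, a 600,000-token document split into 150,000-token chunks with a 5,000-token shared prompt and 2,000 output tokens per call takes 4 calls and costs $1.98, while a single pass with the same prompt and one 2,000-token answer costs $1.845. The difference is small per document but scales with volume and with the size of the repeated prompt.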
“The checkpoint system and VS Code extension shipped here first — but for browser agents, 4.6's computer use wins.”
For builders, Sonnet 4.5 was the daily driver for Q4 2025 and into early 2026. Tool use is identical to later models, the Claude Code checkpoint system shipped here first, and the VS Code extension is mature. The honest gap is computer use and OSWorld — Sonnet 4.6 is meaningfully better, so for any browser-control or agentic workflow, 4.6 is the right move. Pure coding is close enough between 4.5 and 4.6 that you can stay on 4.5 if your prompt suite depends on its behavior; otherwise migrate.
“Fast and high-quality — on casual chat most users won't notice the gap to 4.6 at all.”
For a consumer chat product, Sonnet 4.5 delivered a strong experience: fast latency, high conversation quality, calibrated refusals. End users will not perceive a meaningful gap between 4.5 and 4.6 on casual chat. Where 4.6 wins on user-visible quality is hard reasoning, computer use, and very long sessions where context limits matter. Safety calibration is mature. The January 2025 reliable cutoff is starting to feel dated for current events — web search mitigates but does not eliminate it.
“A fine 2025 model whose ARC-AGI-2 of 13.6% shows just how far the goalposts moved in one release.”
Sonnet 4.5 was genuinely strong at release and the coding numbers held up, so there is no deception in the original claims. The skeptical point is generational: ARC-AGI-2 at 13.6% versus 58.3% on Sonnet 4.6 shows how quickly the frontier moved, and the 200k context plus January 2025 cutoff now look dated next to 4.6 at identical price. The AIME "100% with tools" framing also flatters the model — the no-tools 87% is the honest number. There is no cost or capability case for new work here; it is a maintenance model.
- Production systems running stable Claude Code workflows not yet re-validated on Sonnet 4.6.
- Agent loops tuned to Sonnet 4.5's specific instruction-following behavior.
- Cost-conscious workloads where 200k context is sufficient and prompt stability is valued.
For existing tuned stacks, short-term yes; for new builds, use Sonnet 4.6 at the same price.
Context (200k vs 1M) and computer use (OSWorld 61.4% vs 72.5%).
No — same rate card; mostly prompt re-validation, and 4.6's larger context can reduce chunking overhead.
Yes — no training on inputs, SOC 2 Type II, ISO 27001/42001, HIPAA BAA, GDPR.
First-party Claude API plus Bedrock, Vertex AI, and Microsoft Foundry; it was the first model with Bedrock global/regional endpoints.
The Claude Agent SDK, Claude Code checkpoints, the VS Code extension, and context/memory tools.
Comparable workhorse from a competing provider; trade-offs vary by workload.
Does not train on API inputs by default
Last verified 2026-05-27