Claude Sonnet 4.5

by Anthropic · Claude 4 family · best for stable legacy workhorse

CodingMultimodalCost-Optimized

7.5

AI Panel Score

Value 8.0/10

Claude Sonnet 4.5, released September 29, 2025, was Anthropic's workhorse from late September 2025 until Sonnet 4.6 superseded it in February 2026. It introduced state-of-the-art coding at its tier and long-horizon focus (30+ hours on task in Anthropic testing), and it remains fully supported at the same $3/$15 price as 4.6. For a buyer, the single sentence is this: a still-capable, well-hardened model whose only real gaps versus 4.6 are the 200k context (vs 1M) and weaker computer use — fine to keep in a controlled migration window, but new builds should target 4.6.

Compare this model All Claude 4 versions

What's new

vs Sonnet 4: state-of-the-art coding at release, with focus held across multi-step tasks for 30+ hours in Anthropic testing.
Claude Code checkpoint system for progress saving and rollback shipped here first.
Native VS Code extension at launch.
Context editing and memory tools for long-running agents.
File creation (spreadsheets, slides, documents) directly in claude.ai.
Claude Agent SDK released alongside the model.
Enhanced alignment work: reduced deception and power-seeking behaviors in evaluations.

Benchmarks

Benchmark	Score	Source
MMLU	89.1%	leanware.co 2025-09-29T00:00:00.000Z
AIME 2025	87%	leanware.co 2025-09-29T00:00:00.000Z
TAU-bench	86.2%	leanware.co 2025-09-29T00:00:00.000Z
LMArena Elo	1420	openlm.ai 2026-05-28T00:00:00.000Z
GPQA Diamond	83.4%	leanware.co 2025-09-29T00:00:00.000Z
LMArena Coding Elo	1464	openlm.ai 2026-05-28T00:00:00.000Z
SWE-bench Verified	77.2%	morphllm.com 2025-09-29T00:00:00.000Z
Artificial Analysis Index	43	artificialanalysis.ai 2025-09-29T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7/10

“Sonnet 4.5 was the right pick for two quarters; now the only strategic question is migration timing to 4.6.”

From October 2025 through February 2026 this was the default workhorse. With Sonnet 4.6 now available at the same price with 1M context and large OSWorld/ARC gains, the strategic question is migration timing, not continued use. Anthropic keeps 4.5 fully supported, but new deployments should standardize on 4.6. The risk in staying is incrementally dated training data and the eventual deprecation cycle. For production traffic, plan a controlled rollout to 4.6 over a quarter and treat 4.5 as exit-only.

Strategic Fit 7Vendor Risk 6Roadmap Confidence 8

Pros

Mature, same price as 4.6, multi-cloud

Cons

200k cap, dated cutoff, superseded

Right for: controlled migration windows

Avoid if: starting new builds (use 4.6)

Domain Strategist7.5/10

“Sonnet 4.5's legacy is the agent tooling — checkpoints, the Agent SDK, VS Code — that still anchors the ecosystem.”

Sonnet 4.5's market significance is less about benchmarks than about the agent infrastructure it launched: the Claude Code checkpoint system, the native VS Code extension, context editing, memory tools, and the Claude Agent SDK all shipped with it. That tooling built the ecosystem moat that 4.6 and 4.7 now inherit. As a standalone product its differentiation has faded, but its strategic footprint as the model that operationalized Claude agents remains large.

Competitive Positioning 7Differentiation 7Market Timing 8

Pros

Launched durable agent tooling
trusted

Cons

Superseded by 4.6 on capability and context

Right for: legacy agent stacks

Avoid if: you need current capability

Finance Lead7.5/10

“Same $3/$15 as 4.6 with no cost downside to migrating — and a soft cost reason to move on document-heavy work.”

Pricing is identical to Sonnet 4.6 at $3/$15, so the cost case for staying or moving is a wash on rate card, and cache/batch discounts match. The one budget consideration is the 200k context, which forces chunking workflows that can use more total tokens than equivalent work on 4.6's 1M window. Net: no financial reason to delay migration, and a soft financial reason to accelerate it on document-heavy pipelines.

Cost Efficiency 8Pricing Transparency 9Value per Dollar 8

Pros

Same rates as 4.6, full discounts

Cons

200k cap can inflate token use via chunking

Right for: short-term cost-neutral operation

Avoid if: document-heavy work (4.6 is cheaper in practice)

Domain Practitioner7.5/10

“The checkpoint system and VS Code extension shipped here first — but for browser agents, 4.6's computer use wins.”

For builders, Sonnet 4.5 was the daily driver for Q4 2025 and into early 2026. Tool use is identical to later models, the Claude Code checkpoint system shipped here first, and the VS Code extension is mature. The honest gap is computer use and OSWorld — Sonnet 4.6 is meaningfully better, so for any browser-control or agentic workflow, 4.6 is the right move. Pure coding is close enough between 4.5 and 4.6 that you can stay on 4.5 if your prompt suite depends on its behavior; otherwise migrate.

API Ergonomics 9Tool/Agent Support 8Reliability 8.5

Pros

Mature tooling, identical API, strong coding

Cons

Weaker computer use
200k cap

Right for: tuned legacy agent loops

Avoid if: building browser/computer-use agents

Power User8/10

“Fast and high-quality — on casual chat most users won't notice the gap to 4.6 at all.”

For a consumer chat product, Sonnet 4.5 delivered a strong experience: fast latency, high conversation quality, calibrated refusals. End users will not perceive a meaningful gap between 4.5 and 4.6 on casual chat. Where 4.6 wins on user-visible quality is hard reasoning, computer use, and very long sessions where context limits matter. Safety calibration is mature. The January 2025 reliable cutoff is starting to feel dated for current events — web search mitigates but does not eliminate it.

Output Quality 8Speed 8.5Everyday Usefulness 8

Pros

Fast, helpful, calibrated, mature

Cons

Dated cutoff
below 4.6 on hard tasks

Right for: everyday chat surfaces

Avoid if: you need newest knowledge or long sessions

Skeptic7.5/10

“A fine 2025 model whose ARC-AGI-2 of 13.6% shows just how far the goalposts moved in one release.”

Sonnet 4.5 was genuinely strong at release and the coding numbers held up, so there is no deception in the original claims. The skeptical point is generational: ARC-AGI-2 at 13.6% versus 58.3% on Sonnet 4.6 shows how quickly the frontier moved, and the 200k context plus January 2025 cutoff now look dated next to 4.6 at identical price. The AIME "100% with tools" framing also flatters the model — the no-tools 87% is the honest number. There is no cost or capability case for new work here; it is a maintenance model.

Claim Accuracy 8Weakness Severity 7Hype vs Reality 7.5

Pros

Original claims were honest
mature

Cons

Superseded at same price
dated cutoff
weak novel reasoning

Right for: skeptics maintaining legacy stacks

Avoid if: you would otherwise use 4.6

Strengths

Long-horizon focus: held coherent multi-step work for 30+ hours in testing.
Strong coding for its tier — widely deployed in Claude Code through Q4 2025.
Reliable agent behavior with context editing and memory tools.
Same $3/$15 pricing as Sonnet 4.6, so cost is not a migration driver.
Mature and well-hardened by months of production use; notable alignment gains.

Limitations

200k context only (Sonnet 4.6 jumped to 1M) — the main reason to migrate.
ARC-AGI-2 13.6% reflects the pre-jump generation; novel-puzzle reasoning is weak.
January 2025 reliable knowledge cutoff is now meaningfully dated.
Computer use less hardened than Sonnet 4.6 (OSWorld 61.4% vs 72.5%).
Anthropic recommends migrating to Sonnet 4.6 for new builds.

Best use cases

Production systems running stable Claude Code workflows not yet re-validated on Sonnet 4.6.
Agent loops tuned to Sonnet 4.5's specific instruction-following behavior.
Cost-conscious workloads where 200k context is sufficient and prompt stability is valued.

Deep dive

The full research notes behind this review — verified against primary sources.

Architecture Capabilities Benchmark analysis Speed & latency Pricing analysis Deployment & access Safety & privacy Ecosystem & tooling

Architecture

Anthropic discloses no parameter count, layer count, or attention mechanism — null/unknown. Disclosed: a 200k-token context window (the 1M context arrived with Sonnet 4.6), 64k max output, and extended thinking with token budgets. It uses the standard pre-4.7 Claude tokenizer. Sonnet 4.5 was the launch vehicle for the Claude Agent SDK and the Claude Code checkpoint system, so a great deal of agent tooling was first hardened on this model.

Capabilities

Coding (8.5): SWE-bench Verified 77.2%, LMArena coding Elo 1464 — SOTA for its tier at release and still widely deployed. Reasoning (8.0): GPQA Diamond 83.4%, MMLU 89.1%, AA Index 43 (reasoning). Math (8.3): AIME 2025 87% without tools (100% with Python tools). Agentic/tool use (8.0): full first-party tool suite, OSWorld 61.4%, Tau-bench retail 86.2%, plus context editing and memory tools — but computer use is less hardened than 4.6. Long-context (7.5): 200k window only, the main constraint. Multilingual (8.5): broad coverage. Vision (8.3) and document/OCR (8.0): solid for charts and documents. Instruction-following (8.5): strong, one notch below 4.6. Function-calling (8.8): robust. Safety calibration (9.0): ASL-3 with notable alignment improvements (less deception/power-seeking). Realtime-data (6.5): January 2025 cutoff — now meaningfully dated — plus web search/fetch.

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Successor	Source
SWE-bench Verified	77.2%	improved vs Sonnet 4	behind Sonnet 4.6 (79.6%)	Morph
GPQA Diamond	83.4%	improved	comparable to 4.6 (74.1% no-tools)	Leanware
MMLU	89.1%	improved	frontier-adjacent	Leanware
AIME 2025	87.0%	improved	frontier (100% with tools)	Leanware
Tau-bench Retail	86.2%	improved	strong tool use	Leanware
OSWorld	61.4%	+19.2 in four months	behind Sonnet 4.6 (72.5%)	Morph
ARC-AGI-2	13.6%	n/a	far behind Sonnet 4.6 (58.3%)	Morph
LMArena Elo	1420	improved	behind Sonnet 4.6 (1460)	OpenLM
LMArena Coding Elo	1464	improved	behind Sonnet 4.6 (1500)	OpenLM
Artificial Analysis Index	43	improved	behind Sonnet 4.6 (51)	AA

(MMLU-Pro, MATH-500, HumanEval, LiveCodeBench, Aider Polyglot, MMMU, IFEval, BBH, Terminal-Bench, MRCR, SimpleQA, HLE carry no clean published Sonnet-4.5 figure and are null. GPQA Diamond methodology varies; the 83.4% figure reflects extended-thinking conditions.)

Speed & latency

Anthropic labels Sonnet 4.5 latency "fast" — output around 60 tokens/sec with time-to-first-token near ~1.4s in typical use. It was responsive enough to be the default Claude Code model through Q4 2025 and into early 2026, feeling snappy in IDE loops and chat. Extended thinking adds latency only when explicitly enabled with a budget.

Pricing analysis

Surface	Cost	Notes
API input	$3 / 1M tok	Identical to Sonnet 4.6
API output	$15 / 1M tok	Identical
Cached input (read/hit)	$0.30 / 1M tok	0.1x base
Cache write (5m / 1h)	$3.75 / $6 per 1M tok	1.25x / 2x base
Batch (in/out)	$1.50 / $7.50 per 1M tok	50% off both
Web search tool	$10 / 1,000 searches	plus token costs
Direct UI	$20/mo Pro · $100/mo Max 5x · $200/mo Max 20x	claude.ai; also on Free plan
Free tier	claude.ai Free plan	daily message caps
Rate limits	Tiered (Tier 1–4 + Enterprise)	Priority Tier supported

Deployment & access

Proprietary, no open weights or self-hosting. First-party via the Claude API and Claude Platform on AWS, plus Amazon Bedrock (global and regional endpoints — Sonnet 4.5 was the first model to offer Bedrock global vs regional endpoints), Google Vertex AI (global, multi-region, regional), and Microsoft Foundry. Regional/multi-region endpoints carry a 10% premium. Data residency options include US and global. Note: Sonnet 4.5 does not support the inference_geo parameter (introduced with the 4.6 generation).

Safety & privacy

Governed by Anthropic's RSP v3.0 and deployed under ASL-3 protections. No training on API inputs by default; opt-out and zero-retention available. Compliance: SOC 2 Type II, ISO 27001:2022, ISO/IEC 42001:2023, HIPAA (BAA available), GDPR. Anthropic highlighted enhanced alignment at release — reduced deception and power-seeking in evaluations. No forced content-moderation classifier.

Ecosystem & tooling

SDKs in Python, TypeScript, Java, Go, Ruby, and C#. Launch vehicle for the Claude Agent SDK and Claude Code checkpoint system; works with LangChain, LlamaIndex, Vercel AI SDK, and Pydantic AI; selectable in Cursor, GitHub Copilot, and Windsurf. Popularity is mainstream, with a large installed base still in production migration to 4.6.

Buyer questions

Should I still use Sonnet 4.5?

For existing tuned stacks, short-term yes; for new builds, use Sonnet 4.6 at the same price.

What is the biggest gap to 4.6?

Context (200k vs 1M) and computer use (OSWorld 61.4% vs 72.5%).

Is migration costly?

No — same rate card; mostly prompt re-validation, and 4.6's larger context can reduce chunking overhead.

Is it secure for enterprise?

Yes — no training on inputs, SOC 2 Type II, ISO 27001/42001, HIPAA BAA, GDPR.

Which clouds host it?

First-party Claude API plus Bedrock, Vertex AI, and Microsoft Foundry; it was the first model with Bedrock global/regional endpoints.

What did Sonnet 4.5 introduce?

The Claude Agent SDK, Claude Code checkpoints, the VS Code extension, and context/memory tools.

Comparable models

Claude Sonnet 4.6: Direct successor; same price, 5x context, better OSWorld and ARC-AGI-2 — the migration target.

Claude Opus 4.6 / 4.7: Higher-tier flagships; ~1.7x input cost, stronger on hard reasoning.

GPT-5 — OpenAI

Comparable workhorse from a competing provider; trade-offs vary by workload.

Sources

Primary references used to verify this review.

Model specs

Input price: $3 / Mtok
Output price: $15 / Mtok
Cached input: $0.30 / Mtok
Batch (in/out): $1.50 / $7.50
Context window: 200K tokens
Max output: 64K tokens
Knowledge cutoff: 2025-01
Released: 2025-09-28
Modalities: text, image → text
Output speed: ~60 tok/s
License: Proprietary
Clouds: Bedrock, Vertex AI, Azure AI Foundry

Does not train on API inputs by default

Other Claude 4 versions

Last verified 2026-05-27