by Anthropic · Claude 4 family · best for agentic coding at the frontier
Claude Opus 4.7 is Anthropic's most capable generally available model, released April 16, 2026 as the flagship of the Claude 4 family. Its headline is a step-change in agentic coding: SWE-bench Verified jumps to 87.6% (from 80.8% on Opus 4.6) and SWE-bench Pro to 64.3%, while it tops the LMArena coding leaderboard. For a buyer, the single sentence is this: if you are building long-horizon coding or computer-use agents and accuracy beats latency, Opus 4.7 is the default choice on Anthropic — at unchanged $5/$25 pricing. - Provider: Anthropic - Released: 2026-04-16 - Status: GA (current flagship) - Context window: 1,000,000 tokens - Max output: 128,000 tokens (300k on Batch API with the output-300k beta header) - Modalities: text, image (vision up to 2,576 px long edge, ~3.75 MP) - Knowledge cutoff: January 2026 (reliable and training cutoff both January 2026) - Headline price: $5 input / $25 output per 1M tokens
| Benchmark | Score | Source |
|---|---|---|
| Humanity's Last Exam | 54.7% | anthropic.com 2026-04-16T00:00:00.000Z |
| MMMU | 91.5% | vellum.ai 2026-04-16T00:00:00.000Z |
| LMArena Elo | 1503 | openlm.ai 2026-05-28T00:00:00.000Z |
| GPQA Diamond | 94.2% | vellum.ai 2026-04-16T00:00:00.000Z |
| Terminal-Bench | 69.4% | vellum.ai 2026-04-16T00:00:00.000Z |
| LMArena Coding Elo | 1554 | openlm.ai 2026-05-28T00:00:00.000Z |
| SWE-bench Verified | 87.6% | anthropic.com 2026-04-16T00:00:00.000Z |
| Artificial Analysis Index | 57 | artificialanalysis.ai 2026-05-28T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“Opus 4.7 is the safest frontier bet on Anthropic: flat price, real multi-cloud, and the agentic-coding lead that justifies the tier.”
For a one-to-two-year platform bet, Opus 4.7 is the lowest-risk Anthropic landing pad. Pricing held at $5/$25, it is GA across Anthropic, Bedrock, Vertex, and Foundry, and the coding lead is defensible against GPT-5.5 and Gemini 3.1. The structural risk is lock-in: prompts tuned to Opus 4.7's literal instruction-following and new tokenizer do not port one-to-one to other vendors, and the tokenizer change resets cost models. Roadmap confidence is high given Anthropic's cadence. For frontier coding agents this is the default; chat surfaces should tier down to Sonnet or Haiku.
“Anthropic owns the agentic-coding narrative, and Opus 4.7 is the proof point that keeps Cursor, Windsurf, and Copilot in the fold.”
In market terms, Opus 4.7 cements Anthropic's moat in code and agents — the segment with the stickiest enterprise spend and the clearest ROI story. Topping the LMArena coding board and SWE-bench while holding price is a positioning win that competitors must answer on capability, not discounting. The differentiation is narrowing on raw intelligence (GPT-5.5 leads the AA Index at 60 vs 57), so Anthropic's edge is increasingly "best at doing work," not "highest IQ." Market timing is strong: launched into a cycle where agentic coding is the dominant enterprise use case.
“Headline price is unchanged, but the new tokenizer is a stealth 35% line-item increase you must model before you commit.”
At $5/$25 the rate card matches Opus 4.6 and 4.5, and cache reads ($0.50) plus batch ($2.50/$12.50) cut stable-workload bills sharply. The trap is the tokenizer: identical text can bill up to 35% more tokens, so per-task TCO rises even though per-token price did not. The 1M context at flat pricing is a genuine win — no long-context premium. Fast Mode at 6x must be ring-fenced to UX-critical calls. Value-per-dollar is good for hard work, mediocre for routine work that Sonnet or Haiku would handle at a fraction of the cost.
“The SDK is unchanged, tool use just works, and the diffs are cleaner — Opus 4.7 closes real PRs, not toy issues.”
For a hands-on builder, Opus 4.7 is the smoothest agent target Anthropic ships. The bash, text-editor, and computer-use scaffolds from Opus 4.6 keep working; streaming, structured output, and prompt caching behave identically. SWE-bench Verified 87.6% and SWE-bench Pro 64.3% translate to noticeably fewer "almost right" diffs in real workflows, and the `xhigh` effort level and task budgets give finer control over thinking cost. The one sharp edge is the tokenizer: cost estimators and context budgeters built for 4.6 silently under-count. Docs are excellent and the Claude Agent SDK is first-class.
“Conversation quality is excellent and refusals are rare, but you feel every second of the thinking latency.”
For a heavy daily user, Opus 4.7 is the most capable conversational Claude — concise when asked, coherent across very long sessions, and far less prone to over-refusal than 2024-era Claude. Vision genuinely works on pasted screenshots and PDFs. The trade-off is speed: time-to-first-token in max-effort mode is long and output streams at ~55 t/s, so it feels deliberate, not snappy. For deep work that is fine; for rapid back-and-forth it is frustrating, and most users will keep Sonnet 4.6 for everyday chat and reach for Opus 4.7 on hard problems.
“A frontier coder that GPT-5.5 out-thinks on the index, slowed to a crawl, with a tokenizer that quietly raises your bill 35%.”
The agentic-coding numbers are real and independently corroborated, so the headline holds. But three claims deserve scrutiny. First, "most capable" is selective: GPT-5.5 xhigh leads the Artificial Analysis Index 60 to 57, and Gemini 3.1 Pro ties at 57 — Opus 4.7's lead is in coding and agents, not general intelligence. Second, the "unchanged pricing" framing hides a real ~35% effective cost increase from the new tokenizer. Third, BrowseComp regressed, and the absence of clean AIME/MMLU-Pro disclosures makes the reasoning story partly unverifiable. SWE-bench is also the most prompt-sensitive, scaffold-sensitive benchmark in the suite — gains there are partly engineering, not pure model. It is excellent at code; treat the "frontier everything" gloss with caution.
- Long-horizon coding agents (multi-file refactors, autonomous PR loops, end-to-end feature builds in Claude Code). - Computer-use and browser-control agents where each step is expensive and accuracy outweighs throughput. - Heavy-document workflows (legal review, financial analysis, scientific literature) where 1M context plus high-res vision are real differentiators. - Hard reasoning where extra thinking budget actually moves the answer (GPQA-Diamond-difficulty research synthesis).
Per-token price is unchanged at $5/$25, but identical text can tokenize up to ~35% larger. Re-run your cost model on representative prompts before committing volume.
For coding and agents, yes — the SWE-bench Pro and OSWorld gains are large. For chat or copy, the lift is marginal and 4.6 may cost less per task.
Use Batch API for non-interactive work (latency irrelevant, 50% off) or Fast Mode (6x price) for interactive needs; otherwise expect deliberate response times.
No training on API inputs by default; SOC 2 Type II, ISO 27001/42001, HIPAA BAA, and GDPR all covered. Zero-retention and US data residency are available.
First-party Claude API plus Bedrock, Vertex AI, and Microsoft Foundry, with regional endpoints for data-residency needs.
Served at standard per-token pricing with no long-context premium; caching and batch discounts apply across the full window.
Leads the Artificial Analysis Intelligence Index (60 vs 57) and many reasoning evals; Opus 4.7 counters on SWE-bench and agentic coding leadership and on LMArena coding Elo.
Ties Opus 4.7 on the AA Index (57) and competes on long-context and multimodal; generally behind on SWE-bench Pro and agentic coding.
Does not train on API inputs by default
Last verified 2026-05-27