DeepSeek V4-Pro

previewLatest Pro

by DeepSeek · DeepSeek V4 family · best for frontier-grade agentic coding at open-weights cost

FrontierReasoningCodingCost-OptimizedOpen-WeightsLong-Context
8.5
AI Panel Score
Value 9.7/10

DeepSeek V4-Pro is the flagship of DeepSeek's April 2026 V4 family — a 1.6-trillion-parameter Mixture-of-Experts model that activates only 49B parameters per token, pairs a 1M-token context with a 384K output ceiling, and lands frontier-class coding and reasoning scores at roughly a tenth of US-frontier token prices. It shipped as a preview on 2026-04-24, with open weights on Hugging Face under an MIT license. The single sentence a buyer needs: if your workload is heavy coding or reasoning and your threat model tolerates a Chinese-origin provider (or you can self-host the weights), V4-Pro is the most cost-effective frontier-grade model available today. - **Provider:** DeepSeek (Hangzhou DeepSeek Artificial Intelligence Co.) - **Released:** 2026-04-24 (preview) - **Status:** Preview (no GA announced) - **Context window:** 1,000,000 tokens - **Max output:** 384,000 tokens - **Modalities:** Text in / text out - **Knowledge cutoff:** 2026-02 - **Headline price:** $0.435 in / $0.87 out per 1M tokens (75% cut now permanent)

What's new

  • New hybrid attention stack — Compressed Sparse Attention (CSA) plus a Heavily Compressed Attention (HCA) head — brings 1M-token context to a fraction of dense-attention prefill cost; DeepSeek's own announcement frames it as "token-wise compression + DSA."
  • Total parameters scaled to 1.6T (49B active per token), up from V3's 671B/37B, on more than 32T training tokens.
  • Native 384K output ceiling (vs V3.2's 64K) makes single-call long-form reports and full agent transcripts viable.
  • Unified 1M context across both Thinking and Non-Thinking modes, with selectable High/Max reasoning effort.
  • On 2026-05-22 DeepSeek made the launch 75% discount permanent: list price is now $0.435/$0.87, down from the original $1.74/$3.48.

Benchmarks

BenchmarkScoreSource
Humanity's Last Exam37.7%huggingface.co 2026-04-24T00:00:00.000Z
MMLU-Pro87.5%huggingface.co 2026-04-24T00:00:00.000Z
SimpleQA57.9%huggingface.co 2026-04-24T00:00:00.000Z
HumanEval76.8%huggingface.co 2026-04-24T00:00:00.000Z
GPQA Diamond90.1%huggingface.co 2026-04-24T00:00:00.000Z
LiveCodeBench93.5%huggingface.co 2026-04-24T00:00:00.000Z
MRCR Long Context83.5%huggingface.co 2026-04-24T00:00:00.000Z
LMArena Coding Elo1287artificialanalysis.ai 2026-05-15T00:00:00.000Z
SWE-bench Verified80.6%huggingface.co 2026-04-24T00:00:00.000Z
Artificial Analysis Index52artificialanalysis.ai 2026-05-15T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker8/10
Frontier coding at a tenth of the price rewrites my build-vs-buy math — the only real question is sovereignty, not capability.

V4-Pro is the most disruptive procurement event of 2026 so far. SWE-bench in the low 80s at roughly an order of magnitude below Claude Opus or GPT-5.x reorders the economics of any heavy reasoning or coding program. The binding constraint is not the model, it is vendor risk: routing US enterprise traffic through a PRC-hosted API is a non-starter for many buyers, and the trains-on-input default compounds that. The mitigating fact is the MIT open weights, which let a buyer keep inference entirely in-boundary — at the cost of a serious multi-node GPU footprint. Preview status adds roadmap uncertainty. For non-regulated teams or those with self-host capacity, this is the rational frontier pick on dollars-per-quality-point.

Strategic Fit 8.5Vendor Risk 6Roadmap Confidence 7
Pros
  • Frontier capability at unmatched price
  • open-weights optionality
  • no lock-in
Cons
  • PRC residency exposure
  • preview status
  • trains-on-input default
Right for: Non-regulated teams or self-hosters chasing frontier value
Avoid if: You need a US/EU-certified managed vendor with contractual data guarantees
Domain Strategist8.5/10
DeepSeek owns the open-weights frontier — V4-Pro is the model every Western lab now has to answer on price-per-point.

Positioned as the open-weights SOTA, V4-Pro is the spearhead of DeepSeek's "frontier quality at commodity price" strategy. Its moat is not a single benchmark but the combination: MoE efficiency, a permissive MIT license, and pricing that the closed US frontier structurally cannot match without margin destruction. That forces a wedge into the market — buyers who would never have considered a Chinese model now evaluate it because the value gap is too large to ignore. The competitive risk is reputational and geopolitical rather than technical; export-control or procurement-policy shifts could reset the board overnight. Against Qwen, GLM, and Kimi, V4-Pro currently leads the open-weights intelligence rankings.

Competitive Positioning 9Differentiation 8.5Market Timing 8.5
Pros
  • Clear open-weights leadership
  • structural price advantage
  • strong agentic Elo
Cons
  • Geopolitical fragility
  • provisional benchmark transparency
Right for: Teams betting on the open-weights ecosystem
Avoid if: Your buyers equate Chinese-origin with disqualifying risk
Finance Lead9.5/10
This is an order-of-magnitude cost wedge — V4-Pro delivers frontier output at roughly a tenth the token price, and cache hits drop input to a rounding error.

This is where V4-Pro embarrasses the US frontier. At $0.435/M input and $0.87/M output — and with the 75% cut now permanent rather than promotional — V4-Pro lands roughly 10-20x cheaper per token than Claude Opus 4.5 or GPT-5.x for comparable benchmark output. Cache-hit input at $0.003625/M makes repeated-context agent loops essentially free on the input side. Bills are predictable and the open-weights option caps exposure to provider price hikes entirely. The one non-unit-economics line item is geopolitical: budget a contingency for a forced provider switch if export-control or procurement rules change. On pure intelligence-per-dollar, nothing at this capability tier comes close.

Cost Efficiency 9.8Pricing Transparency 9Value per Dollar 9.8
Pros
  • Order-of-magnitude cost advantage
  • permanent (not promo) pricing
  • aggressive cache discount
  • open-weights price cap
Cons
  • Reasoning-token volume inflates output bills in Max mode
  • geopolitical contingency risk
Right for: Any cost-sensitive program at frontier capability
Avoid if: Compliance forces a premium-priced certified vendor
Domain Practitioner8.5/10
OpenAI-compatible endpoint, exposed reasoning traces, and a 384K output budget — migration is hours, and long code-gen finally finishes in one call.

For a hands-on builder, V4-Pro is pleasant. The API is OpenAI-compatible, so swapping in is a base-URL and key change. The Thinking/Non-Thinking toggle plus High/Max effort tiers keep routing simple, and exposed reasoning content is invaluable for debugging agent loops. Tool calling, JSON mode, and structured output all work; the 384K output ceiling finally lets long-form code generation complete without chunking. The rough edges: SDK ergonomics and English-language docs trail OpenAI/Anthropic, error messages can be terse, and preview rate limits are tighter than steady-state. Open weights make local experimentation cheap if you have the hardware. No batch API yet.

API Ergonomics 8.5Tool/Agent Support 8Reliability 8
Pros
  • Drop-in OpenAI compatibility
  • exposed CoT
  • huge output budget
  • open weights
Cons
  • Thinner SDK/docs
  • terse errors
  • preview rate limits
  • no batch API
Right for: Builders already on the OpenAI SDK who want frontier coding cheap
Avoid if: You need mature first-party SDKs and managed-cloud SLAs
Power User7.5/10
On hard problems it matches the big closed models; the wait in Max mode and the PRC content guardrails are the only tells.

For a heavy daily user, V4-Pro is essentially indistinguishable from GPT-5.x or Claude on the bulk of real tasks, and noticeably better than free tiers on long-document and coding work. Output quality is high; the model is more willing to engage with technical content than Claude and refuses less on everyday topics. The trade-offs are latency in Max-effort thinking mode (several seconds to longer on hard queries) and PRC-aligned guardrails that return aligned responses on a narrow set of politically sensitive subjects, which users outside China may find limiting. Tone is competent but a shade more neutral and formal than Claude's. As a free option via chat.deepseek.com, the everyday value is hard to beat.

Output Quality 8.5Speed 7Everyday Usefulness 8
Pros
  • Frontier-level answers
  • permissive on technical topics
  • free in the UI
Cons
  • Max-mode latency
  • PRC content guardrails
  • slightly formal tone
Right for: Power users who want frontier quality without a subscription
Avoid if: You need the fastest responses or unrestricted political discussion
Skeptic7.5/10
The price is real and historic — but 'frontier' rests on self-reported, single-mode Max scores that nobody has independently reproduced yet.

The cost story is genuine and the open weights are verifiable, so the headline is not hype. The caveats are specific. First, the marquee numbers (SWE-bench 80.6, LiveCodeBench 93.5, GPQA 90.1) are Pro-Max — the highest-effort reasoning configuration — and non-thinking-mode performance is materially lower; comparisons should match modes. Second, DeepSeek's AIME claims are flagged "internal only" by aggregators pending reproduction, so treat unreproduced figures as provisional. Third, the hosted API trains on input by default and stores data in the PRC, which is a real, not theoretical, governance exposure. Fourth, "1M context" benchmarks well on MRCR but real-world degradation at extreme lengths is under-tested. The model is excellent value; the asterisks are about mode-matching, reproduction, and governance — not capability.

Claim Accuracy 7.5Weakness Severity 6.5Hype vs Reality 8
Pros
  • Verifiable open weights
  • reproducible price advantage
  • transparent model card
Cons
  • Marquee scores are single-mode and partly unreproduced
  • trains-on-input
  • preview volatility
Right for: Buyers who verify with their own evals before committing
Avoid if: You take vendor benchmark tables at face value

Strengths

  • Frontier-grade coding and reasoning (SWE-bench 80.6, LiveCodeBench 93.5, GPQA 90.1) at roughly a tenth of US-frontier token cost.
  • 1M-token context with sparse attention that holds up on long-haystack retrieval (MRCR-1M 83.5).
  • 384K output ceiling removes the chunking friction that plagued earlier DeepSeek coding agents.
  • MIT-licensed open weights — self-hostable, no vendor lock-in, residency escape hatch.
  • Cache-hit input at $0.003625/M makes retrieval-heavy agent loops nearly free on the input side.

Limitations

  • Preview status: API behavior, rate limits, and endpoints may shift before GA, and no GA date is announced.
  • Text-only: no vision, no audio, no native web retrieval.
  • Hosted API carries China data-residency exposure and trains-on-input default; many regulated buyers cannot use it directly.
  • Self-hosting the 1.6T MoE is a multi-node, capital-intensive deployment.
  • DeepSeek-published AIME numbers are not yet third-party reproduced; some headline claims should be treated as provisional.
  • SDK/tooling ecosystem still trails OpenAI and Anthropic in polish.

Best use cases

- **Long-context coding agents** working across entire repositories where 80%+ SWE-bench at a tenth of the cost reorders build-vs-buy math. - **High-volume reasoning pipelines** where per-token economics dominate ROI and the workload is not residency-restricted. - **Self-hosted frontier deployments** for teams with GPU capacity that want frontier quality with zero vendor lock-in. - **Research and document analysis** over 500K-1M token windows.

Buyer questions

How much cheaper is it, really?

Roughly 10-20x cheaper per token than US frontier flagships for comparable benchmark output, and the 75% discount is now permanent list pricing, not a promo.

Can I avoid sending data to China?

Yes — the weights are MIT-licensed and self-hostable on your own AWS/Azure/GCP/on-prem GPUs, which keeps all data in your boundary. The first-party API, by contrast, stores data on PRC servers.

What hardware do I need to self-host?

A multi-node GPU cluster — the 1.6T MoE at FP4/FP8-mixed needs roughly 900GB+ of VRAM (8x H200 floor, realistically 16+ GPUs). Most teams use a neutral inference provider instead.

Is it really frontier, or just cheap?

Its top-mode coding and reasoning scores are frontier-class, but they are the highest-effort (Max) configuration and some math claims are not yet third-party reproduced. Run your own evals in the mode you will deploy.

Does it do images?

No. V4-Pro is text-only despite some secondary coverage calling V4 multimodal.

Is it production-stable?

It is preview, not GA. For guaranteed stability today, V3.2 is the GA fallback in the family.

Comparable models

**Claude Opus 4.5/4.6** — better tool ergonomics, US procurement story, and contractual data guarantees; roughly 10-20x more expensive per token and closed-weights.
**GPT-5.4** — broader ecosystem, native multimodality, and managed-cloud availability; comparable frontier coding, much higher unit cost.
**Qwen 3 Max / GLM-5.1** — closest open-weights China-origin peers; V4-Pro currently leads them on the Artificial Analysis open-weights intelligence ranking.

Model specs

Input price
$0.43 / Mtok
Output price
$0.87 / Mtok
Cached input
$0.00 / Mtok
Batch (in/out)
Context window
1M tokens
Max output
384K tokens
Knowledge cutoff
2026-02
Released
2026-04-23
Modalities
text → text
Output speed
Not profiled
License
Open weights (MIT)
Clouds
First-party API

Last verified 2026-05-27