DeepSeek V3.2

GA

by DeepSeek · DeepSeek V3 family · best for open-weights math/reasoning at GA stability

ReasoningCost-OptimizedOpen-Weights
8.2
AI Panel Score
Value 9.3/10

DeepSeek V3.2 is the family's last 128K-context generation before the V4 long-context jump, and the most polished open-weights model in the V3 line — a 671B-parameter Mixture-of-Experts model (37B active) that introduced DeepSeek Sparse Attention (DSA) and posts frontier-class competition-math scores. The V3.2-Exp preview shipped 2025-09-29; the GA line (including the math-tuned V3.2-Speciale) landed 2025-12-01. The single sentence a buyer needs: when you need GA stability rather than V4 preview risk, strong math/reasoning, and rock-bottom token cost inside a 128K window, V3.2 is still the production-default DeepSeek pick in mid-2026. - **Provider:** DeepSeek - **Released:** 2025-12-01 (GA); V3.2-Exp preview 2025-09-29 - **Status:** GA - **Context window:** 128,000 tokens (V3.2-Speciale: 163,840) - **Max output:** 64,000 tokens - **Modalities:** Text in / text out - **Knowledge cutoff:** 2025-07 - **Headline price:** $0.252 in / $0.378 out per 1M tokens

What's new

  • Introduces DeepSeek Sparse Attention (DSA) — a fine-grained sparse-attention mechanism built atop the V3 MLA latent-attention design that cuts training and long-context inference cost while preserving quality.
  • AIME 2025 climbs to 93.1% (from V3.1's ~88%), HMMT 2025 to 92.5%, and HLE to 30.6.
  • V3.2-Speciale, a high-compute reasoning/math variant (API-only at launch, weights later), achieved gold-medal-level results across IMO 2025, IOI 2025, and ICPC World Finals 2025.
  • API pricing cut to under 3 cents per 1M input tokens on cache hits at the Exp launch — roughly half of V3.1's effective rates.
  • Output ceiling raised to 64K (from V3.1's 8K), removing most long-form chunking pain.

Benchmarks

BenchmarkScoreSource
Humanity's Last Exam30.6%api-docs.deepseek.com 2025-12-01T00:00:00.000Z
MMLU-Pro85%openrouter.ai 2025-12-01T00:00:00.000Z
AIME 202593.1%api-docs.deepseek.com 2025-12-01T00:00:00.000Z
LiveCodeBench74.1%macaron.im 2026-04-24T00:00:00.000Z
SWE-bench Verified67.8%macaron.im 2026-04-24T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker8/10
When I need a stable, board-defensible DeepSeek today, it's V3.2 — GA, open weights, and proven in production.

V3.2 was the production-stable bet for cost-conscious teams from late 2025 through Q1 2026, and even with V4 in preview it remains the rational GA choice for anyone who prioritizes guaranteed stability over preview-tier risk. DSA made long-context inference economical on V3-class hardware, and the math/reasoning gains made it competitive with US frontier on a meaningful slice of work. The sovereignty calculus is unchanged — fine for non-regulated workloads, off the table for many enterprise buyers without a neutral inference partner — but the mature open-weights status and broad third-party hosting make an in-boundary deployment the most turnkey in the DeepSeek lineup.

Strategic Fit 8Vendor Risk 7Roadmap Confidence 8.5
Pros
  • GA stability
  • proven open weights
  • broad hosting
Cons
  • 128K ceiling
  • coding trails frontier
  • PRC API residency
Right for: Teams needing a stable, cheap, math-strong default
Avoid if: You need 1M context or top-tier coding
Domain Strategist8/10
V3.2 is the model that crystallized the DeepSeek price thesis — the sub-3-cent input cut put the whole industry on notice.

Strategically, V3.2 is the consolidation release: it took V3.1's hybrid design, added DSA efficiency and gold-medal math, and cut price hard enough that the under-3-cents-per-million input headline became a category event. Its positioning is "the stable, cheap, reasoning-strong open-weights default," and through early 2026 it largely owned that slot against Qwen and Llama. The competitive risk now is internal cannibalization — V4-Flash undercuts it on context and price — so its strategic role is shifting from frontier wedge to dependable GA backbone. The math/Speciale angle remains a genuine differentiator in STEM-heavy markets.

Competitive Positioning 8Differentiation 8Market Timing 7.5
Pros
  • Defined the price thesis
  • standout math
  • stable open weights
Cons
  • Out-positioned by V4 on context/price
  • aging window
Right for: STEM and reasoning-heavy open-weights bets
Avoid if: You need the latest context/cost frontier
Finance Lead9/10
Sub-3-cent cache-hit input was the move that made the unit economics undeniable — V3.2 sits ~10x below the US frontier.

V3.2's price drop crystallized the DeepSeek thesis going into 2026. At $0.252 input / $0.378 output, it sits roughly 10x below the US frontier for comparable answer quality on most non-coding tasks, and cache-hit input at $0.025/M is transformative for RAG-style repeated-context workloads. As a GA model the pricing is stable and predictable — no preview volatility — and the open-weights option caps provider-price exposure. The unhedged line item remains geopolitical, but the mature third-party hosting ecosystem makes a compliant deployment cheap to stand up. On dollars-per-quality for math and reasoning, it is among the best values available.

Cost Efficiency 9.2Pricing Transparency 9Value per Dollar 9.2
Pros
  • ~10x below frontier
  • transformative cache pricing
  • GA stability
  • open-weights price cap
Cons
  • Reasoning-token volume
  • geopolitical contingency
Right for: Cost-sensitive math/RAG programs
Avoid if: Compliance forces a certified premium vendor
Domain Practitioner8.5/10
The mature one in the lineup — tool calls, JSON mode, exposed traces all behave, and the 64K output finally fits long generations.

For a builder, V3.2 is the most settled DeepSeek model. Tool calling, JSON mode, and structured output all behave well; exposed reasoning content via the reasoner path is easy to debug. Open weights make self-hosted realistic at 671B/37B, and the OpenAI-compatible endpoint makes migration trivial. The 64K output ceiling (up from V3.1's 8K) removes most long-form chunking friction. Documentation is solid if still Chinese-first in spots. There is no batch API. For teams that standardized on V3.2 in Q4 2025, there is no urgent reason to jump to V4 preview unless context window or coding ceiling forces the issue.

API Ergonomics 8.5Tool/Agent Support 8.5Reliability 9
Pros
  • Mature, stable behavior
  • good tool use
  • bigger output
  • broad hosting
Cons
  • Chinese-first docs
  • no batch API
  • 128K window
Right for: Builders wanting a settled, cheap default
Avoid if: You need 1M context or first-party SDK depth
Power User7.5/10
On math and STEM it genuinely impresses; for everyday chat it's a competent, free, slightly-formal workhorse.

End users on a V3.2-backed product won't notice a gap versus free ChatGPT or Claude on everyday tasks, and on math and STEM the model is actually a strong choice — the reasoning mode solves hard problems that free Western tiers struggle with. Latency is competitive in non-thinking mode and adds a few seconds for reasoning queries. Refusal rate is lower than Western models on most topics, with PRC-aligned guardrails on a narrow set. Helpfulness is high; tone is competent if slightly more formal than Claude. As a free option via the DeepSeek UI, the everyday value is strong.

Output Quality 7.5Speed 7.5Everyday Usefulness 7.5
Pros
  • Strong on math/STEM
  • permissive
  • free in UI
Cons
  • Slightly formal tone
  • PRC guardrails
Right for: STEM-leaning everyday use
Avoid if: You want the sharpest creative voice
Skeptic7.5/10
The math scores are real and impressive — but 'Speciale gold medals' is a separate high-compute variant, not the model you call by default.

V3.2's reasoning and math gains are well-documented and the price is verifiable, so the core story holds. The honest caveats: the gold-medal headlines belong to V3.2-Speciale, a high-compute API-only variant, not the base V3.2 most users hit — conflating them overstates the default experience. Coding is a real weak spot (SWE-bench ~67.8), well behind the frontier. The 128K window is now a generation old. And the family-wide governance issues — PRC storage, trains-on-input, no opt-out — apply. None of this undermines V3.2 as an excellent-value, stable, math-strong model; it just means matching the variant and benchmark to the actual deployment.

Claim Accuracy 7.5Weakness Severity 6.5Hype vs Reality 8
Pros
  • Verifiable math gains
  • stable open weights
  • transparent reports
Cons
  • Speciale ≠ base V3.2
  • weak coding
  • aging window
  • trains-on-input
Right for: Buyers who match variant to workload
Avoid if: You read Speciale scores as the default model

Strengths

  • Frontier-class competition math and reasoning at open-weights (AIME 93.1; Speciale at gold-medal level).
  • DSA sparse attention delivers genuine long-context inference cost reductions on V3-class hardware.
  • Mature, GA-stable API with a strong tool-use story — the production-default of early-to-mid 2026.
  • MIT open weights with broad inference-provider support and community quantizations.
  • 64K output ceiling removes most long-form chunking friction.

Limitations

  • 128K context is a generation behind V4's 1M window.
  • Coding (SWE-bench ~67.8) trails frontier coding agents by 10+ points.
  • Superseded by V4-Pro/Flash on context, output, and SWE-bench.
  • Same China data-residency and trains-on-input exposure as the family.
  • Not natively multimodal; no real-time retrieval.

Best use cases

- **Math, competitive programming, and STEM-heavy reasoning** agents where the Speciale-line capability shines. - **Cost-sensitive RAG and document-analysis** pipelines that fit inside 128K, leveraging DSA economics. - **GA-stable production workloads** that cannot take on V4 preview risk. - **Self-hosted open-weights deployments** where V4's 1.6T footprint is unaffordable but 671B/37B is feasible.

Buyer questions

Why pick V3.2 over V4 today?

GA stability. V4 is preview with shifting rate limits; V3.2 is production-proven with steady limits and broad third-party hosting. Choose V3.2 when reliability outweighs the 1M context and higher SWE-bench of V4.

What is DSA and why does it matter?

DeepSeek Sparse Attention selects a sparse subset of key-value positions per query, cutting long-context training and inference cost on V3-class hardware without quality loss — it is why V3.2 stays cheap on long inputs.

Is the math claim real?

Yes — AIME 2025 93.1% is strong, and the Speciale variant reached gold-medal level on IMO/IOI/ICPC. Note Speciale is a separate high-compute variant.

Can I self-host it?

Yes — 671B/37B MoE under MIT, realistically an 8x H200-class node at FP8, with INT4/GGUF community quants for smaller rigs.

How cheap is long-context RAG?

Very — $0.252/M input with cache hits at $0.025/M, plus DSA efficiency, make repeated 128K passes economical.

Is it multimodal?

No — text-only. For document/OCR pair it with DeepSeek-VL2 or a dedicated VL model.

Comparable models

**GPT-5 mini** — broader ecosystem, native multimodal; ~3-5x more expensive, weaker open-weights story.
**Claude Sonnet 4.6** — stronger coding and tool ergonomics, US procurement story; materially more expensive and closed.
**Qwen 3 235B** — closest China-origin open-weights peer at GA stability; comparable cost, V3.2 leads on competition math.

Model specs

Input price
$0.25 / Mtok
Output price
$0.38 / Mtok
Cached input
$0.03 / Mtok
Batch (in/out)
Context window
128K tokens
Max output
64K tokens
Knowledge cutoff
2025-07
Released
2025-11-30
Modalities
text → text
Output speed
Not profiled
License
Open weights (MIT)
Clouds
First-party API

Last verified 2026-05-27