by DeepSeek · DeepSeek V3 family · best for open-weights math/reasoning at GA stability
DeepSeek V3.2 is the family's last 128K-context generation before the V4 long-context jump, and the most polished open-weights model in the V3 line — a 671B-parameter Mixture-of-Experts model (37B active) that introduced DeepSeek Sparse Attention (DSA) and posts frontier-class competition-math scores. The V3.2-Exp preview shipped 2025-09-29; the GA line (including the math-tuned V3.2-Speciale) landed 2025-12-01. The single sentence a buyer needs: when you need GA stability rather than V4 preview risk, strong math/reasoning, and rock-bottom token cost inside a 128K window, V3.2 is still the production-default DeepSeek pick in mid-2026. - **Provider:** DeepSeek - **Released:** 2025-12-01 (GA); V3.2-Exp preview 2025-09-29 - **Status:** GA - **Context window:** 128,000 tokens (V3.2-Speciale: 163,840) - **Max output:** 64,000 tokens - **Modalities:** Text in / text out - **Knowledge cutoff:** 2025-07 - **Headline price:** $0.252 in / $0.378 out per 1M tokens
| Benchmark | Score | Source |
|---|---|---|
| Humanity's Last Exam | 30.6% | api-docs.deepseek.com 2025-12-01T00:00:00.000Z |
| MMLU-Pro | 85% | openrouter.ai 2025-12-01T00:00:00.000Z |
| AIME 2025 | 93.1% | api-docs.deepseek.com 2025-12-01T00:00:00.000Z |
| LiveCodeBench | 74.1% | macaron.im 2026-04-24T00:00:00.000Z |
| SWE-bench Verified | 67.8% | macaron.im 2026-04-24T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“When I need a stable, board-defensible DeepSeek today, it's V3.2 — GA, open weights, and proven in production.”
V3.2 was the production-stable bet for cost-conscious teams from late 2025 through Q1 2026, and even with V4 in preview it remains the rational GA choice for anyone who prioritizes guaranteed stability over preview-tier risk. DSA made long-context inference economical on V3-class hardware, and the math/reasoning gains made it competitive with US frontier on a meaningful slice of work. The sovereignty calculus is unchanged — fine for non-regulated workloads, off the table for many enterprise buyers without a neutral inference partner — but the mature open-weights status and broad third-party hosting make an in-boundary deployment the most turnkey in the DeepSeek lineup.
“V3.2 is the model that crystallized the DeepSeek price thesis — the sub-3-cent input cut put the whole industry on notice.”
Strategically, V3.2 is the consolidation release: it took V3.1's hybrid design, added DSA efficiency and gold-medal math, and cut price hard enough that the under-3-cents-per-million input headline became a category event. Its positioning is "the stable, cheap, reasoning-strong open-weights default," and through early 2026 it largely owned that slot against Qwen and Llama. The competitive risk now is internal cannibalization — V4-Flash undercuts it on context and price — so its strategic role is shifting from frontier wedge to dependable GA backbone. The math/Speciale angle remains a genuine differentiator in STEM-heavy markets.
“Sub-3-cent cache-hit input was the move that made the unit economics undeniable — V3.2 sits ~10x below the US frontier.”
V3.2's price drop crystallized the DeepSeek thesis going into 2026. At $0.252 input / $0.378 output, it sits roughly 10x below the US frontier for comparable answer quality on most non-coding tasks, and cache-hit input at $0.025/M is transformative for RAG-style repeated-context workloads. As a GA model the pricing is stable and predictable — no preview volatility — and the open-weights option caps provider-price exposure. The unhedged line item remains geopolitical, but the mature third-party hosting ecosystem makes a compliant deployment cheap to stand up. On dollars-per-quality for math and reasoning, it is among the best values available.
“The mature one in the lineup — tool calls, JSON mode, exposed traces all behave, and the 64K output finally fits long generations.”
For a builder, V3.2 is the most settled DeepSeek model. Tool calling, JSON mode, and structured output all behave well; exposed reasoning content via the reasoner path is easy to debug. Open weights make self-hosted realistic at 671B/37B, and the OpenAI-compatible endpoint makes migration trivial. The 64K output ceiling (up from V3.1's 8K) removes most long-form chunking friction. Documentation is solid if still Chinese-first in spots. There is no batch API. For teams that standardized on V3.2 in Q4 2025, there is no urgent reason to jump to V4 preview unless context window or coding ceiling forces the issue.
“On math and STEM it genuinely impresses; for everyday chat it's a competent, free, slightly-formal workhorse.”
End users on a V3.2-backed product won't notice a gap versus free ChatGPT or Claude on everyday tasks, and on math and STEM the model is actually a strong choice — the reasoning mode solves hard problems that free Western tiers struggle with. Latency is competitive in non-thinking mode and adds a few seconds for reasoning queries. Refusal rate is lower than Western models on most topics, with PRC-aligned guardrails on a narrow set. Helpfulness is high; tone is competent if slightly more formal than Claude. As a free option via the DeepSeek UI, the everyday value is strong.
“The math scores are real and impressive — but 'Speciale gold medals' is a separate high-compute variant, not the model you call by default.”
V3.2's reasoning and math gains are well-documented and the price is verifiable, so the core story holds. The honest caveats: the gold-medal headlines belong to V3.2-Speciale, a high-compute API-only variant, not the base V3.2 most users hit — conflating them overstates the default experience. Coding is a real weak spot (SWE-bench ~67.8), well behind the frontier. The 128K window is now a generation old. And the family-wide governance issues — PRC storage, trains-on-input, no opt-out — apply. None of this undermines V3.2 as an excellent-value, stable, math-strong model; it just means matching the variant and benchmark to the actual deployment.
- **Math, competitive programming, and STEM-heavy reasoning** agents where the Speciale-line capability shines. - **Cost-sensitive RAG and document-analysis** pipelines that fit inside 128K, leveraging DSA economics. - **GA-stable production workloads** that cannot take on V4 preview risk. - **Self-hosted open-weights deployments** where V4's 1.6T footprint is unaffordable but 671B/37B is feasible.
GA stability. V4 is preview with shifting rate limits; V3.2 is production-proven with steady limits and broad third-party hosting. Choose V3.2 when reliability outweighs the 1M context and higher SWE-bench of V4.
DeepSeek Sparse Attention selects a sparse subset of key-value positions per query, cutting long-context training and inference cost on V3-class hardware without quality loss — it is why V3.2 stays cheap on long inputs.
Yes — AIME 2025 93.1% is strong, and the Speciale variant reached gold-medal level on IMO/IOI/ICPC. Note Speciale is a separate high-compute variant.
Yes — 671B/37B MoE under MIT, realistically an 8x H200-class node at FP8, with INT4/GGUF community quants for smaller rigs.
Very — $0.252/M input with cache hits at $0.025/M, plus DSA efficiency, make repeated 128K passes economical.
No — text-only. For document/OCR pair it with DeepSeek-VL2 or a dedicated VL model.
Last verified 2026-05-27