DeepSeek V3.1 vs DeepSeek V4-Flash vs DeepSeek V4-Pro

Best cell per row highlighted. A dash marks an undisclosed value (null); it is never counted as zero.

Model	DeepSeek V3.1 (GA)	DeepSeek V4-Flash (preview)	DeepSeek V4-Pro (preview)
AI Panel score	7.7	8.7	8.5

Identity & lifecycle

Provider	DeepSeek	DeepSeek	DeepSeek
Family / tier	DeepSeek V3 · Large	DeepSeek V4 · Flash	DeepSeek V4 · Pro
Status	GA	preview	preview
Released	2025-08-20	2026-04-23	2026-04-23
Knowledge cutoff	2025-06	2026-02	2026-02

Architecture & context

Context window	128K	1M	1M
Max output tokens	8K	384K	384K
Input modalities	text	text	text
Reasoning mode	optional	optional	optional
Open weights	Yes	Yes	Yes

Pricing (per million tokens)

Input	$0.21	$0.14	$0.435
Output	$0.79	$0.28	$0.87
Cached input	$0.021	$0.0028	$0.0036
Batch input	—	—	—
Free tier	Yes	Yes	Yes
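
The per-million-token prices above can be turned into a rough cost estimate for a given workload. The sketch below is illustrative only: it assumes simple linear billing, and it assumes cached input tokens are billed at the cached rate in place of the normal input rate. The `Pricing` class and the `estimate_cost` helper are hypothetical names, not part of any DeepSeek SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in dollars per million tokens, copied from the table above."""
    input: float
    output: float
    cached_input: float


PRICES = {
    "DeepSeek V3.1": Pricing(input=0.21, output=0.79, cached_input=0.021),
    "DeepSeek V4-Flash": Pricing(input=0.14, output=0.28, cached_input=0.0028),
    "DeepSeek V4-Pro": Pricing(input=0.435, output=0.87, cached_input=0.0036),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of one workload.

    Assumption: cached tokens are a subset of the input tokens and are
    billed at the cached rate instead of the regular input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * price.input
             + cached_tokens * price.cached_input
             + output_tokens * price.output)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens (half cached), 500K output tokens.
    for name in PRICES:
        cost = estimate_cost(name, 2_000_000, 500_000, cached_tokens=1_000_000)
        print(f"{name}: ${cost:.4f}")
```

Actual invoices may differ, since real billing can add rounding rules, minimums, or promotional discounts that are not listed in this table.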

Speed

Output speed (tok/s)	—	—	—
Time to first token (s)	—	—	—
Latency tier	medium	fast	medium

Trust & deployment

Trains on inputs	Yes	Yes	Yes
License	MIT	MIT	MIT
Clouds	—	—	—
Compliance	—	—	—

AI Panel scoring

Unified score	7.7	8.7	8.5
Decision Maker	7.5	8.5	8
Domain Strategist	7.5	8.5	8.5
Finance Lead	8.5	9.7	9.5
Domain Practitioner	8	8.5	8.5
Power User	7.5	8	7.5
Skeptic	7.5	7.5	7.5
Value score	8.8	9.9	9.7

Benchmarks

MMLU-Pro	83.7	86.2	87.5
GPQA Diamond	74	88.1	90.1
AIME 2025	88	—	—
SWE-bench Verified	66	79	80.6
HumanEval	—	—	76.8
LiveCodeBench	80	91.6	93.5
Aider Polyglot	76.3	—	—
MRCR Long Context	—	78.7	83.5
SimpleQA	—	34.1	57.9
Humanity's Last Exam	—	34.8	37.7
LMArena Coding Elo	—	—	1287
Artificial Analysis Index	—	—	52

Scores link to their sources. Missing cells mean the vendor hasn't disclosed a result; they are left empty rather than padded.
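
The rule that an undisclosed value is never counted as zero matters when picking the best cell in a row. A minimal sketch of that logic follows, using a few benchmark rows from the table. It assumes higher is better for every benchmark shown, and `best_model` is a hypothetical helper, not the site's actual implementation.

```python
from typing import Optional

MODELS = ["DeepSeek V3.1", "DeepSeek V4-Flash", "DeepSeek V4-Pro"]

# None stands for an undisclosed cell (shown as a dash in the table).
BENCHMARKS: dict[str, list[Optional[float]]] = {
    "MMLU-Pro": [83.7, 86.2, 87.5],
    "GPQA Diamond": [74, 88.1, 90.1],
    "AIME 2025": [88, None, None],
    "SWE-bench Verified": [66, 79, 80.6],
    "HumanEval": [None, None, 76.8],
    "MRCR Long Context": [None, 78.7, 83.5],
}


def best_model(scores: list[Optional[float]]) -> Optional[str]:
    """Return the model with the highest disclosed score, or None.

    Undisclosed cells are skipped entirely rather than treated as 0,
    so a model is never penalized for a missing result.
    """
    disclosed = [(score, model)
                 for score, model in zip(scores, MODELS)
                 if score is not None]
    if not disclosed:
        return None
    return max(disclosed, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    for benchmark, scores in BENCHMARKS.items():
        print(f"{benchmark}: {best_model(scores)}")
```

Note that a row with a single disclosed value, such as AIME 2025, still reports a "best" model under this scheme, which says nothing about how the other models would have scored.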