DeepSeek R1 (0528) vs QwQ-32B vs Magistral Medium 1.2 vs Grok 4.20

The best cell in each row is highlighted on the original page. Empty (null) cells mean the value is undisclosed; they are never counted as zero.
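The rule above (pick the best disclosed value per row, never treat a missing value as zero) can be sketched in a few lines. This is an illustrative sketch, not the page's actual implementation: the names `ROWS`, `LOWER_IS_BETTER` and `best_cell` are hypothetical, and only a few rows from the tables below are included. It also assumes that for price and latency rows a smaller number is better.

```python
from typing import Optional

# Column order follows this comparison. None marks an undisclosed value.
MODELS = ["DeepSeek R1 (0528)", "QwQ-32B", "Magistral Medium 1.2", "Grok 4.20"]

# A few rows copied from the tables below (hypothetical subset).
ROWS: dict[str, list[Optional[float]]] = {
    "GPQA Diamond": [81, 65.2, 76.26, 78.5],
    "MATH-500": [None, 90.6, None, 87.3],
    "Input price ($/Mtok)": [0.55, 0.12, 2.0, 1.25],
    "Time to first token (s)": [None, None, 1.7, 13.24],
}

# Assumption: for these rows a smaller number is better.
LOWER_IS_BETTER = {"Input price ($/Mtok)", "Time to first token (s)"}


def best_cell(row: str) -> Optional[str]:
    """Return the model with the best disclosed value, or None if nothing is disclosed."""
    # Drop undisclosed cells entirely instead of substituting 0.
    disclosed = [(m, v) for m, v in zip(MODELS, ROWS[row]) if v is not None]
    if not disclosed:
        return None
    pick = min if row in LOWER_IS_BETTER else max
    return pick(disclosed, key=lambda mv: mv[1])[0]


for name in ROWS:
    print(f"{name}: {best_cell(name)}")
```

Skipping nulls matters most on "lower is better" rows: if the undisclosed time-to-first-token values were read as 0, DeepSeek R1 would wrongly appear fastest.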

Models compared

Model	DeepSeek R1 (0528)	QwQ-32B	Magistral Medium 1.2	Grok 4.20
Provider	DeepSeek	Alibaba Cloud	Mistral AI	xAI
AI Panel score	8.0	6.8	7.2	6.9

Identity & lifecycle

Provider	DeepSeek	Alibaba Cloud	Mistral AI	xAI
Family / tier	DeepSeek R1 · Reasoning	QwQ · Reasoning	Magistral · Reasoning	Grok 4 · Reasoning
Status	GA	GA	GA	GA
Released	2025-05-27	2025-03-04	2025-09-17	2026-03-09
Knowledge cutoff	2025-04	2024-09	2025-06	2024-11

Architecture & context

Context window (tokens)	128K	131K	131K	1M
Max output tokens	64K	33K	41K	—
Input modalities	text	text	text, image	text, image
Reasoning mode	always	always	always	optional
Open weights	Yes	Yes	No	No

Pricing (USD per million tokens)

Input	$0.55	$0.12	$2.00	$1.25
Output	$2.19	$0.18	$5.00	$2.50
Cached input	$0.14	—	—	$0.20
Batch input	—	—	$1.00	—
Free tier	Yes	Yes	Yes	Yes
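Because prices are quoted per million tokens, the cost of a workload is a weighted sum of token counts. The sketch below uses the list prices from the table. It assumes linear per-token billing, that hidden reasoning tokens are billed as output tokens, and that cached tokens fall back to the normal input rate when no cached price is listed; check each provider's billing docs before relying on these assumptions. `Pricing` and `workload_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pricing:
    input: float                          # USD per million input tokens
    output: float                         # USD per million output tokens
    cached_input: Optional[float] = None  # None = no cached rate listed


# List prices from the pricing table (USD per million tokens).
PRICES = {
    "DeepSeek R1 (0528)": Pricing(0.55, 2.19, 0.14),
    "QwQ-32B": Pricing(0.12, 0.18),
    "Magistral Medium 1.2": Pricing(2.0, 5.0),
    "Grok 4.20": Pricing(1.25, 2.5, 0.2),
}


def workload_cost(p: Pricing, input_tok: int, output_tok: int, cached_tok: int = 0) -> float:
    """Estimate USD cost; cached_tok is the part of input_tok served from cache.

    Assumption: reasoning tokens are already included in output_tok, and
    cached tokens are billed at the normal input rate if no cached rate exists.
    """
    cached_rate = p.cached_input if p.cached_input is not None else p.input
    fresh_tok = input_tok - cached_tok
    usd_micro = fresh_tok * p.input + cached_tok * cached_rate + output_tok * p.output
    return usd_micro / 1_000_000


# Example workload: 10M input tokens and 2M output tokens, no caching.
for name, p in PRICES.items():
    print(f"{name}: ${workload_cost(p, 10_000_000, 2_000_000):.2f}")
```

For this example workload the sketch gives roughly $9.88, $1.56, $30.00 and $17.50 respectively. Output-heavy reasoning traces shift the ranking toward models with cheap output tokens, so the output price row usually matters more than the input row for these models.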

Speed

Output speed (tok/s)	—	—	38.9	171.4
Time to first token (s)	—	—	1.7	13.24
Latency tier	slow	slow	slow	medium
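Time to first token and output speed combine into a rough end-to-end estimate: wall-clock time ≈ TTFT + output tokens / throughput. This sketch assumes throughput stays constant over the whole response and that any hidden reasoning tokens are counted in the output token total; it ignores network overhead and queuing. `response_time` is a hypothetical helper, and undisclosed figures yield no estimate rather than a guess.

```python
from typing import Optional

# (output tokens per second, time to first token in seconds); None = undisclosed.
SPEED = {
    "DeepSeek R1 (0528)": (None, None),
    "QwQ-32B": (None, None),
    "Magistral Medium 1.2": (38.9, 1.7),
    "Grok 4.20": (171.4, 13.24),
}


def response_time(tok_per_s: Optional[float], ttft_s: Optional[float],
                  output_tokens: int) -> Optional[float]:
    """Rough wall-clock estimate: TTFT plus generation time at constant throughput.

    Returns None when either figure is undisclosed instead of assuming a value.
    """
    if tok_per_s is None or ttft_s is None:
        return None
    return ttft_s + output_tokens / tok_per_s


for name, (tps, ttft) in SPEED.items():
    t = response_time(tps, ttft, 1_000)
    label = "undisclosed" if t is None else f"{t:.1f} s"
    print(f"{name}: {label}")
```

For a 1,000-token response this gives about 27.4 s for Magistral Medium 1.2 and 19.1 s for Grok 4.20: Grok's slower first token is outweighed by its higher throughput once responses get long.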

Trust & deployment

Trains on inputs	Yes	No	No	Yes
License	MIT	Apache-2.0	Proprietary	Proprietary
Clouds	—	GCP	Azure AI Foundry	—
Compliance	—	—	SOC2, ISO27001, GDPR	—

AI Panel scoring

Unified score	8	6.8	7.2	6.9
Decision Maker	8	7	7	7
Domain Strategist	8.5	7	6.5	6.5
Finance Lead	9	7	7	6.5
Domain Practitioner	8.5	7.5	7.5	7.5
Power User	8	6	7.5	6.5
Skeptic	7.5	6.5	6.5	6
Value score	9.2	8	7	7

Benchmarks

MMLU	93.4	—	—	—
MMLU-Pro	85	—	—	—
GPQA Diamond	81	65.2	76.26	78.5
AIME 2025	87.5	—	83.48	—
MATH-500	—	90.6	—	87.3
SWE-bench Verified	57.6	—	—	—
LiveCodeBench	73.3	63.4	—	—
Aider Polyglot	71.6	—	—	—
MMMU	—	—	70	—
IFEval	—	83.9	—	81
TAU-bench	63.9	—	—	93
SimpleQA	27.8	—	—	—
Humanity's Last Exam	17.7	—	—	—
LMArena Elo	1382	—	—	1491
Artificial Analysis Index	68	—	27	49

On the original page, each score links to its source. A missing cell means the vendor has not disclosed that result; gaps are left empty rather than padded with estimates.
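Because so many cells are undisclosed, a fair head-to-head comparison can only use benchmarks that both models report. The sketch below copies the benchmark table with `None` for missing cells and lists the shared benchmarks for each pair of models. `BENCH` and `shared` are hypothetical names used only for illustration.

```python
from itertools import combinations
from typing import Optional

MODELS = ["DeepSeek R1 (0528)", "QwQ-32B", "Magistral Medium 1.2", "Grok 4.20"]

# Benchmark table from above; None = not disclosed by the vendor.
BENCH: dict[str, list[Optional[float]]] = {
    "MMLU": [93.4, None, None, None],
    "MMLU-Pro": [85, None, None, None],
    "GPQA Diamond": [81, 65.2, 76.26, 78.5],
    "AIME 2025": [87.5, None, 83.48, None],
    "MATH-500": [None, 90.6, None, 87.3],
    "SWE-bench Verified": [57.6, None, None, None],
    "LiveCodeBench": [73.3, 63.4, None, None],
    "Aider Polyglot": [71.6, None, None, None],
    "MMMU": [None, None, 70, None],
    "IFEval": [None, 83.9, None, 81],
    "TAU-bench": [63.9, None, None, 93],
    "SimpleQA": [27.8, None, None, None],
    "Humanity's Last Exam": [17.7, None, None, None],
    "LMArena Elo": [1382, None, None, 1491],
    "Artificial Analysis Index": [68, None, 27, 49],
}


def shared(a: int, b: int) -> list[str]:
    """Benchmarks where both models (given by column index) disclosed a score."""
    return [name for name, row in BENCH.items()
            if row[a] is not None and row[b] is not None]


for a, b in combinations(range(len(MODELS)), 2):
    print(f"{MODELS[a]} vs {MODELS[b]}: {shared(a, b)}")
```

GPQA Diamond is the only benchmark all four models report, and QwQ-32B vs Magistral Medium 1.2 share nothing else, so broad claims about that pair rest on a single benchmark.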