DeepSeek R1 (0528) vs QwQ-32B vs Grok 4.20

Best cell per row highlighted. Null means undisclosed — never counted as zero.
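The highlighting rule above can be sketched in a few lines. This is a minimal illustration, not the page's actual logic: the helper name `best_index` and the `higher_is_better` flag are hypothetical, and undisclosed cells are assumed to be represented as `None`.

```python
from typing import Optional, Sequence


def best_index(
    cells: Sequence[Optional[float]],
    higher_is_better: bool = True,
) -> Optional[int]:
    """Return the column index of the best disclosed cell in a row.

    None marks an undisclosed value. Such cells are skipped entirely,
    so they are never compared as if they were zero.
    """
    disclosed = [(i, v) for i, v in enumerate(cells) if v is not None]
    if not disclosed:
        return None  # nothing disclosed, so nothing to highlight
    pick = max if higher_is_better else min
    return pick(disclosed, key=lambda pair: pair[1])[0]


# GPQA Diamond row from this page: the first column wins.
gpqa_best = best_index([81, 65.2, 78.5])

# Input price row from this page: lower is better, so the second column wins.
price_best = best_index([0.55, 0.12, 1.25], higher_is_better=False)

# Hypothetical row: a negative disclosed value still wins over nulls,
# which would not happen if null were coerced to zero.
null_safe = best_index([None, -3.0, None])
```

Filtering nulls out before comparing, rather than substituting a default value, keeps an undisclosed cell from winning or losing a row.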

Model | Provider | Status | AI Panel score
DeepSeek R1 (0528) | DeepSeek | GA | 8.0
QwQ-32B | Alibaba Cloud | GA | 6.8
Grok 4.20 | xAI | GA | 6.9

Identity & lifecycle

Model | DeepSeek R1 (0528) | QwQ-32B | Grok 4.20
Provider | DeepSeek | Alibaba Cloud | xAI
Family / tier | DeepSeek R1 · Reasoning | QwQ · Reasoning | Grok 4 · Reasoning
Status | GA | GA | GA
Released | 2025-05-27 | 2025-03-04 | 2026-03-09
Knowledge cutoff | 2025-04 | 2024-09 | 2024-11

Architecture & context

Model | DeepSeek R1 (0528) | QwQ-32B | Grok 4.20
Context window | 128K | 131K | 1M
Max output tokens: 64K, 33K
Input modalities | text | text | text, image
Reasoning mode | always | always | optional
Open weights | Yes | Yes | No

Pricing (per Mtok)

Model | DeepSeek R1 (0528) | QwQ-32B | Grok 4.20
Input | $0.55 | $0.12 | $1.25
Output | $2.19 | $0.18 | $2.50
Cached input: $0.14, $0.20
Batch input
Free tier | Yes | Yes | Yes
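Per-Mtok prices are easier to compare once converted to the cost of a single request. The sketch below uses the input and output prices listed above. The function name `request_cost` and the 10,000-in / 2,000-out token workload are illustrative assumptions, and cached-input and batch pricing are ignored.

```python
# (input $/Mtok, output $/Mtok), taken from the pricing rows above.
PRICES = {
    "DeepSeek R1 (0528)": (0.55, 2.19),
    "QwQ-32B": (0.12, 0.18),
    "Grok 4.20": (1.25, 2.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at list price (no caching, no batch discount)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
costs = {name: request_cost(name, 10_000, 2_000) for name in PRICES}
for name, usd in costs.items():
    print(f"{name}: ${usd:.5f}")
```

For this assumed workload the costs come to roughly $0.00988, $0.00156, and $0.01750 respectively.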

Speed

Model | DeepSeek R1 (0528) | QwQ-32B | Grok 4.20
Output speed (tok/s): 171.4
Time to first token (s): 13.24
Latency tier | slow | slow | medium

Trust & deployment

Model | DeepSeek R1 (0528) | QwQ-32B | Grok 4.20
Trains on inputs | Yes | No | Yes
License | MIT | Apache-2.0 | Proprietary
Clouds: GCP
Compliance

AI Panel scoring

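Model | DeepSeek R1 (0528) | QwQ-32B | Grok 4.20
Unified score | 8 | 6.8 | 6.9
Decision Maker | 8 | 7 | 7
Domain Strategist | 8.5 | 7 | 6.5
Finance Lead | 9 | 7 | 6.5
Domain Practitioner | 8.5 | 7.5 | 7.5
Power User | 8 | 6 | 6.5
Skeptic | 7.5 | 6.5 | 6
Value score | 9.2 | 8 | 7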

Benchmarks

Model | DeepSeek R1 (0528) | QwQ-32B | Grok 4.20
MMLU: 93.4
MMLU-Pro: 85
GPQA Diamond | 81 | 65.2 | 78.5
AIME 2025: 87.5
MATH-500: 90.6, 87.3
SWE-bench Verified: 57.6
LiveCodeBench: 73.3, 63.4
Aider Polyglot: 71.6
IFEval: 83.9, 81
TAU-bench: 63.9, 93
SimpleQA: 27.8
Humanity's Last Exam: 17.7
LMArena Elo: 1382, 1491
Artificial Analysis Index: 68, 49

Scores link to their sources. Missing cells mean the vendor hasn't disclosed a result — honesty over padding.