DeepSeek R1 (0528) vs QwQ-32B vs Magistral Medium 1.2 vs Grok 4.20

Best cell per row highlighted. Null means undisclosed — never counted as zero.

| Provider | Model | Status | AI Panel score |
| --- | --- | --- | --- |
| DeepSeek | DeepSeek R1 (0528) | GA | 8.0 |
| Alibaba Cloud | QwQ-32B | GA | 6.8 |
| Mistral AI | Magistral Medium 1.2 | GA | 7.2 |
| xAI | Grok 4.20 | GA | 6.9 |

Identity & lifecycle

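| | DeepSeek R1 (0528) | QwQ-32B | Magistral Medium 1.2 | Grok 4.20 |
| --- | --- | --- | --- | --- |
| Provider | DeepSeek | Alibaba Cloud | Mistral AI | xAI |
| Family / tier | DeepSeek R1 · Reasoning | QwQ · Reasoning | Magistral · Reasoning | Grok 4 · Reasoning |
| Status | GA | GA | GA | GA |
| Released | 2025-05-27 | 2025-03-04 | 2025-09-17 | 2026-03-09 |
| Knowledge cutoff | 2025-04 | 2024-09 | 2025-06 | 2024-11 |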

Architecture & context

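| | DeepSeek R1 (0528) | QwQ-32B | Magistral Medium 1.2 | Grok 4.20 |
| --- | --- | --- | --- | --- |
| Context window | 128K | 131K | 131K | 1M |
| Input modalities | text | text | text, image | text, image |
| Reasoning mode | always | always | always | optional |
| Open weights | Yes | Yes | No | No |

Max output tokens (3 of 4 disclosed): 64K; 33K; 41K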

Pricing (per Mtok)

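| | DeepSeek R1 (0528) | QwQ-32B | Magistral Medium 1.2 | Grok 4.20 |
| --- | --- | --- | --- | --- |
| Input | $0.55 | $0.12 | $2 | $1.25 |
| Output | $2.19 | $0.18 | $5 | $2.5 |
| Free tier | Yes | Yes | Yes | Yes |

Cached input (2 of 4 disclosed): $0.14; $0.2
Batch input (1 of 4 disclosed): $1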
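
As a rough illustration of how the per-Mtok prices translate into a per-request cost, the sketch below multiplies token counts by the Input and Output prices listed above. The helper name `request_cost` and the example workload (20,000 prompt tokens, 5,000 generated tokens) are assumptions made for illustration only. Cached and batch prices are left out because they are not disclosed for every model.

```python
# Prices in USD per million tokens, copied from the Input and Output rows.
# Tuple layout: (input price, output price).
PRICES = {
    "DeepSeek R1 (0528)": (0.55, 2.19),
    "QwQ-32B": (0.12, 0.18),
    "Magistral Medium 1.2": (2.00, 5.00),
    "Grok 4.20": (1.25, 2.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    # Prices are quoted per million tokens, so scale the total by 1e6.
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 prompt tokens and 5,000 generated tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 20_000, 5_000):.4f}")
```

Because output tokens cost more than input tokens for every model listed, the ratio of prompt size to generated length can shift the ranking for a given workload.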

Speed

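| | DeepSeek R1 (0528) | QwQ-32B | Magistral Medium 1.2 | Grok 4.20 |
| --- | --- | --- | --- | --- |
| Latency tier | slow | slow | slow | medium |

Output speed (tok/s): 38.9171.4
Time to first token (s): 1.713.24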

Trust & deployment

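| | DeepSeek R1 (0528) | QwQ-32B | Magistral Medium 1.2 | Grok 4.20 |
| --- | --- | --- | --- | --- |
| Trains on inputs | Yes | No | No | Yes |
| License | MIT | Apache-2.0 | Proprietary | Proprietary |

Clouds (2 of 4 disclosed): GCP; Azure AI Foundry
Compliance (1 of 4 disclosed): SOC2, ISO27001, GDPR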

AI Panel scoring

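| | DeepSeek R1 (0528) | QwQ-32B | Magistral Medium 1.2 | Grok 4.20 |
| --- | --- | --- | --- | --- |
| Unified score | 8 | 6.8 | 7.2 | 6.9 |
| Decision Maker | 8 | 7 | 7 | 7 |
| Domain Strategist | 8.5 | 7 | 6.5 | 6.5 |
| Finance Lead | 9 | 7 | 7 | 6.5 |
| Domain Practitioner | 8.5 | 7.5 | 7.5 | 7.5 |
| Power User | 8 | 6 | 7.5 | 6.5 |
| Skeptic | 7.5 | 6.5 | 6.5 | 6 |
| Value score | 9.2 | 8 | 7 | 7 |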

Benchmarks

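| | DeepSeek R1 (0528) | QwQ-32B | Magistral Medium 1.2 | Grok 4.20 |
| --- | --- | --- | --- | --- |
| GPQA Diamond | 81 | 65.2 | 76.26 | 78.5 |

MMLU (1 of 4 disclosed): 93.4
MMLU-Pro (1 of 4 disclosed): 85
AIME 2025 (2 of 4 disclosed): 87.5; 83.48
MATH-500 (2 of 4 disclosed): 90.6; 87.3
SWE-bench Verified (1 of 4 disclosed): 57.6
LiveCodeBench (2 of 4 disclosed): 73.3; 63.4
Aider Polyglot (1 of 4 disclosed): 71.6
MMMU (1 of 4 disclosed): 70
IFEval (2 of 4 disclosed): 83.9; 81
TAU-bench (2 of 4 disclosed): 63.9; 93
SimpleQA (1 of 4 disclosed): 27.8
Humanity's Last Exam (1 of 4 disclosed): 17.7
LMArena Elo (2 of 4 disclosed): 1382; 1491
Artificial Analysis Index (3 of 4 disclosed): 68; 27; 49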

Scores link to their sources. Missing cells mean the vendor hasn't disclosed a result — honesty over padding.
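
The rule that a missing cell is undisclosed rather than zero can be sketched in a few lines. The example below is only an illustration of that rule, not the site's actual implementation: the function names `best_index` and `mean_disclosed` are hypothetical, and the row containing `None` uses made-up values purely to show the behavior.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def best_index(row: Row, higher_is_better: bool = True) -> Optional[int]:
    """Return the column index of the best disclosed cell, or None if all are undisclosed."""
    disclosed = [(value, i) for i, value in enumerate(row) if value is not None]
    if not disclosed:
        return None
    pick = max if higher_is_better else min
    # Compare on the value only; on ties the earliest column wins.
    return pick(disclosed, key=lambda pair: pair[0])[1]


def mean_disclosed(row: Row) -> Optional[float]:
    """Average only the disclosed cells; undisclosed cells never count as zero."""
    values = [value for value in row if value is not None]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    gpqa_diamond = [81, 65.2, 76.26, 78.5]    # from the GPQA Diamond row
    input_price = [0.55, 0.12, 2, 1.25]       # from the Input pricing row
    hypothetical = [None, 70.0, None, 65.0]   # made-up row with undisclosed cells

    print(best_index(gpqa_diamond))                         # column of the highest score
    print(best_index(input_price, higher_is_better=False))  # column of the lowest price
    print(mean_disclosed(hypothetical))                     # mean over two disclosed cells
```

Treating `None` as zero would drag the hypothetical row's average down to half its disclosed mean, which is the distortion the rule is meant to avoid.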