QwQ-32B vs Grok 4.20

The best cell in each row is highlighted. Null (shown as a dash) means undisclosed and is never counted as zero.

Identity & lifecycle

Model	QwQ-32B	Grok 4.20
Provider	Alibaba Cloud	xAI
Family / tier	QwQ · Reasoning	Grok 4 · Reasoning
Status	GA	GA
Released	2025-03-04	2026-03-09
Knowledge cutoff	2024-09	2024-11

Architecture & context

Context window (tokens)	131K	1M
Max output tokens	33K	—
Input modalities	text	text, image
Reasoning mode	always	optional
Open weights	Yes	No

Pricing (USD per 1M tokens)

Input	$0.12	$1.25
Output	$0.18	$2.50
Cached input	—	$0.20
Batch input	—	—
Free tier	Yes	Yes
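
To make the per-token prices concrete, here is a minimal sketch that turns the input and output rates above into a cost for one workload. The workload size (1M input tokens, 200K output tokens) and the names `PRICES` and `request_cost` are assumptions made for illustration; cached-input and batch discounts are ignored.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "QwQ-32B": {"input": 0.12, "output": 0.18},
    "Grok 4.20": {"input": 1.25, "output": 2.50},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD, ignoring cache and batch discounts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload, not taken from the page.
    for model in PRICES:
        cost = request_cost(model, 1_000_000, 200_000)
        print(f"{model}: ${cost:.3f}")
```

Under these assumptions the workload costs about $0.156 on QwQ-32B and $1.75 on Grok 4.20. Actual bills depend on how many reasoning tokens each model emits, which the table does not report.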

Speed

Output speed (tok/s)	—	171.4
Time to first token (s)	—	13.24
Latency tier	slow	medium
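
The two speed figures can be combined into a rough end-to-end estimate: time to first token plus output length divided by output speed. The sketch below is an approximation under that simple model, not a measured result; `SPEED` and `estimated_response_time` are hypothetical names, and an undisclosed figure yields `None` rather than a guess.

```python
from typing import Optional

# Speed figures from the table above; None marks an undisclosed value.
SPEED = {
    "QwQ-32B": {"tok_per_s": None, "ttft_s": None},
    "Grok 4.20": {"tok_per_s": 171.4, "ttft_s": 13.24},
}


def estimated_response_time(model: str, output_tokens: int) -> Optional[float]:
    """Estimate seconds to a complete response, or None if data is undisclosed."""
    speed = SPEED[model]
    if speed["tok_per_s"] is None or speed["ttft_s"] is None:
        return None  # undisclosed is not the same as zero latency
    return speed["ttft_s"] + output_tokens / speed["tok_per_s"]


if __name__ == "__main__":
    # Hypothetical response length of 1,000 output tokens.
    for model in SPEED:
        print(model, estimated_response_time(model, 1_000))
```

For a 1,000-token response this gives roughly 19.07 seconds for Grok 4.20 and no estimate for QwQ-32B.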

Trust & deployment

Trains on inputs	No	Yes
License	Apache-2.0	Proprietary
Clouds	GCP	—
Compliance	—	—

AI Panel scoring

Unified score	6.8	6.9
Decision Maker	7	7
Domain Strategist	7	6.5
Finance Lead	7	6.5
Domain Practitioner	7.5	7.5
Power User	6	6.5
Skeptic	6.5	6
Value score	8	7

Benchmarks

GPQA Diamond	65.2	78.5
MATH-500	90.6	87.3
LiveCodeBench	63.4	—
IFEval	83.9	81
TAU-bench	—	93
LMArena Elo	—	1491
Artificial Analysis Index	—	49

Scores link to their sources. A missing cell means the vendor has not disclosed a result; it is left empty rather than padded with an estimate.
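
The rule that a null is never counted as zero can be sketched as a best-cell picker over a row. This is an illustration only: the page does not document its highlighting logic, so the sketch assumes that higher is better (true for benchmark rows, not for price or latency), that a row with a single disclosed value highlights that value, and that ties highlight both. `MODELS` and `best_models` are hypothetical names.

```python
from typing import List, Optional, Sequence

MODELS = ("QwQ-32B", "Grok 4.20")

# A few benchmark rows from the table above; None marks an undisclosed cell.
BENCHMARKS = {
    "GPQA Diamond": (65.2, 78.5),
    "MATH-500": (90.6, 87.3),
    "LiveCodeBench": (63.4, None),
    "TAU-bench": (None, 93.0),
}


def best_models(scores: Sequence[Optional[float]]) -> List[str]:
    """Return the model(s) with the highest disclosed score in one row."""
    # Drop undisclosed cells instead of treating them as 0.
    disclosed = {m: s for m, s in zip(MODELS, scores) if s is not None}
    if not disclosed:
        return []  # nothing to highlight
    top = max(disclosed.values())
    return [m for m, s in disclosed.items() if s == top]


if __name__ == "__main__":
    for name, row in BENCHMARKS.items():
        print(name, best_models(row))
```

Filtering out `None` before comparing is what keeps a genuine score of 0 distinct from an undisclosed one.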