What does Grok 4.20 cost now?

xAI docs list $1.25 / $2.50 / $0.20 cached — repriced down from the launch $2 / $6. Artificial Analysis still shows the old $2 / $6 with $1.10 cached; trust docs.x.ai.

Is the context 1M or 2M?

xAI's docs card lists 1M for the reasoning/non-reasoning slugs; AA and OpenRouter still show 2M, and the multi-agent sibling is 2M. Verify against your account's live limits.

How is the reasoning toggle used?

Either pick the -reasoning vs -non-reasoning slug, or set the reasoning-effort parameter — one integration, two modes.

Should I use 4.20 or 4.3?

For almost everything, 4.3: same price, newer training, video, higher scores. Keep 4.20 only for its non-hallucination edge or if you're pinned for stability.

Does xAI train on my data?

API: only via irreversible opt-in data sharing. X consumer surface: by default, no opt-out.

Is it certified for enterprise compliance?

No SOC2/HIPAA/ISO certs are publicly verified on the direct API; route via a managed cloud if you need them.

Grok 4.20 Review — Benchmarks, Pricing & AI Panel Verdict

Benchmark	Score	Source
IFEval	81%	Artificial Analysis (IFBench)2026-04-30T00:00:00.000Z
MATH-500	87.3%	xAI launch / secondary coverage2026-03-10T00:00:00.000Z
TAU-bench	93%	Artificial Analysis (tau-2-Bench Telecom; ~5pts below 4.3's 98)2026-04-30T00:00:00.000Z
LMArena Elo	1491	LMArena / LMSYS (grok-4.20-beta1, Mar-Apr 2026, top-4; +31 May mover)2026-04-30T00:00:00.000Z
GPQA Diamond	78.5%	xAI launch / secondary coverage2026-03-10T00:00:00.000Z
Artificial Analysis Index	49	artificialanalysis.ai 2026-05-28T00:00:00.000Z

Architecture

As with all of the Grok line, xAI discloses essentially nothing about internals — architecture type, parameters, attention, layers, training tokens, and compute are all undisclosed and marked null/unknown. What is documented is behavioral: a switchable reasoning mode (the headline architectural-level feature), a 1M-token context on the current docs card, native image input, and a November 2024 knowledge cutoff. The reasoning toggle is the genuinely distinctive design choice here — most reasoning models from peers are either always-on or a separate SKU, whereas Grok 4.20 exposes both modes behind one model family.

Capabilities

Reasoning (7.5): With reasoning on, competitive with o-series reasoning models and Claude Sonnet on math, science, and analysis; AA Intelligence Index of 49 sits just below Grok 4.3's 53.
Math (7.5): MATH-500 ~87.3% and GPQA Diamond ~78.5% at release; solid, not leading.
Agentic / tool use (7.0): GDPval-AA Elo of 1179 — mid-pack, and the area Grok 4.3 improved most (to 1500). Native web + X search tools.
Long context (8.0): 1M on the current docs card (2M on AA/OpenRouter and on the multi-agent sibling) — strong for long-document and full-codebase work.
Instruction following (8.5): A real highlight — strict prompt adherence reduces output-parsing defensiveness in production; IFBench ~81%.
Safety calibration (6.5): Best-in-class non-hallucination at release (AA-Omniscience 78%) is a genuine calibration strength, offset by the deliberately lower text refusal rate.
Coding (6.0): Lags Claude Sonnet / Opus; no published SWE-bench Verified figure.
Vision / OCR (6.5 / 6.5): Competent image input; no video (that arrived with 4.3).
Real-time data (9.5): Native live X + web access, marginally behind 4.3 only because of the older November 2024 cutoff for parametric knowledge.

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
Artificial Analysis Intelligence Index	49	Up from Grok 4 (~42)	Below GPT-5.5 (~60), Gemini 3.1 Pro (~57); below Grok 4.3 (53)	Artificial Analysis
GPQA Diamond	~78.5%	Up from Grok 4 ceiling	Behind Claude Opus 4.7 / GPT-5.5	Launch coverage
MATH-500	~87.3%	Improved	Competitive	Launch coverage
AA-Omniscience (non-hallucination)	78%	Best in class at release	Best in class at release; still leads vs Grok 4.3	Digital Applied
GDPval-AA (tool-use Elo)	1179	Baseline	Mid-pack (Grok 4.3 reaches 1500)	AA launch article
IFBench	81%	n/a	Competitive	Artificial Analysis
LMArena Elo (beta1 proxy)	~1491	Up (+31 May mover, top-4)	Behind Claude Opus 4.6 (~1504)	LMArena tracker

(MMLU-Pro, AIME 2025, SWE-bench Verified, HumanEval, LiveCodeBench, Aider Polyglot, and MMMU were not consistently published by xAI for Grok 4.20; rows left null. Coverage is thinner than peer models.)

Speed & latency

Artificial Analysis measures ~171.4 output tokens/sec with a ~13.24s time-to-first-token — notably faster to first token than Grok 4.3's ~19.7s, because in non-maximal reasoning modes there is less upfront thinking. In non-reasoning mode it behaves as a fast conversational model. Overall latency tier: medium. The reasoning toggle is the practical lever — flip it off for speed, on for quality, without swapping models.

Pricing analysis

Surface	Cost	Notes
API input	$1.25 / 1M tok	docs.x.ai canonical (repriced from $2.00)
API output	$2.50 / 1M tok	docs.x.ai canonical (was $6.00)
Cached input	$0.20 / 1M tok	docs.x.ai (AA's older card shows $1.10)
Batch	n/a	No documented batch endpoint
Direct UI (SuperGrok)	$30 / mo	Standalone Grok
Direct UI (X Premium / Premium+)	$8 / $40 mo	Bundled in X app
SuperGrok Heavy	$300 / mo	Power-user tier
Free tier	$0	grok.com + free X with daily caps
Rate limits	tiered by spend/plan	Per docs.x.ai

Pricing conflict (the v1-flagged discrepancy, now characterized): This is the model where xAI docs and third parties diverge most. xAI's docs.x.ai card is canonical at $1.25 / $2.50 / $0.20 cached. Artificial Analysis's Grok 4.20 page still shows the launch-era $2.00 / $6.00 with $1.10 cached and a 2M context; OpenRouter shows $1.25 / $2.50 but 2M context. The v1 file used the AA/launch $2 / $6 figures — corrected here to the canonical docs values, with the conflict noted rather than hidden.

Deployment & access

Proprietary, API-only, no open weights, not self-hostable. OpenAI-SDK and Anthropic-SDK compatible. Resold via OpenRouter. Unlike Grok 4.3, Grok 4.20 is not confirmed as a distinct Azure AI Foundry SKU (Foundry leads with Grok 4 Fast variants and Grok 4.3), so cloud_platforms is empty here. Rate limits are spend/plan-tiered.

Safety & privacy

Same posture as the rest of the Grok line: no published safety framework, governance via Acceptable Use Policy, content moderation tightened after January 2026. Training-on-inputs: API only via irreversible opt-in data sharing; X consumer surface trains by default with no opt-out (trains_on_inputs: true, data_optout_available: false). The distinctive governance positive for 4.20 is its release-time best-in-class non-hallucination rate (AA-Omniscience 78%), which it still leads — relevant for low-error-tolerance summarization (legal, medical) where factual reliability matters more than the latest knowledge. No verified SOC2/HIPAA/ISO/GDPR certs on the direct API.

Ecosystem & tooling

Python/TypeScript SDKs plus OpenAI- and Anthropic-SDK compatibility; LangChain and Vercel AI SDK integrations; resold on OpenRouter. Surfaces: Grok on X, grok.com, SuperGrok. Popularity is growing but actively draining toward Grok 4.3 as the recommended default.

Grok 4.20

What's new

Benchmarks

AI Panel Review

Strengths

Limitations

Best use cases

Deep dive

Architecture

Capabilities

Benchmark analysis

Speed & latency

Pricing analysis

Deployment & access

Safety & privacy

Ecosystem & tooling

Buyer questions

Comparable models

Sources

Model specs

Other Grok 4 versions