Grok 4.3

GALatest Flagship

by xAI · Grok 4 family · best for real-time research and agentic work at frontier-class value

FrontierReasoningMultimodalLong-Context
7.7
AI Panel Score
Value 8.5/10

Grok 4.3 is xAI's current flagship model, released to the public API on 2026-04-30 (beta on grok.com 2026-04-17). It is a reasoning-first, natively multimodal model — text, image, and native video input — with a 1,000,000-token context window and xAI's signature live access to the X (Twitter) corpus. The single sentence a buyer needs: it delivers frontier-class reasoning and the best real-time-data capability on the market at a price (just $1.25 in / $2.50 out per 1M tokens) that undercuts every comparable flagship by a wide margin — with the trade-off of slow first-token latency and unusually thin published benchmarks. Provider: xAI. Released: 2026-04-30. Status: GA. Context: 1M tokens. Max output: undisclosed. Modalities: text + image + video in, text out. Knowledge cutoff: December 2025. Headline price: $1.25 / $2.50 per 1M tokens.

What's new

  • Versus the prior Grok 4.20 flagship in the same family:
  • **Native video input** — an xAI first: mp4/mov/webm up to ~5 minutes at 1080p, with speech transcription, speaker segmentation, object tracking, and motion-causality understanding.
  • **Newer knowledge** — training cutoff moved to December 2025 (Grok 4.20 was November 2024).
  • **Large agentic gain** — GDPval-AA tool-use Elo of 1500, a +321 jump over Grok 4.20's 1179, per Artificial Analysis. tau-2-Bench Telecom rose ~5 points to 98%.
  • **Aggressive price cut** — at launch xAI cut input ~37.5% and output ~58.3% versus Grok 4.20's original $2/$6 card; both now sit at $1.25/$2.50 with cached input at $0.20.
  • **Intelligence Index 53** — +4 over the latest Grok 4.20 (49) on Artificial Analysis's v4.0 index.
  • **Becomes the default redirect target** — after the 2026-05-15 retirement of eight legacy slugs (grok-3, grok-4-0709, the grok-4-fast and grok-4-1-fast reasoning/non-reasoning pairs, grok-imagine-image-pro), reasoning requests resolve to Grok 4.3 at low effort and non-reasoning requests at none effort, billed at 4.3 pricing.
  • **Trade-off:** context dropped from Grok 4.20's headline 2M to 1M, and AA-Omniscience non-hallucination fell ~8 points (Grok 4.20 still leads that single metric).

Benchmarks

BenchmarkScoreSource
IFEval81%Artificial Analysis (IFBench, held flat vs 4.20)2026-04-30T00:00:00.000Z
TAU-bench98%Artificial Analysis (tau-2-Bench Telecom)2026-04-30T00:00:00.000Z
GPQA Diamond90%Artificial Analysis / xAI launch coverage (approximate)2026-04-30T00:00:00.000Z
Artificial Analysis Index53artificialanalysis.ai 2026-05-28T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7.5/10
Frontier reasoning at a fifth of Opus pricing, with a live-data moat — but I'm buying it as a second vendor, not my safety-critical default.

For a buyer, Grok 4.3 is the clearest value play among flagships: frontier-band reasoning, a genuine real-time-data moat, and Azure AI Foundry availability that adds enterprise controls xAI's own API lacks. The vendor risk is real — thin benchmark disclosure, no published safety framework, no verified compliance certs, and a moderation posture shaped by public controversy. OpenAI/Anthropic-SDK compatibility keeps lock-in low, so it slots cleanly as a cost-optimized or research-focused second vendor. Roadmap confidence is moderate: xAI ships fast (four flagships in a year) but reprices and retires slugs aggressively, which raises operational churn.

Strategic Fit 8Vendor Risk 6Roadmap Confidence 7
Pros
  • Frontier value
  • real-time moat
  • Azure path for governance
Cons
  • Thin disclosure
  • no certs
  • slug churn
Right for: Cost-aware teams wanting a research/agentic second vendor
Avoid if: You need certified compliance or a single audited primary model
Domain Strategist8/10
Grok's moat isn't the benchmark — it's the X firehose and the Musk-ecosystem distribution that no rival can clone.

In market terms, Grok 4.3's defensibility comes from data and distribution, not raw IQ. Native access to the live X corpus is a moat Anthropic, OpenAI, and Google cannot replicate without owning a social graph, and the Tesla / SpaceX / X ecosystem gives xAI built-in surfaces (in-car assistant, X Premium, grok.com) that competitors must win through partnerships. The "edgier" positioning differentiates against the homogeneously-corporate voices of ChatGPT and Claude. Market timing is strong: shipping the cheapest frontier flagship during a price war pressures rivals directly. The weakness is that the ecosystem narrative (Tesla, SpaceX) is still more story than shipped product outside X itself.

Competitive Positioning 8Differentiation 9Market Timing 8
Pros
  • Unclonable live-data moat
  • ecosystem surfaces
  • distinct voice
Cons
  • Tesla/SpaceX integration mostly marketing
  • brand-risk overhang
Right for: Social-listening, real-time, sentiment-driven products
Avoid if: Your buyers are conservative enterprises wary of the brand
Finance Lead8.5/10
At $1.25 in / $2.50 out with 84% cache reads, this is frontier reasoning at clearance pricing — just watch the long-context double-rate.

The unit economics are excellent. Grok 4.3 undercuts Claude Opus 4.7 by roughly 5–10x on token price and beats GPT-5.5 by a wide margin while sitting in the frontier reasoning band on the AA Index. The 84% cached-input discount makes iterative and agentic workloads cheap. Two cost traps to model: the long-context tier doubles input/output above 200K tokens, which can quietly inflate large-document bills, and always-on reasoning produces ~44% more output tokens than Grok 4.20, inflating per-call output cost. For batch, cached, and well-scoped workloads, value-per-dollar is best-in-class; for ad-hoc long-doc work, model the bill carefully.

Cost Efficiency 9Pricing Transparency 8Value per Dollar 9
Pros
  • Clearance-level frontier pricing
  • deep cache discount
Cons
  • Long-context double-rate
  • verbose output cost
Right for: High-volume agentic/research at frontier quality
Avoid if: Unpredictable >200K-token jobs you can't pre-model
Domain Practitioner7.5/10
OpenAI/Anthropic-SDK compatible, clean function calling, one-line X search — but budget for ~20s first tokens and thinner docs.

For builders, adoption is frictionless: drop-in OpenAI- or Anthropic-SDK compatibility, working structured outputs, parallel tool calls, and the standout X-search tool that pulls live posts into a prompt with one call. The 1M window removes a lot of RAG plumbing. The friction is real: ~19.7s time-to-first-token feels slow in interactive loops, the SDK/docs ecosystem still trails OpenAI and Anthropic in polish and examples, and reasoning visibility is summary-only. Coding-agent work is better served elsewhere given the soft coding profile. Reliability is decent but rate limits are spend-tiered rather than transparently fixed.

API Ergonomics 8Tool/Agent Support 8Reliability 7
Pros
  • SDK compatibility
  • native X-search tool
  • big context
Cons
  • Slow TTFT
  • thinner docs
  • weak coding
Right for: Research/agentic apps that benefit from live data
Avoid if: You need low-latency UX or a coding-agent backbone
Power User7.5/10
It actually has opinions and knows what happened this morning — but you wait for the first token and the guardrails feel inconsistent.

For a heavy daily user, Grok via grok.com / X Premium feels distinct from ChatGPT or Claude: looser personality, readier opinions, and willingness to engage with topics other assistants deflect. The live-X integration makes it the natural choice for current-events and trend questions. Downsides are well known: long TTFT feels sluggish next to GPT-5.5 or Gemini 3.1, output quality occasionally suffers from the loose guardrails, the consumer tier maze (X Premium $8, SuperGrok Lite $10, SuperGrok $30, X Premium+ $40, Heavy $300) is confusing, and image/NSFW moderation is now aggressive and inconsistent. For users who already live on X, bundled access is strong value.

Output Quality 7Speed 6Everyday Usefulness 8
Pros
  • Real-time answers
  • personality
  • X bundle value
Cons
  • Slow first token
  • inconsistent moderation
  • tier confusion
Right for: X-native users who want current, opinionated answers
Avoid if: You want fast, predictable, heavily-guardrailed responses
Skeptic6.5/10
No SWE-bench number, no architecture, no safety framework — a 53 Intelligence Index and a firehose don't make a frontier leader.

Adversarially, Grok 4.3's marketing outpaces its disclosure. The conspicuously-absent SWE-bench Verified figure reads as an omission of a weak result; independent coverage confirms it trails Opus 4.7 by double digits on hard coding. The AA Index of 53 is good-not-leading, below GPT-5.5 and Gemini 3.1 Pro. Architecture is undisclosed, there is no published safety framework, no verified compliance, and AA-Omniscience non-hallucination dropped 8 points versus 4.20. The real-time-data moat is genuine and the price is genuinely excellent — those claims hold. But "fastest, most intelligent model we've built" is vendor framing the standalone benchmarks don't fully substantiate, and the moderation history (Jan 2026 imagery scandal) is a live reputational risk.

Claim Accuracy 6Weakness Severity 6Hype vs Reality 6
Pros
  • Real-time moat and price are real
Cons
  • Hidden coding weakness
  • zero architecture transparency
  • moderation scandal
Right for: Buyers who verify with their own evals
Avoid if: You take vendor "most intelligent" claims at face value

Strengths

  • Best price-to-frontier-quality ratio of any flagship in May 2026 ($1.25 / $2.50, with 84% cached discount).
  • Real-time X + web data access — a structural moat no competitor matches natively (cap_realtime_data 10/10).
  • Native video input is unique at this tier.
  • Top-end agentic / tool-use Elo (GDPval-AA 1500).
  • OpenAI/Anthropic-SDK compatibility plus Azure AI Foundry availability lower adoption and lock-in risk.

Limitations

  • ~19.7s time-to-first-token rules it out as a default for interactive, sub-second UX.
  • Trails Claude Opus 4.7 and GPT-5.5 on hard coding; no published SWE-bench Verified figure.
  • Thin benchmark transparency — most vendor positioning is unverifiable against standalone scores.
  • Long-context double-rate above 200K input tokens quietly inflates large-document bills; verbose outputs add output-token cost.
  • Weaker, deliberately-lower-refusal safety calibration and no verified compliance certifications on the direct API.
  • AA-Omniscience non-hallucination dropped ~8 points vs Grok 4.20.

Best use cases

- **Real-time research and news synthesis.** Native live-X access makes Grok 4.3 the obvious pick for "what's happening now," sentiment, and breaking-event analysis. - **High-volume agentic and tool-use pipelines.** Frontier-class agentic Elo at a fraction of Opus/GPT-5.5 token cost makes price-per-task compelling. - **Long-document and multimodal analysis** including short-clip video understanding inside one model, where sub-second latency is not required. - **Cost-optimized frontier reasoning** as a second vendor where Opus 4.7 economics would be prohibitive.

Buyer questions

How much does Grok 4.3 cost?

$1.25 per 1M input tokens and $2.50 per 1M output tokens, with cached input at $0.20 (84% off). Above 200K input tokens, rates double. Consumer access starts at X Premium $8/mo or SuperGrok $30/mo.

Is it good for coding?

It is the weakest area. xAI did not publish a SWE-bench Verified score for 4.3, and it trails Claude Opus 4.7 by double digits on hard agentic coding. For a coding agent, prefer Grok Build 0.1 within xAI or Claude/Codex elsewhere.

What makes it different?

Native, first-party access to the live X (Twitter) firehose plus web search, and native video input — neither of which competing flagships offer natively.

Can I get it with enterprise controls?

Yes, via Microsoft Foundry / Azure AI Foundry (RBAC, private networking, customer-managed keys), though Azure caps context at ~200K tokens versus 1M on the direct API.

Does xAI train on my data?

On the API, only if you opt into data sharing (irreversible once enabled). On the X consumer surface, the 2026 ToS trains on prompts/outputs by default with no opt-out.

Why is first response so slow?

Reasoning is always on (effort dial only, no off switch), so time-to-first-token is ~20s. It is built for quality, not interactive latency.

How do I migrate from a retired Grok slug?

Retired slugs (grok-3, grok-4-0709, the fast pairs) auto-redirect to Grok 4.3 and bill at 4.3 pricing; reasoning maps to low effort, non-reasoning to none.

Comparable models

**Claude Opus 4.7** — Wins decisively on hard coding (SWE-bench ~87.6% vs Grok's unpublished/weak), reasoning ceiling, and published safety; loses on price (5–10x more) and has no real-time data access.
**GPT-5.5** — Higher AA Intelligence Index (~60) and far broader benchmark transparency; loses on price and on native real-time data.
**Gemini 3.1 Pro** — Higher AA Index (~57), stronger multimodal breadth, comparable-to-larger context; loses on real-time X access and on token price.

Model specs

Input price
$1.25 / Mtok
Output price
$2.50 / Mtok
Cached input
$0.20 / Mtok
Batch (in/out)
Context window
1M tokens
Max output
— tokens
Knowledge cutoff
2025-12
Released
2026-04-29
Modalities
text, image, video → text
Output speed
~181.5 tok/s
License
Proprietary
Clouds
Azure AI Foundry

Last verified 2026-05-27