by xAI · Grok Build family · best for local-first coding agent for IP-sensitive teams
Grok Build 0.1 is xAI's first dedicated agentic coding model and CLI, launched 2026-05-14 (API GA 2026-05-20, slug `grok-build-0.1`). It competes directly with Claude Code and OpenAI Codex CLI. Its defining traits: a local-first architecture (source code never leaves the developer's machine), plan mode (an approval-gated plan before execution), up to 8 parallel sub-agents running in Git worktrees, native Model Context Protocol (MCP) support, and a low API price of $1.00 in / $2.00 out per 1M tokens. The single sentence a buyer needs: it is a credible, cheap, privacy-forward coding agent whose 70.8% SWE-bench Verified trails Claude — a strong second tool for IP-sensitive and cost-sensitive teams, not yet a top-tier-quality replacement. Provider: xAI. Released: 2026-05-20. Status: GA. Context: 256K tokens. Max output: 256K tokens. Modalities: text + image in, text out. Knowledge cutoff: November 2024. Headline price: $1.00 / $2.00 per 1M tokens.
| Benchmark | Score | Source |
|---|---|---|
| SWE-bench Verified | 70.8% | xAI internal harness (inherited from grok-code-fast-1, which redirects to Grok Build 0.1)2026-05-20T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The local-first design answers my single biggest objection to coding agents — but at v0.1 it's a second tool, not a Claude Code replacement.”
Strategically, Grok Build 0.1 directly targets the top CTO objection to coding agents: shipping proprietary source to a cloud LLM. The local-first architecture makes it adoptable in regulated and IP-sensitive environments where Claude Code's cloud model is a non-starter. But the 70.8% SWE-bench Verified is competitive-not-leading, and v0.1 maturity means fewer integrations and rough edges. The sensible posture is "second tool in the belt": use it for codebases that legally can't go to the cloud, keep Claude Code as primary elsewhere. Vendor risk is the usual xAI profile (thin disclosure, no certs); lock-in is low via OpenAI-compatible API. The X-ecosystem angle does not meaningfully extend to coding.
“xAI is now competing across the full developer-tools surface — and local-first is a sharp wedge into the data-sovereignty segment Claude can't easily match.”
In dev-tools market terms, Grok Build signals xAI competing beyond chat into the coding-agent category Anthropic has led for a year. The differentiation is local-first data sovereignty — a real wedge into regulated industries and security-conscious enterprises that cloud-only agents structurally can't serve without re-architecting. Native MCP support rides the open-tooling wave. Market timing is late (Claude Code and Codex are entrenched) but the privacy angle opens a segment competitors haven't prioritized. The weakness is the v0.1 reality: a sharp positioning story needs the product maturity to back it, and right now the integrations and benchmark proof lag the narrative.
“At $1/$2 it's a fraction of Claude Sonnet or Opus per token — the catch is retries on hard tasks can eat the savings.”
The unit economics are strong. At $1.00 / $2.00 per 1M tokens, Grok Build is materially cheaper than Claude Sonnet 4.6 ($3/$15) and roughly 10x cheaper than Opus 4.7 per token — meaningful for coding agents that burn tens of millions of output tokens per project. The CLI bundled into SuperGrok ($30/mo) or X Premium+ ($40/mo) undercuts stacking Cursor + Claude Code subscriptions. The financial caveat: lower benchmark quality means more retries and more developer-supervision time on hard problems, which can quietly erode the per-token savings. For routine refactors and well-scoped tasks, the economics clearly favor Grok Build; for hard, novel problems, factor in the human-supervision overhead.
“Plan mode and 8 worktree sub-agents are a genuine win — the CLI just feels less polished than Claude Code, and 256K bites on big repos.”
For builders, the standout UX is plan mode (approve a step list before any mutation) and parallel sub-agents isolated in Git worktrees, which make multi-file refactors fast and reviewable. Native MCP support plugs into the growing tool ecosystem, and function calling / structured outputs are solid via the OpenAI-compatible interface. The friction: the CLI is less polished than Claude Code — fewer editor integrations and plugins — always-on reasoning means waiting on each action, and the 256K context forces manual context selection on real monorepos that a 1M-context tool avoids. For well-scoped tasks and privacy-bound work it's productive; for sprawling repos it asks more of the developer.
“As a daily coding driver it's serviceable and cheap, but I still reach for Claude Code on anything hard — and I feel the reasoning pause.”
For a developer living in the CLI day to day, Grok Build is serviceable and notably cheap, with plan mode adding a reassuring approval step before edits. On routine tasks it ships fine. But on hard problems the quality gap to Claude shows, the always-on reasoning pause on every action is noticeable, and the smaller context means more time spent curating what the agent sees. The local-first guarantee is a quiet daily comfort for anyone working on proprietary code. As an everyday tool it earns a spot in the rotation; as the single tool for all coding, most heavy users will still keep Claude Code alongside it.
“One self-reported SWE-bench number on xAI's own harness, no other benchmarks, v0.1 — the local-first story is real; the quality story is unproven.”
Adversarially, Grok Build 0.1's evidence base is thin even by xAI standards: a single headline benchmark (70.8% SWE-bench Verified) measured on xAI's own internal harness rather than an independent run, and aggregators showing zero sourced evals. The 70.8% also trails Claude Sonnet, so the one number on offer is not a winning one. v0.1 versioning is honest signaling that this is early. The local-first claim is the strongest part of the pitch and appears genuine — code staying off-server is a real, checkable property. But "credible Claude Code competitor" rests largely on positioning, not on independently verified coding quality, and the November 2024 cutoff is a real handicap for current frameworks.
- **Regulated / IP-sensitive teams** that legally cannot ship source code to a cloud LLM — the local-first design is genuinely differentiating. - **Cost-sensitive coding-agent workloads** where price-per-task beats ceiling quality. - **Parallel multi-file refactors** where up to 8 sub-agents in Git worktrees speed changes. - **Plan-mode workflows** where developers want explicit approval before any file mutation. - **xAI / X ecosystem teams** preferring a single vendor for chat and code.
$1.00 in / $2.00 out per 1M tokens (cached $0.20) on the API; the CLI is bundled into SuperGrok ($30/mo) or X Premium+ ($40/mo), with heavy tiers around $300/mo. Some trackers show $0.20/$1.50 — trust docs.x.ai.
No. Grok Build is local-first: source code is not transmitted to xAI's servers. That is its main differentiator and the basis for treating it as privacy-forward.
Lower. SWE-bench Verified 70.8% trails Claude Sonnet 4.6 (72.7%) and is far below Opus 4.7 (87.6%). Strong for scoped tasks, not for top-tier ceilings.
The agent proposes a step list and waits for your approval before mutating files — a safety gate against runaway edits.
With effort. 256K context is the smallest in the Grok lineup, so monorepos need disciplined context selection; a 1M-context tool is easier there.
Native MCP support, the x.ai API (OpenAI/Anthropic-SDK compatible), OpenRouter, and third-party surfaces like Kilo Code. Editor-plugin coverage is thinner than Claude Code at v0.1.
It was retired 2026-05-15 and now redirects to Grok Build 0.1; the 70.8% SWE-bench figure traces to that lineage.
Does not train on API inputs by default
Last verified 2026-05-27