Grok Build 0.1

GALatest Coder

by xAI · Grok Build family · best for local-first coding agent for IP-sensitive teams

Coding
6.7
AI Panel Score
Value 7.5/10

Grok Build 0.1 is xAI's first dedicated agentic coding model and CLI, launched 2026-05-14 (API GA 2026-05-20, slug `grok-build-0.1`). It competes directly with Claude Code and OpenAI Codex CLI. Its defining traits: a local-first architecture (source code never leaves the developer's machine), plan mode (an approval-gated plan before execution), up to 8 parallel sub-agents running in Git worktrees, native Model Context Protocol (MCP) support, and a low API price of $1.00 in / $2.00 out per 1M tokens. The single sentence a buyer needs: it is a credible, cheap, privacy-forward coding agent whose 70.8% SWE-bench Verified trails Claude — a strong second tool for IP-sensitive and cost-sensitive teams, not yet a top-tier-quality replacement. Provider: xAI. Released: 2026-05-20. Status: GA. Context: 256K tokens. Max output: 256K tokens. Modalities: text + image in, text out. Knowledge cutoff: November 2024. Headline price: $1.00 / $2.00 per 1M tokens.

What's new

  • As xAI's first entry in the coding-agent category:
  • **Local-first design** — source code is not transmitted to xAI's servers, a deliberate stance for proprietary codebases and regulated industries (and the basis for `trains_on_inputs: false`).
  • **Plan mode** — the agent presents an approval-gated step list before mutating any files; a real safety gate for autonomous runs.
  • **Up to 8 parallel sub-agents** running in isolated Git worktrees — speeds multi-file refactors.
  • **Native MCP support** — connects to Model Context Protocol tools out of the box.
  • **Always-on built-in reasoning** for reliable multi-step code work.
  • **Absorbs grok-code-fast-1** — that slug was retired 2026-05-15 and now redirects to Grok Build 0.1; the 70.8% SWE-bench Verified figure comes from that lineage on xAI's internal harness.
  • **Cheap entry** — API at $1.00 / $2.00; CLI bundled into SuperGrok / X Premium+, with heavy tiers around $300/mo ($99/mo six-month intro promo reported).

Benchmarks

BenchmarkScoreSource
SWE-bench Verified70.8%xAI internal harness (inherited from grok-code-fast-1, which redirects to Grok Build 0.1)2026-05-20T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7/10
The local-first design answers my single biggest objection to coding agents — but at v0.1 it's a second tool, not a Claude Code replacement.

Strategically, Grok Build 0.1 directly targets the top CTO objection to coding agents: shipping proprietary source to a cloud LLM. The local-first architecture makes it adoptable in regulated and IP-sensitive environments where Claude Code's cloud model is a non-starter. But the 70.8% SWE-bench Verified is competitive-not-leading, and v0.1 maturity means fewer integrations and rough edges. The sensible posture is "second tool in the belt": use it for codebases that legally can't go to the cloud, keep Claude Code as primary elsewhere. Vendor risk is the usual xAI profile (thin disclosure, no certs); lock-in is low via OpenAI-compatible API. The X-ecosystem angle does not meaningfully extend to coding.

Strategic Fit 7Vendor Risk 6Roadmap Confidence 7
Pros
  • Local-first solves the IP objection
  • cheap
  • plan-mode gate
Cons
  • v0.1 maturity
  • sub-Claude SWE-bench
Right for: Regulated/IP-sensitive teams
Avoid if: You need top-tier coding quality today
Domain Strategist7/10
xAI is now competing across the full developer-tools surface — and local-first is a sharp wedge into the data-sovereignty segment Claude can't easily match.

In dev-tools market terms, Grok Build signals xAI competing beyond chat into the coding-agent category Anthropic has led for a year. The differentiation is local-first data sovereignty — a real wedge into regulated industries and security-conscious enterprises that cloud-only agents structurally can't serve without re-architecting. Native MCP support rides the open-tooling wave. Market timing is late (Claude Code and Codex are entrenched) but the privacy angle opens a segment competitors haven't prioritized. The weakness is the v0.1 reality: a sharp positioning story needs the product maturity to back it, and right now the integrations and benchmark proof lag the narrative.

Competitive Positioning 7Differentiation 8Market Timing 6
Pros
  • Local-first data-sovereignty wedge
  • MCP-native
Cons
  • Late entrant
  • v0.1 maturity
Right for: Data-sovereignty-driven dev orgs
Avoid if: You want the category leader's ecosystem
Finance Lead7.5/10
At $1/$2 it's a fraction of Claude Sonnet or Opus per token — the catch is retries on hard tasks can eat the savings.

The unit economics are strong. At $1.00 / $2.00 per 1M tokens, Grok Build is materially cheaper than Claude Sonnet 4.6 ($3/$15) and roughly 10x cheaper than Opus 4.7 per token — meaningful for coding agents that burn tens of millions of output tokens per project. The CLI bundled into SuperGrok ($30/mo) or X Premium+ ($40/mo) undercuts stacking Cursor + Claude Code subscriptions. The financial caveat: lower benchmark quality means more retries and more developer-supervision time on hard problems, which can quietly erode the per-token savings. For routine refactors and well-scoped tasks, the economics clearly favor Grok Build; for hard, novel problems, factor in the human-supervision overhead.

Cost Efficiency 9Pricing Transparency 7Value per Dollar 8
Pros
  • ~10x cheaper than Opus per token
  • cheap bundled CLI
Cons
  • Retries/supervision can erode savings
  • cross-source price drift
Right for: Cost-sensitive, well-scoped coding work
Avoid if: Hard problems where retries dominate cost
Domain Practitioner7/10
Plan mode and 8 worktree sub-agents are a genuine win — the CLI just feels less polished than Claude Code, and 256K bites on big repos.

For builders, the standout UX is plan mode (approve a step list before any mutation) and parallel sub-agents isolated in Git worktrees, which make multi-file refactors fast and reviewable. Native MCP support plugs into the growing tool ecosystem, and function calling / structured outputs are solid via the OpenAI-compatible interface. The friction: the CLI is less polished than Claude Code — fewer editor integrations and plugins — always-on reasoning means waiting on each action, and the 256K context forces manual context selection on real monorepos that a 1M-context tool avoids. For well-scoped tasks and privacy-bound work it's productive; for sprawling repos it asks more of the developer.

API Ergonomics 7Tool/Agent Support 8Reliability 7
Pros
  • Plan mode + worktree sub-agents
  • MCP-native
  • cheap
Cons
  • Less-polished CLI
  • 256K context
  • slow per-action
Right for: Privacy-bound, well-scoped coding
Avoid if: Large monorepos or you want the most mature CLI
Power User6.5/10
As a daily coding driver it's serviceable and cheap, but I still reach for Claude Code on anything hard — and I feel the reasoning pause.

For a developer living in the CLI day to day, Grok Build is serviceable and notably cheap, with plan mode adding a reassuring approval step before edits. On routine tasks it ships fine. But on hard problems the quality gap to Claude shows, the always-on reasoning pause on every action is noticeable, and the smaller context means more time spent curating what the agent sees. The local-first guarantee is a quiet daily comfort for anyone working on proprietary code. As an everyday tool it earns a spot in the rotation; as the single tool for all coding, most heavy users will still keep Claude Code alongside it.

Output Quality 6Speed 6Everyday Usefulness 7
Pros
  • Cheap daily driver
  • plan-mode safety
  • local-first comfort
Cons
  • Quality gap on hard tasks
  • reasoning pause
  • small context
Right for: Cost-conscious devs on scoped/private work
Avoid if: You want one tool that handles everything hard
Skeptic6/10
One self-reported SWE-bench number on xAI's own harness, no other benchmarks, v0.1 — the local-first story is real; the quality story is unproven.

Adversarially, Grok Build 0.1's evidence base is thin even by xAI standards: a single headline benchmark (70.8% SWE-bench Verified) measured on xAI's own internal harness rather than an independent run, and aggregators showing zero sourced evals. The 70.8% also trails Claude Sonnet, so the one number on offer is not a winning one. v0.1 versioning is honest signaling that this is early. The local-first claim is the strongest part of the pitch and appears genuine — code staying off-server is a real, checkable property. But "credible Claude Code competitor" rests largely on positioning, not on independently verified coding quality, and the November 2024 cutoff is a real handicap for current frameworks.

Claim Accuracy 6Weakness Severity 6Hype vs Reality 6
Pros
  • Local-first claim is real and checkable
Cons
  • Single self-reported benchmark
  • v0.1
  • trailing SWE-bench
  • old cutoff
Right for: Buyers who pilot before committing
Avoid if: You need independently verified coding quality

Strengths

  • Local-first execution — source code never leaves the developer's machine (strong for regulated/IP-sensitive teams).
  • Plan-mode approval gate before file mutation — real safety control for autonomous runs.
  • Up to 8 parallel sub-agents in Git worktrees — fast multi-file refactors.
  • Native MCP support — connects to the growing MCP tool ecosystem.
  • API at $1.00 / $2.00 — materially cheaper than Claude Sonnet 4.6 ($3/$15) and ~10x cheaper than Opus 4.7 per token.

Limitations

  • SWE-bench Verified 70.8% trails Claude Sonnet (72.7%) and is far behind Opus 4.7 (87.6%) — not a top-tier coding ceiling.
  • 256K context is the smallest in the Grok lineup; monorepos require disciplined context management.
  • v0.1 maturity — fewer editor integrations, plugins, and tooling than Claude Code / Codex CLI.
  • Benchmark transparency outside SWE-bench is essentially nil; aggregators show no sourced evals yet.
  • Older training cutoff (November 2024) — recent library APIs and language features may be missing.
  • Always-on reasoning slows interactive responses.

Best use cases

- **Regulated / IP-sensitive teams** that legally cannot ship source code to a cloud LLM — the local-first design is genuinely differentiating. - **Cost-sensitive coding-agent workloads** where price-per-task beats ceiling quality. - **Parallel multi-file refactors** where up to 8 sub-agents in Git worktrees speed changes. - **Plan-mode workflows** where developers want explicit approval before any file mutation. - **xAI / X ecosystem teams** preferring a single vendor for chat and code.

Buyer questions

What does it cost?

$1.00 in / $2.00 out per 1M tokens (cached $0.20) on the API; the CLI is bundled into SuperGrok ($30/mo) or X Premium+ ($40/mo), with heavy tiers around $300/mo. Some trackers show $0.20/$1.50 — trust docs.x.ai.

Does my code leave my machine?

No. Grok Build is local-first: source code is not transmitted to xAI's servers. That is its main differentiator and the basis for treating it as privacy-forward.

How does it compare to Claude Code on quality?

Lower. SWE-bench Verified 70.8% trails Claude Sonnet 4.6 (72.7%) and is far below Opus 4.7 (87.6%). Strong for scoped tasks, not for top-tier ceilings.

What is plan mode?

The agent proposes a step list and waits for your approval before mutating files — a safety gate against runaway edits.

Can it handle a large monorepo?

With effort. 256K context is the smallest in the Grok lineup, so monorepos need disciplined context selection; a 1M-context tool is easier there.

What integrations exist?

Native MCP support, the x.ai API (OpenAI/Anthropic-SDK compatible), OpenRouter, and third-party surfaces like Kilo Code. Editor-plugin coverage is thinner than Claude Code at v0.1.

What happened to grok-code-fast-1?

It was retired 2026-05-15 and now redirects to Grok Build 0.1; the 70.8% SWE-bench figure traces to that lineage.

Comparable models

**Claude Sonnet 4.6 / Opus 4.7 via Claude Code** — Higher SWE-bench ceiling (72.7% / 87.6%) and a more mature CLI/integrations; cloud-only (the gap Grok Build exploits) and pricier.
**OpenAI GPT-5.5 / Codex CLI** — Broad IDE integration and disclosure; cloud-only; pricier.
**Aider with any model** — Open-source, bring-your-own-model CLI; no local-first model guarantee and no bundled agent.

Model specs

Input price
$1 / Mtok
Output price
$2 / Mtok
Cached input
$0.20 / Mtok
Batch (in/out)
Context window
256K tokens
Max output
256K tokens
Knowledge cutoff
2024-11
Released
2026-05-19
Modalities
text, image → text
Output speed
Not profiled
License
Proprietary
Clouds
First-party API

Does not train on API inputs by default

Last verified 2026-05-27