Devstral 2

GALatest Coder

by Mistral AI · Devstral family · best for budget open-weight agentic coding

CodingOpen-WeightsCost-OptimizedLong-Context
7.4
AI Panel Score
Value 9.0/10

Devstral 2 (model ID devstral-2-2512, shipped 9 December 2025) is Mistral's open-weight agentic-coding model, launched alongside the Mistral Vibe CLI. It is a 125B-parameter dense transformer with 256K context, built for multi-file edits, repo planning, PR generation, and test running, scoring 72.2% on SWE-bench Verified. Priced at $0.40/$0.90 — meaningfully cheaper on output than Medium 3.5 ($7.50). The license is modified-MIT on the 123B (open with a large-revenue carve-out), and a companion Devstral Small 2 (24B) is clean Apache 2.0. The buyer's sentence: the budget open-weight agentic coder in Mistral's lineup, still in the 70%-SWE-bench tier after Medium 3.5 took the crown. - Provider: Mistral AI (Paris, France) - Release: 2025-12-09, status GA - Context: 256,000 tokens; max output 32,768 - Modalities: text only (no vision) - Knowledge cutoff: ~September 2025 - Headline price: $0.40 input / $0.90 output per 1M tokens - Architecture: dense 125B; companion Devstral Small 2 = 24B (Apache 2.0)

What's new

  • Mistral's frontier agentic-coding model at launch, released alongside the open-source Mistral Vibe CLI.
  • 125B dense with 256K context; companion Devstral Small 2 (24B, Apache 2.0) runs locally.
  • Positioned as ~7x more cost-efficient than Claude Sonnet on real-world coding tasks (Mistral's claim).
  • SWE-bench Verified 72.2%, SWE-bench Multilingual 61.3%, Terminal-Bench 2 32.6%.
  • Output price corrected to $0.90/1M (an earlier draft listed $2.00 — the verified La Plateforme rate is $0.40/$0.90).
  • Subsequently surpassed by Medium 3.5 (April 2026) on SWE-bench Verified (77.6% vs 72.2%); Mistral migrated the Vibe CLI default to Medium 3.5, but Devstral 2 remains GA and cheaper on output.

Benchmarks

BenchmarkScoreSource
Terminal-Bench32.6%huggingface.co 2025-12-09T00:00:00.000Z
SWE-bench Verified72.2%huggingface.co 2025-12-09T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7/10
At launch it was our open-weight agentic coder; now it's the budget option under Medium 3.5 — cheaper output keeps it on the roster.

In December 2025 Devstral 2 was strategically significant: a credible open-weight agentic coder with a first-party CLI. By mid-2026 Mistral's own Medium 3.5 beat it on the headline benchmark and took over the Vibe CLI default. Devstral 2 stays relevant because it is materially cheaper on output ($0.90 vs $7.50), which matters for high-volume batch coding. For new architecture decisions I would default to Medium 3.5 unless output cost forces otherwise. The modified-MIT license on the 125B carries the same enterprise carve-out as Medium 3.5; for clean licensing I'd reach for Devstral Small 2.

Strategic Fit 7Vendor Risk 7Roadmap Confidence 6
Pros
  • cheap output, open weights, first-party CLI
Cons
  • superseded
  • modified-MIT carve-out
  • update cadence unclear
Right for: budget agentic coding
Avoid if: you want the current best or a clean license at 125B
Domain Strategist7/10
Mistral's open agentic-coder play — now repositioned as the value tier beneath its own merged flagship.

Devstral 2 established Mistral's open agentic-coding credibility and seeded the Vibe CLI ecosystem. Strategically it has been repositioned from "frontier coder" to "value coder" beneath Medium 3.5, which is a coherent ladder but blunts its standalone narrative. Against Qwen 3 Coder and DeepSeek Coder it competes on tool-use quality and the Vibe CLI; against closed coders it competes on price and openness. The durable asset is the open-weight + EU-residency + cheap-output combination for budget agentic workloads. The 24B Apache-2.0 sibling broadens the strategic surface to laptops and unrestricted fine-tuning.

Competitive Positioning 7Differentiation 7Market Timing 6
Pros
  • open agentic-coder slot, Vibe ecosystem, cheap output
Cons
  • outshone by Medium 3.5
  • carve-out license
Right for: value-tier coding products
Avoid if: you want the flagship story
Finance Lead8.5/10
$0.90 output vs Medium 3.5's $7.50 for a model within ~5pp on SWE-bench — for high-volume batch coding, that delta is the whole decision.

The financial case is sharp. Devstral 2's $0.40/$0.90 is 3.75x cheaper on output than Medium 3.5 for a model within ~5pp on SWE-bench. For high-volume batch agentic workloads — automated refactoring, test generation across thousands of files — the cost delta dominates and Devstral 2 is the rational choice when peak quality isn't required. Devstral Small 2 self-hosted on a laptop has effectively zero marginal cost. The caveat is the modified-MIT carve-out at the 125B tier for large enterprises; the clean-license, fixed-cost route is Devstral Small 2. Excellent unit economics for budget-sensitive coding.

Cost Efficiency 9Pricing Transparency 8Value per Dollar 9
Pros
  • 3.75x cheaper output than Medium 3.5
  • near-zero-cost 24B self-host
Cons
  • 125B license carve-out
Right for: high-volume batch coding
Avoid if: enterprise scale where the license fee applies
Domain Practitioner7.5/10
The SWE-bench numbers translate to real productivity via Vibe CLI — multi-file edits and sensible PR scaffolding — and Devstral Small 2 on a laptop is genuinely useful.

Through Q1 2026 Devstral 2 was my agentic-coding workhorse via Vibe CLI: 72.2% SWE-bench translates to real multi-file edits, sensible PR scaffolding, and decent test generation, and 256K context handles realistic repos. Tool-calling (git/terminal/Python) is reliable. After Medium 3.5 I migrated most new work, but Devstral 2 still runs my batch refactoring jobs because output tokens are far cheaper. Devstral Small 2 (Apache 2.0) on a laptop is a genuinely useful local agent. The lack of vision is the main felt gap when I want to paste a stack-trace screenshot.

API Ergonomics 8Tool/Agent Support 8Reliability 7
Pros
  • real productivity, reliable tools, laptop-viable 24B
Cons
  • no vision
  • superseded on quality
Right for: agentic coding on a budget
Avoid if: you need vision or peak SWE-bench
Power User7/10
I see Devstral 2's output — PRs, refactors, generated tests — more than the model itself, and the results are useful more often than not.

End users meet Devstral 2 through the artifacts its agents produce: PRs, refactors, generated tests via Vibe CLI. Indirectly the experience is good — useful PRs more often than not, with reasonable latency for agent loops. There is no chat or vision dimension to evaluate; it is a coding engine, not a conversational partner. The absence of vision is a felt limitation when a stack-trace screenshot would help. For developers living in an agentic CLI, a dependable engine; for anyone expecting a chatbot, the wrong tool.

Output Quality 7Speed 7Everyday Usefulness 7
Pros
  • useful agent output, reasonable loop latency
Cons
  • no vision
  • not conversational
Right for: CLI-driven coding
Avoid if: you want chat or image input
Skeptic7/10
A solid coder whose own maker beat it four months later — and the '7x cheaper than Sonnet' line needs the SWE-bench gap printed next to it.

Devstral 2 is genuinely capable, so the skepticism is about positioning and license. The "7x more cost-efficient than Claude Sonnet" claim is true on price but elides the ~5-7pp SWE-bench gap — it's cheaper and slightly weaker, not cheaper and equal. The "open weights" framing again carries an asterisk: the 125B is modified-MIT with a large-revenue carve-out, only the 24B is clean Apache 2.0. And Mistral itself superseded it with Medium 3.5 within four months, migrating the CLI default away. The honest claim is "good budget open-weight agentic coder with a license caveat at the large size." Buy the 24B for clean self-host, the 125B for cheap output if you read the license.

Claim Accuracy 7Weakness Severity 6Hype vs Reality 7
Pros
  • real coding, real price advantage
Cons
  • cost claim elides quality gap
  • 125B license asterisk
  • quickly superseded
Right for: budget coders who check the license
Avoid if: you assumed parity-with-Sonnet or clean Apache 2.0 at 125B

Strengths

  • Frontier-adjacent agentic coding at $0.40/$0.90 — 3.75x cheaper on output than Medium 3.5.
  • The ~7x cost-efficiency-vs-Sonnet claim holds for many real coding workloads.
  • 256K context fits real repos; strong tool-calling and multi-file editing.
  • Open-weight (modified-MIT) at 125B is rare for an agentic coder.
  • Devstral Small 2 (Apache 2.0, 24B) gives a clean, laptop-viable self-host baseline.

Limitations

  • Surpassed by Medium 3.5 on SWE-bench Verified (77.6% vs 72.2%); Mistral migrated Vibe CLI's default to Medium 3.5.
  • 125B license is modified-MIT, NOT Apache 2.0 — large-revenue enterprises need a commercial deal (the clean-license option is Devstral Small 2).
  • No vision — no screenshot-to-code or stack-trace-image workflows.
  • Trails Claude Sonnet 4.5 on the hardest multi-step engineering benchmarks.
  • Strategic uncertainty: with Medium 3.5 absorbing the role, the cadence of future Devstral updates is unclear.

Best use cases

- Coding agents and CLIs (especially Vibe CLI) where output cost matters. - Self-hosted in-product code agents where the modified-MIT terms are acceptable (or Devstral Small 2 for clean Apache 2.0). - Long-context repo refactoring. - Budget-constrained agentic coding where Medium 3.5's $7.50 output is too high. - Devstral Small 2 specifically: local laptop-based agent loops.

Buyer questions

Is the output price $0.90 or $2.00?

$0.90 — the verified La Plateforme rate is $0.40 input / $0.90 output (an earlier figure of $2.00 was incorrect).

Is it open weights?

The 125B is modified-MIT (open with a large-revenue carve-out, same as Medium 3.5). For clean Apache 2.0, use Devstral Small 2 (24B).

How good is it at coding?

SWE-bench Verified 72.2% and Multilingual 61.3% — competitive agentic coding, ~5pp behind Medium 3.5 and Claude Sonnet 4.5.

Should I use it or Medium 3.5?

Use Devstral 2 when output cost matters (8.3x cheaper output) and peak quality isn't required; use Medium 3.5 for the best SWE-bench and a reasoning knob.

Can it run locally?

The 125B needs ~80GB+ VRAM; Devstral Small 2 (24B, Apache 2.0) runs on a laptop.

Does it do vision?

No — text-only. No screenshot-to-code.

What is Vibe CLI?

An open-source terminal coding agent; it now defaults to Medium 3.5 but is configurable back to Devstral 2.

Comparable models

**Mistral Medium 3.5:** Its successor in the role — +5.4pp SWE-bench but 8.3x the output price ($7.50 vs $0.90), same modified-MIT license; the quality-vs-cost choice inside Mistral.
**Claude Sonnet 4.5:** +5-7pp SWE-bench, several times the price, closed weights — the premium agentic coder.
**Devstral Small 2 (Mistral):** The clean Apache-2.0 24B sibling for laptop/local agents and unrestricted fine-tuning.
**Qwen 3 Coder:** Competitive open-weight coder; weaker tool-use and no first-party CLI.

Model specs

Input price
$0.40 / Mtok
Output price
$0.90 / Mtok
Cached input
$0.04 / Mtok
Batch (in/out)
$0.20 / $0.45
Context window
256K tokens
Max output
33K tokens
Knowledge cutoff
2025-09
Released
2025-12-08
Modalities
text → text
Output speed
Not profiled
License
Open weights (custom-modified-mit)
Clouds
Bedrock, Azure AI Foundry

Does not train on API inputs by default

Last verified 2026-05-27