by Mistral AI · Devstral family · best for budget open-weight agentic coding
Devstral 2 (model ID devstral-2-2512, shipped 9 December 2025) is Mistral's open-weight agentic-coding model, launched alongside the Mistral Vibe CLI. It is a 123B-parameter dense transformer with a 256K context window, built for multi-file edits, repo planning, PR generation, and test running, and it scores 72.2% on SWE-bench Verified. It is priced at $0.40 input / $0.90 output per 1M tokens, far cheaper on output than Medium 3.5 ($7.50). The 123B model ships under a modified-MIT license (open, with a large-revenue carve-out), and the companion Devstral Small 2 (24B) is clean Apache 2.0. The buyer's sentence: the budget open-weight agentic coder in Mistral's lineup, still in the 70%-SWE-bench tier after Medium 3.5 took the crown.

- Provider: Mistral AI (Paris, France)
- Release: 2025-12-09, status GA
- Context: 256,000 tokens; max output 32,768
- Modalities: text only (no vision)
- Knowledge cutoff: ~September 2025
- Headline price: $0.40 input / $0.90 output per 1M tokens
- Architecture: dense 123B; companion Devstral Small 2 = 24B (Apache 2.0)

| Benchmark | Score | Source |
|---|---|---|
| Terminal-Bench | 32.6% | huggingface.co, 2025-12-09 |
| SWE-bench Verified | 72.2% | huggingface.co, 2025-12-09 |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“At launch it was our open-weight agentic coder; now it's the budget option under Medium 3.5 — cheaper output keeps it on the roster.”
In December 2025 Devstral 2 was strategically significant: a credible open-weight agentic coder with a first-party CLI. By mid-2026 Mistral's own Medium 3.5 beat it on the headline benchmark and took over the Vibe CLI default. Devstral 2 stays relevant because it is materially cheaper on output ($0.90 vs $7.50 per 1M tokens), which matters for high-volume batch coding. For new architecture decisions I would default to Medium 3.5 unless output cost forces otherwise. The modified-MIT license on the 123B carries the same enterprise carve-out as Medium 3.5; for clean licensing I'd reach for Devstral Small 2.
“Mistral's open agentic-coder play — now repositioned as the value tier beneath its own merged flagship.”
Devstral 2 established Mistral's open agentic-coding credibility and seeded the Vibe CLI ecosystem. Strategically it has been repositioned from "frontier coder" to "value coder" beneath Medium 3.5, which is a coherent ladder but blunts its standalone narrative. Against Qwen 3 Coder and DeepSeek Coder it competes on tool-use quality and the Vibe CLI; against closed coders it competes on price and openness. The durable asset is the open-weight + EU-residency + cheap-output combination for budget agentic workloads. The 24B Apache-2.0 sibling broadens the strategic surface to laptops and unrestricted fine-tuning.
“$0.90 output vs Medium 3.5's $7.50 for a model within ~5pp on SWE-bench — for high-volume batch coding, that delta is the whole decision.”
The financial case is sharp. Devstral 2's $0.40/$0.90 is roughly 8.3x cheaper on output than Medium 3.5 ($7.50) for a model within ~5pp on SWE-bench. For high-volume batch agentic workloads (automated refactoring, test generation across thousands of files) the cost delta dominates, and Devstral 2 is the rational choice when peak quality isn't required. Devstral Small 2 self-hosted on a laptop has effectively zero marginal cost. The caveat is the modified-MIT carve-out at the 123B tier for large enterprises; the clean-license, fixed-cost route is Devstral Small 2. Overall, these are excellent unit economics for budget-sensitive coding.
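The output-price comparison is simple arithmetic, and a short sketch makes it easy to rerun for your own volumes. It uses only the two output prices quoted in this review; the 500M-token batch size is a hypothetical workload chosen for illustration, and input-token cost is deliberately left out because Medium 3.5's input price is not stated here.

```python
# Output-token prices in USD per 1M tokens, as quoted in this review.
DEVSTRAL_2_OUTPUT = 0.90
MEDIUM_3_5_OUTPUT = 7.50


def output_cost_usd(output_tokens: int, price_per_million: float) -> float:
    """Cost of the output tokens alone; input tokens are ignored."""
    return output_tokens / 1_000_000 * price_per_million


# Hypothetical batch job (assumption, not a measured workload): 500M output tokens.
BATCH_OUTPUT_TOKENS = 500_000_000

devstral_cost = output_cost_usd(BATCH_OUTPUT_TOKENS, DEVSTRAL_2_OUTPUT)
medium_cost = output_cost_usd(BATCH_OUTPUT_TOKENS, MEDIUM_3_5_OUTPUT)
ratio = MEDIUM_3_5_OUTPUT / DEVSTRAL_2_OUTPUT

print(f"Devstral 2 output cost: ${devstral_cost:,.2f}")
print(f"Medium 3.5 output cost: ${medium_cost:,.2f}")
print(f"Output price ratio:     {ratio:.1f}x")
```

Because the ratio depends only on the two list prices, it stays at about 8.3x whatever the batch size; the absolute saving is what scales with volume.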
“The SWE-bench numbers translate to real productivity via Vibe CLI — multi-file edits and sensible PR scaffolding — and Devstral Small 2 on a laptop is genuinely useful.”
Through Q1 2026 Devstral 2 was my agentic-coding workhorse via Vibe CLI: 72.2% SWE-bench translates to real multi-file edits, sensible PR scaffolding, and decent test generation, and 256K context handles realistic repos. Tool-calling (git/terminal/Python) is reliable. After Medium 3.5 I migrated most new work, but Devstral 2 still runs my batch refactoring jobs because output tokens are far cheaper. Devstral Small 2 (Apache 2.0) on a laptop is a genuinely useful local agent. The lack of vision is the main felt gap when I want to paste a stack-trace screenshot.
“I see Devstral 2's output — PRs, refactors, generated tests — more than the model itself, and the results are useful more often than not.”
End users meet Devstral 2 through the artifacts its agents produce: PRs, refactors, generated tests via Vibe CLI. Indirectly the experience is good — useful PRs more often than not, with reasonable latency for agent loops. There is no chat or vision dimension to evaluate; it is a coding engine, not a conversational partner. The absence of vision is a felt limitation when a stack-trace screenshot would help. For developers living in an agentic CLI, a dependable engine; for anyone expecting a chatbot, the wrong tool.
“A solid coder whose own maker beat it four months later — and the '7x cheaper than Sonnet' line needs the SWE-bench gap printed next to it.”
Devstral 2 is genuinely capable, so the skepticism is about positioning and license. The "7x more cost-efficient than Claude Sonnet" claim is true on price but elides the ~5-7pp SWE-bench gap: it's cheaper and slightly weaker, not cheaper and equal. The "open weights" framing again carries an asterisk: the 123B is modified-MIT with a large-revenue carve-out, and only the 24B is clean Apache 2.0. And Mistral itself superseded it with Medium 3.5 within four months, migrating the CLI default away. The honest claim is "good budget open-weight agentic coder with a license caveat at the large size." Buy the 24B for clean self-hosting, or the 123B for cheap output if you have read the license.
- Coding agents and CLIs (especially Vibe CLI) where output cost matters.
- Self-hosted in-product code agents where the modified-MIT terms are acceptable (or Devstral Small 2 for clean Apache 2.0).
- Long-context repo refactoring.
- Budget-constrained agentic coding where Medium 3.5's $7.50 output is too high.
- Devstral Small 2 specifically: local laptop-based agent loops.
Output costs $0.90 per 1M tokens: the verified La Plateforme rate is $0.40 input / $0.90 output (an earlier figure of $2.00 was incorrect).
The 123B is modified-MIT (open with a large-revenue carve-out, same as Medium 3.5). For clean Apache 2.0, use Devstral Small 2 (24B).
SWE-bench Verified 72.2% and Multilingual 61.3% — competitive agentic coding, ~5pp behind Medium 3.5 and Claude Sonnet 4.5.
Use Devstral 2 when output cost matters (8.3x cheaper output) and peak quality isn't required; use Medium 3.5 for the best SWE-bench and a reasoning knob.
The 123B needs ~80GB+ VRAM; Devstral Small 2 (24B, Apache 2.0) runs on a laptop.
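For a rough sense of where hardware figures like these come from, weight memory can be estimated as parameters times bits per weight. This is a back-of-the-envelope sketch, not a sizing guide: it takes the large model as 123B parameters and the small one as 24B, the precision levels are illustrative assumptions, and KV cache, activations, and runtime overhead are all excluded.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10^9 bytes)."""
    return params_billion * bits_per_weight / 8


# Parameter counts in billions, taken from this review.
MODELS = {"Devstral 2": 123, "Devstral Small 2": 24}

# Illustrative precisions (assumption): full 16-bit plus two common quantization widths.
PRECISIONS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}

estimates = {
    name: {label: weight_memory_gb(params, bits) for label, bits in PRECISIONS.items()}
    for name, params in MODELS.items()
}

for name, by_precision in estimates.items():
    for label, gigabytes in by_precision.items():
        print(f"{name:<17} {label:>6}: {gigabytes:6.1f} GB (weights only)")
```

Under these assumptions the large model's weights alone come to about 61.5 GB at 4-bit, which is in line with a requirement of 80 GB or more once cache and overhead are added, while the 24B model at 4-bit is about 12 GB. Real requirements depend on the serving runtime and the context length in use.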
Devstral 2 is text-only: it has no vision input and no screenshot-to-code.
Vibe CLI is an open-source terminal coding agent; it now defaults to Medium 3.5 but can be configured back to Devstral 2.
Does not train on API inputs by default
Last verified 2026-05-27