by Mistral AI · Mistral Medium family · best for open-weight agentic coding at sub-frontier price
Mistral Medium 3.5 (released 29 April 2026) is Mistral's most strategically important model of the year: a single 128B dense set of weights that merges what used to be three separate models — chat (Medium 3.1), reasoning (Magistral), and the agentic coder (Devstral 2) — into one SKU with a configurable reasoning effort knob. It scores 77.6% on SWE-bench Verified, within two points of Claude Sonnet 4.5, at $1.50/$7.50 per 1M tokens, and ships under a modified-MIT license that is open for nearly everyone except large-revenue enterprises. The buyer's sentence: the best open-weight agentic coding model at the Medium tier, if you can live with a license carve-out and a high output price. - Provider: Mistral AI (Paris, France) - Release: 2026-04-29, status GA - Context: 256,000 tokens; max output 32,768 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~February 2026 - Headline price: $1.50 input / $7.50 output per 1M tokens - Architecture: dense 128B with `reasoning_effort` toggle (none/high)
| Benchmark | Score | Source |
|---|---|---|
| TAU-bench | 91.4% | huggingface.co 2026-04-29T00:00:00.000Z |
| SWE-bench Verified | 77.6% | huggingface.co 2026-04-29T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“One open-weight endpoint that retires my chat, reasoning, and coding models at once — but I had to read the license twice before I trusted it.”
Medium 3.5 is the most consequential consolidation play of 2026: a single SKU lands within two points of Claude Sonnet 4.5 on SWE-bench at under half the cost and lets me retire a multi-model routing layer. For an EU-sovereign agent platform it is the new default. The catch is governance, not capability: the modified-MIT license carves out large-revenue companies, so an enterprise buyer must verify whether they fall above the threshold and budget for a commercial deal if so. Mistral also withheld the hard reasoning benchmarks, signalling deliberate optimisation for coding/agentic over pure analysis. For coding agents, ideal; for analytical workloads, benchmark against Magistral first.
“Mistral's bet is that 'merged model + open weights + EU residency' beats a zoo of specialist endpoints — and on developer mindshare it's working.”
The merged-model strategy is a sharp competitive move: it reframes Mistral's lineup from a confusing family of specialists into one coherent flagship, which is easier to sell and adopt. Positioned against Claude Sonnet and GPT-5 mini, the wedge is open weights plus EU residency at near-parity coding capability. The modified-MIT license is a calculated middle path — open enough to win indie and mid-market mindshare, monetised at the enterprise tier where Mistral needs revenue. The risk is messaging: "open weights" with a revenue asterisk invites the same criticism Meta's Llama license drew. Market timing is strong, riding the agentic-coding wave and the EU AI Act compliance tailwind.
“Coding capability at a third of Sonnet's price — but the $7.50 output rate and a possible enterprise license fee both need to be in the model.”
The API headline is strong: $1.50/$7.50 puts near-Sonnet agentic coding at roughly a third of Claude's cost, and for a team running thousands of agent calls daily the monthly delta is tens of thousands of dollars at scale. Two cautions temper it. First, the $7.50 output rate is high for the Medium tier and agentic loops are output-heavy, so cache and structured-output discipline matter. Second, the modified-MIT license may carry a separate commercial fee for large enterprises — a line item that does not exist for Apache models like Large 3 or Small 4. Self-host on ~4 GPUs converts opex to capex for steady-state teams below the threshold. Strong unit economics with two asterisks.
“The first Mistral that genuinely competes with Sonnet for agentic coding — and Vibe CLI is a real Claude Code rival with first-party integration.”
For a builder this is the standout. SWE-bench 77.6% is real-world useful: it plans multi-file edits and opens credible PRs. Vibe CLI is a legitimate Cursor/Claude Code competitor with native integration. The `reasoning_effort` dial means I don't fire a separate Magistral call when I want deeper thinking — one model, one schema. Function calling and JSON output are the best in the family, and open weights give a self-host escape hatch. Negatives are familiar: docs are thinner than Anthropic's, function-calling occasionally drifts on complex schemas, and 256K context can spike latency. Best price-to-capability ratio in the open-weight coding tier.
“I rarely talk to it directly — it lives behind my IDE and agents — but with high effort it's noticeably more careful.”
End users mostly meet Medium 3.5 through IDEs and agents rather than chat. In Le Chat with `reasoning_effort=high`, responses are slower than the chat default but markedly more careful on multi-step questions; at `none` it is responsive. European-language quality remains excellent and refusals are moderate. It feels engineered for tasks, not conversation — less warm and less expressive than Claude or GPT-5 for casual use. As a daily driver it is a strong work tool and a merely-adequate chat companion.
“'Open-weight frontier coder' — except the license has a revenue cliff and the hard reasoning benchmarks went unpublished. Two asterisks on one launch.”
The coding numbers are credible and Vibe CLI is genuinely good, so this isn't a hollow launch. But two claims deserve scrutiny. First, "open weights": the modified-MIT license carves out large-revenue companies, so the unqualified "open" framing misleads exactly the enterprise buyers who most need to know. Second, "frontier": Mistral published SWE-bench and tau3-Telecom — where it leads — and withheld GPQA Diamond, MMLU-Pro, and LiveCodeBench, the reasoning benchmarks where it would be measured against Claude and GPT-5. The honest claim is "best open-weight agentic coder at this price, for non-enterprise users." Buy it for coding economics, read the license, and don't assume reasoning parity with US frontier models.
- Agentic coding pipelines: PR generation, repo refactoring, multi-file edits via Vibe CLI. - Replacing a multi-model stack (chat + reasoning + coding) with one endpoint. - EU enterprises (below the revenue threshold) needing sovereign, self-hostable agent infrastructure. - Long-context code review across 256K tokens of repo context. - Tool-heavy agent workflows where tau-style benchmarks predict real-world performance.
Yes, but under a modified-MIT license, not Apache 2.0. It is open for research and commercial use by most parties; companies above a revenue threshold must negotiate a separate commercial agreement with Mistral. Verify your revenue against the threshold.
SWE-bench Verified 77.6%, within ~2pp of Claude Sonnet 4.5, and tau3-Telecom 91.4% for agentic tool use. Strong, real-world useful.
$7.50/1M output is high for the Medium tier; agentic loops are output-heavy, so use caching, structured output, and batch where possible.
Dense 128B runs on roughly 4 GPUs quantised (~80GB+ VRAM at FP8/NVFP4).
Usually not — `reasoning_effort=high` covers reasoning and Medium 3.5 beats Devstral 2 on SWE-bench. Devstral 2 remains relevant only as a cheaper output-token option.
EU by default on La Plateforme; 30-day abuse retention, no training on inputs unless opt-in, ZDR available.
A first-party open-source terminal coding agent (Cursor/Claude Code style) that now defaults to Medium 3.5 and can open PRs.
Does not train on API inputs by default
Last verified 2026-05-27