Mistral Medium 3.5

GALatest Medium

by Mistral AI · Mistral Medium family · best for open-weight agentic coding at sub-frontier price

FrontierCodingOpen-WeightsMultimodalLong-Context
8.4
AI Panel Score
Value 8.5/10

Mistral Medium 3.5 (released 29 April 2026) is Mistral's most strategically important model of the year: a single 128B dense set of weights that merges what used to be three separate models — chat (Medium 3.1), reasoning (Magistral), and the agentic coder (Devstral 2) — into one SKU with a configurable reasoning effort knob. It scores 77.6% on SWE-bench Verified, within two points of Claude Sonnet 4.5, at $1.50/$7.50 per 1M tokens, and ships under a modified-MIT license that is open for nearly everyone except large-revenue enterprises. The buyer's sentence: the best open-weight agentic coding model at the Medium tier, if you can live with a license carve-out and a high output price. - Provider: Mistral AI (Paris, France) - Release: 2026-04-29, status GA - Context: 256,000 tokens; max output 32,768 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~February 2026 - Headline price: $1.50 input / $7.50 output per 1M tokens - Architecture: dense 128B with `reasoning_effort` toggle (none/high)

What's new

  • First Mistral "merged" model: one set of weights replaces Medium 3.1 (chat), Magistral (reasoning), and Devstral 2 (coding agent). This is the architectural story.
  • `reasoning_effort` parameter selects between fast chat (`none`) and extended thinking (`high`) per request — one endpoint, two behaviours.
  • Open weights under a modified-MIT license — a shift from Medium 3.1, which was Premier-only (closed). NOTE: the modified-MIT carve-out requires large-revenue companies to negotiate a separate commercial arrangement; it is not a clean Apache 2.0.
  • 128B dense (not MoE), self-hostable on roughly 4 GPUs in quantised form — far lighter than Large 3's 675B MoE.
  • SWE-bench Verified 77.6% (up from Devstral 2's 72.2%) and tau3-Telecom 91.4%.
  • Ships paired with the Mistral Vibe CLI, a terminal-native coding agent that can open PRs; Mistral migrated Vibe from Devstral 2 to Medium 3.5.

Benchmarks

BenchmarkScoreSource
TAU-bench91.4%huggingface.co 2026-04-29T00:00:00.000Z
SWE-bench Verified77.6%huggingface.co 2026-04-29T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker8.5/10
One open-weight endpoint that retires my chat, reasoning, and coding models at once — but I had to read the license twice before I trusted it.

Medium 3.5 is the most consequential consolidation play of 2026: a single SKU lands within two points of Claude Sonnet 4.5 on SWE-bench at under half the cost and lets me retire a multi-model routing layer. For an EU-sovereign agent platform it is the new default. The catch is governance, not capability: the modified-MIT license carves out large-revenue companies, so an enterprise buyer must verify whether they fall above the threshold and budget for a commercial deal if so. Mistral also withheld the hard reasoning benchmarks, signalling deliberate optimisation for coding/agentic over pure analysis. For coding agents, ideal; for analytical workloads, benchmark against Magistral first.

Strategic Fit 9Vendor Risk 8Roadmap Confidence 8
Pros
  • SKU consolidation, near-Sonnet coding, EU residency
Cons
  • modified-MIT carve-out
  • withheld reasoning benchmarks
Right for: EU agent platforms below the revenue threshold
Avoid if: you assumed Apache 2.0 and you're a large enterprise
Domain Strategist8.5/10
Mistral's bet is that 'merged model + open weights + EU residency' beats a zoo of specialist endpoints — and on developer mindshare it's working.

The merged-model strategy is a sharp competitive move: it reframes Mistral's lineup from a confusing family of specialists into one coherent flagship, which is easier to sell and adopt. Positioned against Claude Sonnet and GPT-5 mini, the wedge is open weights plus EU residency at near-parity coding capability. The modified-MIT license is a calculated middle path — open enough to win indie and mid-market mindshare, monetised at the enterprise tier where Mistral needs revenue. The risk is messaging: "open weights" with a revenue asterisk invites the same criticism Meta's Llama license drew. Market timing is strong, riding the agentic-coding wave and the EU AI Act compliance tailwind.

Competitive Positioning 9Differentiation 8Market Timing 9
Pros
  • coherent flagship narrative
  • agentic-coding timing
Cons
  • "open-with-asterisk" messaging risk
Right for: products selling EU-sovereign coding
Avoid if: your buyers demand truly unrestricted licensing
Finance Lead8.5/10
Coding capability at a third of Sonnet's price — but the $7.50 output rate and a possible enterprise license fee both need to be in the model.

The API headline is strong: $1.50/$7.50 puts near-Sonnet agentic coding at roughly a third of Claude's cost, and for a team running thousands of agent calls daily the monthly delta is tens of thousands of dollars at scale. Two cautions temper it. First, the $7.50 output rate is high for the Medium tier and agentic loops are output-heavy, so cache and structured-output discipline matter. Second, the modified-MIT license may carry a separate commercial fee for large enterprises — a line item that does not exist for Apache models like Large 3 or Small 4. Self-host on ~4 GPUs converts opex to capex for steady-state teams below the threshold. Strong unit economics with two asterisks.

Cost Efficiency 8Pricing Transparency 8Value per Dollar 9
Pros
  • a third of Sonnet's price
  • ~4-GPU self-host
Cons
  • high output rate
  • possible enterprise license fee
Right for: high-volume coding below the revenue threshold
Avoid if: enterprise-scale where the license fee erodes the saving
Domain Practitioner9/10
The first Mistral that genuinely competes with Sonnet for agentic coding — and Vibe CLI is a real Claude Code rival with first-party integration.

For a builder this is the standout. SWE-bench 77.6% is real-world useful: it plans multi-file edits and opens credible PRs. Vibe CLI is a legitimate Cursor/Claude Code competitor with native integration. The `reasoning_effort` dial means I don't fire a separate Magistral call when I want deeper thinking — one model, one schema. Function calling and JSON output are the best in the family, and open weights give a self-host escape hatch. Negatives are familiar: docs are thinner than Anthropic's, function-calling occasionally drifts on complex schemas, and 256K context can spike latency. Best price-to-capability ratio in the open-weight coding tier.

API Ergonomics 9Tool/Agent Support 9Reliability 8
Pros
  • near-Sonnet coding, Vibe CLI, one-SKU reasoning
Cons
  • thinner docs, occasional FC drift
Right for: agentic coding builders
Avoid if: you need the deepest tooling ecosystem and SLAs
Power User7.5/10
I rarely talk to it directly — it lives behind my IDE and agents — but with high effort it's noticeably more careful.

End users mostly meet Medium 3.5 through IDEs and agents rather than chat. In Le Chat with `reasoning_effort=high`, responses are slower than the chat default but markedly more careful on multi-step questions; at `none` it is responsive. European-language quality remains excellent and refusals are moderate. It feels engineered for tasks, not conversation — less warm and less expressive than Claude or GPT-5 for casual use. As a daily driver it is a strong work tool and a merely-adequate chat companion.

Output Quality 7.5Speed 7Everyday Usefulness 8
Pros
  • careful with high effort, strong EU languages
Cons
  • task-engineered, not warm
  • high-effort latency
Right for: developer daily use
Avoid if: you want a friendly general chat partner
Skeptic7/10
'Open-weight frontier coder' — except the license has a revenue cliff and the hard reasoning benchmarks went unpublished. Two asterisks on one launch.

The coding numbers are credible and Vibe CLI is genuinely good, so this isn't a hollow launch. But two claims deserve scrutiny. First, "open weights": the modified-MIT license carves out large-revenue companies, so the unqualified "open" framing misleads exactly the enterprise buyers who most need to know. Second, "frontier": Mistral published SWE-bench and tau3-Telecom — where it leads — and withheld GPQA Diamond, MMLU-Pro, and LiveCodeBench, the reasoning benchmarks where it would be measured against Claude and GPT-5. The honest claim is "best open-weight agentic coder at this price, for non-enterprise users." Buy it for coding economics, read the license, and don't assume reasoning parity with US frontier models.

Claim Accuracy 6Weakness Severity 6Hype vs Reality 7
Pros
  • real coding capability, real CLI
Cons
  • license asterisk
  • selective benchmarks
Right for: non-enterprise coding teams who read the fine print
Avoid if: you took "open frontier" at face value

Strengths

  • One of very few open-weight models above 77% SWE-bench Verified at sub-$10 output pricing.
  • Single SKU replaces a three-model stack (chat + reasoning + coding) — major architecture simplification.
  • `reasoning_effort` dial removes the need for a separate reasoning model call.
  • 256K context handles full-repo agentic tasks; best-in-class function calling and JSON.
  • Self-hostable on ~4 GPUs — far lighter than Large 3.
  • Ships with first-party Vibe CLI.

Limitations

  • License is modified-MIT, NOT Apache 2.0 — large-revenue enterprises must pay for a commercial arrangement. This is a real procurement gotcha.
  • Output price of $7.50 is high for the Medium tier — nearly 4x Medium 3.1's $2.00.
  • Mistral withheld GPQA Diamond, MMLU-Pro, and LiveCodeBench at launch — hard to benchmark on pure reasoning.
  • 128B dense raises the self-host memory floor versus a sparse model of similar nominal size.
  • Still trails Claude Sonnet 4.5 and GPT-5 on the hardest reasoning evals.
  • Younger in production than Large 3; less third-party tooling so far.

Best use cases

- Agentic coding pipelines: PR generation, repo refactoring, multi-file edits via Vibe CLI. - Replacing a multi-model stack (chat + reasoning + coding) with one endpoint. - EU enterprises (below the revenue threshold) needing sovereign, self-hostable agent infrastructure. - Long-context code review across 256K tokens of repo context. - Tool-heavy agent workflows where tau-style benchmarks predict real-world performance.

Buyer questions

Is it actually open weights?

Yes, but under a modified-MIT license, not Apache 2.0. It is open for research and commercial use by most parties; companies above a revenue threshold must negotiate a separate commercial agreement with Mistral. Verify your revenue against the threshold.

How good is it at coding?

SWE-bench Verified 77.6%, within ~2pp of Claude Sonnet 4.5, and tau3-Telecom 91.4% for agentic tool use. Strong, real-world useful.

Why is output so expensive?

$7.50/1M output is high for the Medium tier; agentic loops are output-heavy, so use caching, structured output, and batch where possible.

What's the self-host footprint?

Dense 128B runs on roughly 4 GPUs quantised (~80GB+ VRAM at FP8/NVFP4).

Do I still need Magistral or Devstral 2?

Usually not — `reasoning_effort=high` covers reasoning and Medium 3.5 beats Devstral 2 on SWE-bench. Devstral 2 remains relevant only as a cheaper output-token option.

Where does my data live?

EU by default on La Plateforme; 30-day abuse retention, no training on inputs unless opt-in, ZDR available.

What is Vibe CLI?

A first-party open-source terminal coding agent (Cursor/Claude Code style) that now defaults to Medium 3.5 and can open PRs.

Comparable models

**Claude Sonnet 4.5:** Leads SWE-bench by ~2pp, costs ~2x, closed weights — the capability-vs-price-and-openness trade.
**GPT-5 mini:** Similar price tier, weaker SWE-bench, no open weights, broader ecosystem.
**Devstral 2 (Mistral):** Its predecessor in the coding role; 5.4pp lower SWE-bench but 3.75x cheaper on output ($0.90 vs $7.50) — the budget alternative inside Mistral's own lineup.
**Mistral Large 3:** Broader generalist with a cleaner Apache 2.0 license, but weaker on coding/agentic specifically.

Model specs

Input price
$1.50 / Mtok
Output price
$7.50 / Mtok
Cached input
Batch (in/out)
$0.75 / $3.75
Context window
256K tokens
Max output
33K tokens
Knowledge cutoff
2026-02
Released
2026-04-28
Modalities
text, image → text
Output speed
Not profiled
License
Open weights (custom-modified-mit)
Clouds
Bedrock, Azure AI Foundry

Does not train on API inputs by default

Other Mistral Medium versions

Last verified 2026-05-27