Mistral Medium 3.1

GA

by Mistral AI · Mistral Medium family · best for cheap multilingual multimodal chat at volume

Cost-OptimizedMultimodalmultilingual
7.7
AI Panel Score
Value 9.0/10

Mistral Medium 3.1 (release 25.08, shipped 13 August 2025) is the cost-optimised workhorse of Mistral's proprietary "Premier" tier: a closed-weight, multimodal, 128K-context model priced at $0.40/$2.00 per 1M tokens — among the cheapest frontier-adjacent multimodal models on the market. It is an incremental refinement of Medium 3 (May 2025) with better instruction following, sharper tool use, and improved vision. The buyer's sentence: when budget is the dominant constraint and the workload is chat/Q&A rather than agentic coding, this is still the value pick — though Medium 3.5 now supersedes it for coding and agentic work. - Provider: Mistral AI (Paris, France) - Release: 2025-08-13, status GA - Context: 131,072 tokens; max output 16,384 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~April 2025 - Headline price: $0.40 input / $2.00 output per 1M tokens - Architecture: dense (parameter count undisclosed — Premier/closed)

What's new

  • Incremental update over Medium 3 (May 2025): better instruction following, sharper tool use, improved vision.
  • Retains Medium 3's pricing — one of the cheapest frontier-adjacent multimodal models available.
  • Remains Premier tier (proprietary, closed weights); open weights only arrived in this family with Medium 3.5 in April 2026.
  • No configurable reasoning effort — for extended thinking, callers escalate to Magistral or Medium 3.5.

Benchmarks

BenchmarkScoreSource
HumanEval92%artificialanalysis.ai 2026-05-28T00:00:00.000Z
GPQA Diamond57%artificialanalysis.ai 2026-05-28T00:00:00.000Z
Artificial Analysis Index21artificialanalysis.ai 2026-05-28T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7.5/10
The model I've actually been deploying since late 2025 — cheap, multilingual, multimodal, stable — even if Medium 3.5 now outclasses it for agents.

Medium 3.1 has been the dependable production workhorse: inexpensive, multilingual, multimodal, and operationally reliable on La Plateforme. The strategic weakness is closed weights — no on-prem, so the sovereignty story is limited to EU-hosted API. With Medium 3.5 now offering open weights and a reasoning knob, I would only choose 3.1 when budget is the binding constraint and the workload is chat/Q&A rather than agentic coding. For mid-sized European SaaS where EU residency matters and budget is real, it remains a sound default for the chat surface.

Strategic Fit 7Vendor Risk 7Roadmap Confidence 8
Pros
  • cheap, stable, multimodal, multilingual
Cons
  • closed weights
  • superseded for agents
Right for: cost-bound chat/Q&A
Avoid if: you need self-host or agentic coding
Domain Strategist7/10
It owns the 'cheap capable multimodal' slot in Europe, but its successor is already eating its strategic mindshare.

Medium 3.1's positioning was sharp at launch: frontier-adjacent multimodal capability at a price that unlocked high-volume European-language use cases US flagships couldn't touch economically. That slot is still valuable, but the strategic narrative has moved to Medium 3.5 (open weights, merged architecture). 3.1 now reads as the mature, boring, cost-tier option rather than the headline. Against GPT-5 mini and Claude Haiku it competes on EU-language quality and price; against its own successor it competes only on cost. A solid incumbent in a slot it no longer leads.

Competitive Positioning 7Differentiation 6Market Timing 6
Pros
  • proven cost-tier slot, EU-language edge
Cons
  • outshone by its own successor
Right for: cost-led European deployments
Avoid if: you want the current flagship narrative
Finance Lead9/10
$0.40 in, $2.00 out — at high volume the monthly bill is a fraction of GPT-5 or Sonnet, and forecasting is boring in the best way.

Pricing is the entire story. $0.40/$2.00 makes Medium 3.1 a standout value among multimodal models with frontier-adjacent quality. For high-volume support tickets, summarisation, classification, and content variants, the monthly bill is a fraction of US flagships. The $0.04 cached-input rate and ~50% batch discount sharpen it further, and predictable throughput makes forecasts simple. There is no self-host lever (closed weights), so this is a pure API-economics play — but at this price, for the right workload, the unit economics are excellent.

Cost Efficiency 9Pricing Transparency 9Value per Dollar 9
Pros
  • very low price, cache/batch discounts, predictable
Cons
  • no self-host capex option
Right for: high-volume API chat
Avoid if: you wanted to amortise GPUs
Domain Practitioner7.5/10
The most boring-in-a-good-way Mistral model I've shipped — clean API, predictable JSON, reliable vision parsing.

After extended production use, Medium 3.1 is the dependable one: clean OpenAI-compatible API, predictable JSON tool use, reliable vision-input document parsing. No reasoning toggle, so for hard problems I route to Magistral or 3.5. Lower output throughput is occasionally annoying for streaming UX but fine for backend pipelines. It is the "good enough at minimum cost" default for builders who don't need agentic firepower. Docs are thinner than Anthropic's but adequate.

API Ergonomics 8Tool/Agent Support 7Reliability 8
Pros
  • stable, predictable, reliable vision
Cons
  • no reasoning toggle
  • slow streaming
Right for: backend chat/extraction pipelines
Avoid if: you need agentic tool depth
Power User7/10
Responsive and helpful for everyday questions; for routine help I can't tell it isn't a flagship — except the stream is a touch slower.

In Le Chat at the Medium tier, Medium 3.1 is responsive and helpful for everyday questions, with excellent European-language quality. It feels slightly less polished than Large 3 or top US models on nuanced creative tasks, but for routine help the difference is hard to notice. Throughput is the main felt downside — replies stream a touch slower than competitors. Refusal rate is moderate and reasonable. A capable, unremarkable daily driver.

Output Quality 7Speed 6.5Everyday Usefulness 7.5
Pros
  • helpful, strong EU languages
Cons
  • slower streaming
  • not the most polished
Right for: routine everyday help
Avoid if: you want flagship polish or speed
Skeptic7/10
Cheap and competent, but it's a closed model with thin published benchmarks and a successor that already beats it — buy it for price, nothing else.

There is little to over-claim here, which makes the skeptic's job easy. Medium 3.1 is honestly positioned as a cost-tier model and delivers on that. The caveats: it is closed-weight, so the "EU sovereignty" halo around Mistral doesn't fully apply (no on-prem); Mistral published almost no numeric benchmarks, so the quality picture leans on third-party aggregation; and its own successor outperforms it on agentic/coding. None of this is deceptive — just a reminder that the value is price-per-token and not capability leadership. Use it where the bill dominates the decision.

Claim Accuracy 8Weakness Severity 6Hype vs Reality 7
Pros
  • honestly cheap, no inflated claims
Cons
  • closed
  • thin benchmarks
  • superseded
Right for: cost-driven workloads
Avoid if: you need sovereignty-by-on-prem or top capability

Strengths

  • Best $/intelligence ratio among Premier-tier multimodal models in its window.
  • Frontier-adjacent non-reasoning quality at a fraction of flagship cost.
  • Strong multilingual European-language performance.
  • Mature, stable API after extended production usage.
  • Native vision at a low price point.

Limitations

  • Closed weights (Premier) — no self-host, unlike Medium 3.5.
  • Output throughput is on the slow side of the price tier.
  • No native extended reasoning — separate model call required.
  • Superseded by Medium 3.5 for most agentic/coding workloads.
  • Sparse public benchmark coverage from Mistral itself.

Best use cases

- High-volume customer support and chat where cost dominates. - Multilingual content workflows in European languages. - Vision-input apps (receipts, charts, screenshots) at a fraction of GPT-class pricing. - Workloads needing fast, cheap, capable chat — not reasoning or agentic coding.

Buyer questions

Can I self-host it?

No — Medium 3.1 is Premier/closed-weight, API-only. For open weights at this tier, use Medium 3.5 (modified-MIT) or step down to Small 4 (Apache 2.0).

Does it reason?

No reasoning toggle. Route hard analytical work to Magistral or Medium 3.5 with high reasoning effort.

Why pick it over Medium 3.5?

Pure cost: $2.00 output vs $7.50. For chat/Q&A at volume that don't need agentic strength, 3.1 is far cheaper.

Is the data EU-resident?

Yes on La Plateforme (EU default), with 30-day abuse retention, no training on inputs unless opt-in, ZDR available — but only as an API, not on-prem.

How fast is it?

~45 tps — adequate for backend, a touch slow for snappy streaming chat.

Which clouds?

Bedrock, Azure AI Foundry, and Vertex AI, plus La Plateforme.

Comparable models

**Mistral Medium 3.5:** Its successor — open weights, reasoning knob, far stronger on agentic coding, but ~3.75x the output price ($7.50 vs $2.00).
**GPT-5 mini:** Similar price tier, weaker EU-language quality, broader ecosystem and tooling.
**Claude Haiku 4.5:** Comparable cost tier; weaker multilingual and a different (Anthropic) safety/ecosystem trade.
**DeepSeek V3.2:** Cheaper still, weaker European-language quality, Chinese-origin residency.

Model specs

Input price
$0.40 / Mtok
Output price
$2 / Mtok
Cached input
$0.04 / Mtok
Batch (in/out)
$0.20 / $1
Context window
131K tokens
Max output
16K tokens
Knowledge cutoff
2025-04
Released
2025-08-12
Modalities
text, image → text
Output speed
~45 tok/s
License
Proprietary
Clouds
Bedrock, Azure AI Foundry, Vertex AI

Does not train on API inputs by default

Other Mistral Medium versions

Last verified 2026-05-27