by Mistral AI · Mistral Medium family · best for cheap multilingual multimodal chat at volume
Mistral Medium 3.1 (release 25.08, shipped 13 August 2025) is the cost-optimised workhorse of Mistral's proprietary "Premier" tier: a closed-weight, multimodal, 128K-context model priced at $0.40/$2.00 per 1M tokens — among the cheapest frontier-adjacent multimodal models on the market. It is an incremental refinement of Medium 3 (May 2025) with better instruction following, sharper tool use, and improved vision. The buyer's sentence: when budget is the dominant constraint and the workload is chat/Q&A rather than agentic coding, this is still the value pick — though Medium 3.5 now supersedes it for coding and agentic work. - Provider: Mistral AI (Paris, France) - Release: 2025-08-13, status GA - Context: 131,072 tokens; max output 16,384 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~April 2025 - Headline price: $0.40 input / $2.00 output per 1M tokens - Architecture: dense (parameter count undisclosed — Premier/closed)
| Benchmark | Score | Source |
|---|---|---|
| HumanEval | 92% | artificialanalysis.ai 2026-05-28T00:00:00.000Z |
| GPQA Diamond | 57% | artificialanalysis.ai 2026-05-28T00:00:00.000Z |
| Artificial Analysis Index | 21 | artificialanalysis.ai 2026-05-28T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The model I've actually been deploying since late 2025 — cheap, multilingual, multimodal, stable — even if Medium 3.5 now outclasses it for agents.”
Medium 3.1 has been the dependable production workhorse: inexpensive, multilingual, multimodal, and operationally reliable on La Plateforme. The strategic weakness is closed weights — no on-prem, so the sovereignty story is limited to EU-hosted API. With Medium 3.5 now offering open weights and a reasoning knob, I would only choose 3.1 when budget is the binding constraint and the workload is chat/Q&A rather than agentic coding. For mid-sized European SaaS where EU residency matters and budget is real, it remains a sound default for the chat surface.
“It owns the 'cheap capable multimodal' slot in Europe, but its successor is already eating its strategic mindshare.”
Medium 3.1's positioning was sharp at launch: frontier-adjacent multimodal capability at a price that unlocked high-volume European-language use cases US flagships couldn't touch economically. That slot is still valuable, but the strategic narrative has moved to Medium 3.5 (open weights, merged architecture). 3.1 now reads as the mature, boring, cost-tier option rather than the headline. Against GPT-5 mini and Claude Haiku it competes on EU-language quality and price; against its own successor it competes only on cost. A solid incumbent in a slot it no longer leads.
“$0.40 in, $2.00 out — at high volume the monthly bill is a fraction of GPT-5 or Sonnet, and forecasting is boring in the best way.”
Pricing is the entire story. $0.40/$2.00 makes Medium 3.1 a standout value among multimodal models with frontier-adjacent quality. For high-volume support tickets, summarisation, classification, and content variants, the monthly bill is a fraction of US flagships. The $0.04 cached-input rate and ~50% batch discount sharpen it further, and predictable throughput makes forecasts simple. There is no self-host lever (closed weights), so this is a pure API-economics play — but at this price, for the right workload, the unit economics are excellent.
“The most boring-in-a-good-way Mistral model I've shipped — clean API, predictable JSON, reliable vision parsing.”
After extended production use, Medium 3.1 is the dependable one: clean OpenAI-compatible API, predictable JSON tool use, reliable vision-input document parsing. No reasoning toggle, so for hard problems I route to Magistral or 3.5. Lower output throughput is occasionally annoying for streaming UX but fine for backend pipelines. It is the "good enough at minimum cost" default for builders who don't need agentic firepower. Docs are thinner than Anthropic's but adequate.
“Responsive and helpful for everyday questions; for routine help I can't tell it isn't a flagship — except the stream is a touch slower.”
In Le Chat at the Medium tier, Medium 3.1 is responsive and helpful for everyday questions, with excellent European-language quality. It feels slightly less polished than Large 3 or top US models on nuanced creative tasks, but for routine help the difference is hard to notice. Throughput is the main felt downside — replies stream a touch slower than competitors. Refusal rate is moderate and reasonable. A capable, unremarkable daily driver.
“Cheap and competent, but it's a closed model with thin published benchmarks and a successor that already beats it — buy it for price, nothing else.”
There is little to over-claim here, which makes the skeptic's job easy. Medium 3.1 is honestly positioned as a cost-tier model and delivers on that. The caveats: it is closed-weight, so the "EU sovereignty" halo around Mistral doesn't fully apply (no on-prem); Mistral published almost no numeric benchmarks, so the quality picture leans on third-party aggregation; and its own successor outperforms it on agentic/coding. None of this is deceptive — just a reminder that the value is price-per-token and not capability leadership. Use it where the bill dominates the decision.
- High-volume customer support and chat where cost dominates. - Multilingual content workflows in European languages. - Vision-input apps (receipts, charts, screenshots) at a fraction of GPT-class pricing. - Workloads needing fast, cheap, capable chat — not reasoning or agentic coding.
No — Medium 3.1 is Premier/closed-weight, API-only. For open weights at this tier, use Medium 3.5 (modified-MIT) or step down to Small 4 (Apache 2.0).
No reasoning toggle. Route hard analytical work to Magistral or Medium 3.5 with high reasoning effort.
Pure cost: $2.00 output vs $7.50. For chat/Q&A at volume that don't need agentic strength, 3.1 is far cheaper.
Yes on La Plateforme (EU default), with 30-day abuse retention, no training on inputs unless opt-in, ZDR available — but only as an API, not on-prem.
~45 tps — adequate for backend, a touch slow for snappy streaming chat.
Bedrock, Azure AI Foundry, and Vertex AI, plus La Plateforme.
Does not train on API inputs by default
Last verified 2026-05-27