by Mistral AI · Mistral Large family · best for EU-sovereign open-weight frontier generalist
Mistral Large 3 (release 25.12, shipped 2 December 2025) is the French lab's open-weight frontier flagship and the centerpiece of its EU-sovereignty pitch. It is a 675B-parameter granular Mixture-of-Experts with only 41B active per token (39B language model plus a 2.5B vision encoder), trained from scratch on 3,000 NVIDIA H200s, released under a genuine Apache 2.0 license. The single sentence a buyer needs: it is the strongest permissively-licensed, self-hostable generalist model a regulated European enterprise can run inside its own VPC, at roughly a quarter of a US flagship's API price. - Provider: Mistral AI (Paris, France) - Release: 2025-12-02, status GA - Context: 256,000 tokens; max output 32,768 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~October 2025 - Headline price: $0.50 input / $1.50 output per 1M tokens - Architecture: granular MoE, 675B total / 41B active
| Benchmark | Score | Source |
|---|---|---|
| MMLU | 85.5% | intuitionlabs.ai 2025-12-02T00:00:00.000Z |
| Artificial Analysis Index | 23 | artificialanalysis.ai 2026-05-28T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The only frontier-adjacent model I can put on-prem under Apache 2.0 and tell my DPO it never leaves the EU — sovereignty as a product feature.”
For a one-to-three-year platform bet in Europe, Large 3 is the strategic default. Apache 2.0 weights plus EU-default residency plus SOC 2/ISO 27001 remove the two biggest enterprise blockers — vendor lock-in and Cloud Act exposure — in one model. Multi-cloud managed availability (Bedrock, Azure, watsonx) makes failover real. The trade-off is capability: this is a strong generalist, not a US-frontier reasoner, so analytical and hard-coding workloads still benchmark better elsewhere. For any architecture decision where EU deployment is on the table, it is the obvious open option; for pure capability with no residency constraint, it is not.
“Mistral owns the 'European AI champion' narrative, and Large 3 is the flagship that makes the sovereignty positioning credible rather than rhetorical.”
Strategically, Large 3 is positioned precisely where US and Chinese labs are weak: a Western, GDPR-native, Apache-licensed frontier model. That is a defensible moat in EU public sector, defence, and regulated finance, segments where neither OpenAI nor DeepSeek can compete on residency. Against Llama 4 it wins decisively on European-language quality and license clarity; against US closed models it wins on sovereignty and price. The risk is that "sovereignty" is a moat in Europe and a niche elsewhere — in US-only procurement the value proposition mostly evaporates. Market timing is excellent: EU AI Act obligations land August 2026, making the compliance story sharper.
“$0.50 in, $1.50 out for an open-weight frontier model — and if volume justifies it, I amortise GPUs instead of renting margin from a US vendor forever.”
The unit economics are excellent. At $0.50/$1.50 Large 3 is 3-5x cheaper than US flagships at the generalist tier, with a $0.05 cached-input rate and ~50% batch discount layered on. The deeper financial argument is self-host: Apache 2.0 converts variable per-token spend into fixed GPU capex, which makes annual budgeting predictable for sustained high-volume workloads. The break-even is real only at scale — a single 8x-H100 node is a meaningful capital outlay — so for low volume the API is cheaper. Cache and batch discipline matter given the heavy-node footprint. Strong value, best realised at scale.
“OpenAI-compatible JSON, predictable tool calls, and an actual self-host escape hatch — I can prototype on La Plateforme and ship on vLLM without a rewrite.”
The API is clean and OpenAI-compatible; JSON mode and function calls behave predictably, and SDK quality has caught up to the leaders. The killer feature for a builder is portability: prototype against the hosted endpoint, then self-host the identical Apache 2.0 weights on vLLM or TGI with no code rewrite. Multilingual code-comment and identifier handling beat Llama 4 for European-language repos, and 256K context is genuine. Downsides: La Plateforme's docs are thinner than Anthropic's, function-calling edge cases occasionally surprise, and observability tooling lags OpenAI's. A strong default for any EU-deployed agent project.
“In French and German it's the best open model I've used; in English it's a competent half-step behind Claude and GPT.”
In Le Chat the model is fast enough, accurate in European languages, and more willing to engage nuanced topics than the most heavily safety-tuned US models — refusals are moderate and consistent. For a French or German user this is the best conversation partner among open models. English conversational quality sits a tier below Claude or GPT-5: a touch more literal, less warm. Vision answers work but feel less polished than the leaders. Latency at ~51 tps is fine for chat but not snappy. Overall a strong daily driver for European-language users, merely good for English-first ones.
“Mistral shipped comparison charts and an Apache license, then quietly withheld the standard benchmarks — AA Index 23 against a 36 median tells the part the launch deck didn't.”
The sovereignty and license story is genuine and the price is real. The capability story is where I push back. Mistral published MMLU and LMArena bragging rights but withheld GPQA Diamond, MMLU-Pro, AIME, LiveCodeBench, and SWE-bench at launch — the benchmarks where it would be measured against US frontier models. Independent aggregation fills the gap: Artificial Analysis puts the Intelligence Index at 23 versus a ~36 frontier median, i.e. clearly below GPT-5.x and Claude/Gemini frontier. The "frontier" label is marketing; "best open generalist with EU residency" is the honest claim. Buy it for sovereignty and price, not because it out-thinks the US leaders.
- EU-headquartered enterprises needing a non-US, non-Chinese foundation model with full data sovereignty and a permissive license. - Multilingual customer-facing applications across European markets. - On-prem and air-gapped deployments in regulated industries (defence, public sector, finance) where Apache 2.0 matters. - Long-context document analysis where 256K is genuinely needed. - Multi-step agent workflows with tool use and structured JSON output.
Yes — genuine Apache 2.0 (verified on the Hugging Face card), permitting commercial use, modification, and redistribution with no revenue threshold. This is unlike Medium 3.5 and Devstral 2, which carry a modified-MIT revenue restriction.
FP8 needs a single high-memory node (~8x H100/H200, ~320GB+ VRAM); NVFP4 lowers the bar to H100/A100 nodes. Break-even versus the API arrives only at sustained high volume.
La Plateforme is EU-hosted by default with US options; 30-day abuse-monitoring retention, no training on inputs unless you opt in, and Zero Data Retention available.
On generalist tasks it is competitive; on hard reasoning (GPQA, AIME) and English polish it trails GPT-5.x and Claude. AA Index 23 vs ~36 frontier median.
Not by default — there is no chain-of-thought toggle on Large 3. A reasoning variant was announced as coming; for extended thinking today, route to Magistral or Medium 3.5.
Amazon Bedrock, Azure AI Foundry, and IBM watsonx, plus self-host and La Plateforme.
SOC 2 Type II, ISO 27001/27701, GDPR-native, EU AI Act aligned (non-high-risk obligations land August 2026).
Does not train on API inputs by default
Last verified 2026-05-27