Mistral Large 3

GALatest Large

by Mistral AI · Mistral Large family · best for EU-sovereign open-weight frontier generalist

FrontierOpen-WeightsmultilingualLong-ContextMultimodal
8.0
AI Panel Score
Value 9.0/10

Mistral Large 3 (release 25.12, shipped 2 December 2025) is the French lab's open-weight frontier flagship and the centerpiece of its EU-sovereignty pitch. It is a 675B-parameter granular Mixture-of-Experts with only 41B active per token (39B language model plus a 2.5B vision encoder), trained from scratch on 3,000 NVIDIA H200s, released under a genuine Apache 2.0 license. The single sentence a buyer needs: it is the strongest permissively-licensed, self-hostable generalist model a regulated European enterprise can run inside its own VPC, at roughly a quarter of a US flagship's API price. - Provider: Mistral AI (Paris, France) - Release: 2025-12-02, status GA - Context: 256,000 tokens; max output 32,768 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~October 2025 - Headline price: $0.50 input / $1.50 output per 1M tokens - Architecture: granular MoE, 675B total / 41B active

What's new

  • First Mistral Large shipped as full open weights under Apache 2.0; Large 2 was Mistral Research License only. This is the structural change.
  • Granular MoE replaces the dense Large 2 architecture: 675B total, ~41B active (~6% activation), keeping inference cost near a 40-50B dense model.
  • Context expanded from 128K (Large 2) to 256K.
  • Native image understanding added via a 2.5B vision encoder; Large 2 was text-only.
  • Pricing cut dramatically: $0.50/$1.50 versus the $2/$6 region of earlier Large tiers. This is a deliberate land-grab on the open-weight frontier.
  • Base and instruct checkpoints shipped at launch; a dedicated reasoning variant was announced as "coming soon."

Benchmarks

BenchmarkScoreSource
MMLU85.5%intuitionlabs.ai 2025-12-02T00:00:00.000Z
Artificial Analysis Index23artificialanalysis.ai 2026-05-28T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker8.5/10
The only frontier-adjacent model I can put on-prem under Apache 2.0 and tell my DPO it never leaves the EU — sovereignty as a product feature.

For a one-to-three-year platform bet in Europe, Large 3 is the strategic default. Apache 2.0 weights plus EU-default residency plus SOC 2/ISO 27001 remove the two biggest enterprise blockers — vendor lock-in and Cloud Act exposure — in one model. Multi-cloud managed availability (Bedrock, Azure, watsonx) makes failover real. The trade-off is capability: this is a strong generalist, not a US-frontier reasoner, so analytical and hard-coding workloads still benchmark better elsewhere. For any architecture decision where EU deployment is on the table, it is the obvious open option; for pure capability with no residency constraint, it is not.

Strategic Fit 9Vendor Risk 9Roadmap Confidence 8
Pros
  • Apache 2.0, EU residency, multi-cloud, aggressive price
Cons
  • capability gap vs US frontier
  • reasoning variant not yet shipped
Right for: regulated EU enterprises
Avoid if: you need top-tier reasoning and have no sovereignty constraint
Domain Strategist8.5/10
Mistral owns the 'European AI champion' narrative, and Large 3 is the flagship that makes the sovereignty positioning credible rather than rhetorical.

Strategically, Large 3 is positioned precisely where US and Chinese labs are weak: a Western, GDPR-native, Apache-licensed frontier model. That is a defensible moat in EU public sector, defence, and regulated finance, segments where neither OpenAI nor DeepSeek can compete on residency. Against Llama 4 it wins decisively on European-language quality and license clarity; against US closed models it wins on sovereignty and price. The risk is that "sovereignty" is a moat in Europe and a niche elsewhere — in US-only procurement the value proposition mostly evaporates. Market timing is excellent: EU AI Act obligations land August 2026, making the compliance story sharper.

Competitive Positioning 9Differentiation 9Market Timing 8
Pros
  • unique sovereign-frontier slot
  • EU AI Act tailwind
Cons
  • moat is geographic
Right for: EU-facing products and procurement
Avoid if: your market doesn't price sovereignty
Finance Lead8.5/10
$0.50 in, $1.50 out for an open-weight frontier model — and if volume justifies it, I amortise GPUs instead of renting margin from a US vendor forever.

The unit economics are excellent. At $0.50/$1.50 Large 3 is 3-5x cheaper than US flagships at the generalist tier, with a $0.05 cached-input rate and ~50% batch discount layered on. The deeper financial argument is self-host: Apache 2.0 converts variable per-token spend into fixed GPU capex, which makes annual budgeting predictable for sustained high-volume workloads. The break-even is real only at scale — a single 8x-H100 node is a meaningful capital outlay — so for low volume the API is cheaper. Cache and batch discipline matter given the heavy-node footprint. Strong value, best realised at scale.

Cost Efficiency 9Pricing Transparency 8Value per Dollar 9
Pros
  • lowest frontier-tier API price
  • self-host capex option
Cons
  • self-host break-even needs scale
Right for: high-volume EU workloads
Avoid if: low volume with no residency need (smaller Mistrals are cheaper still)
Domain Practitioner8/10
OpenAI-compatible JSON, predictable tool calls, and an actual self-host escape hatch — I can prototype on La Plateforme and ship on vLLM without a rewrite.

The API is clean and OpenAI-compatible; JSON mode and function calls behave predictably, and SDK quality has caught up to the leaders. The killer feature for a builder is portability: prototype against the hosted endpoint, then self-host the identical Apache 2.0 weights on vLLM or TGI with no code rewrite. Multilingual code-comment and identifier handling beat Llama 4 for European-language repos, and 256K context is genuine. Downsides: La Plateforme's docs are thinner than Anthropic's, function-calling edge cases occasionally surprise, and observability tooling lags OpenAI's. A strong default for any EU-deployed agent project.

API Ergonomics 8Tool/Agent Support 8Reliability 8
Pros
  • portable weights, clean API, good multilingual code
Cons
  • thinner docs, occasional FC drift
Right for: EU agent builders
Avoid if: you need the deepest tooling ecosystem
Power User7.5/10
In French and German it's the best open model I've used; in English it's a competent half-step behind Claude and GPT.

In Le Chat the model is fast enough, accurate in European languages, and more willing to engage nuanced topics than the most heavily safety-tuned US models — refusals are moderate and consistent. For a French or German user this is the best conversation partner among open models. English conversational quality sits a tier below Claude or GPT-5: a touch more literal, less warm. Vision answers work but feel less polished than the leaders. Latency at ~51 tps is fine for chat but not snappy. Overall a strong daily driver for European-language users, merely good for English-first ones.

Output Quality 7.5Speed 7Everyday Usefulness 8
Pros
  • native-quality EU languages, low refusal
Cons
  • English polish trails frontier
  • modest speed
Right for: European-language daily use
Avoid if: you want the warmest English chat
Skeptic6.5/10
Mistral shipped comparison charts and an Apache license, then quietly withheld the standard benchmarks — AA Index 23 against a 36 median tells the part the launch deck didn't.

The sovereignty and license story is genuine and the price is real. The capability story is where I push back. Mistral published MMLU and LMArena bragging rights but withheld GPQA Diamond, MMLU-Pro, AIME, LiveCodeBench, and SWE-bench at launch — the benchmarks where it would be measured against US frontier models. Independent aggregation fills the gap: Artificial Analysis puts the Intelligence Index at 23 versus a ~36 frontier median, i.e. clearly below GPT-5.x and Claude/Gemini frontier. The "frontier" label is marketing; "best open generalist with EU residency" is the honest claim. Buy it for sovereignty and price, not because it out-thinks the US leaders.

Claim Accuracy 6Weakness Severity 6Hype vs Reality 7
Pros
  • real license, real price, real residency
Cons
  • selective benchmark disclosure
  • sub-frontier reasoning
Right for: buyers who value sovereignty over peak capability
Avoid if: you take "frontier" literally

Strengths

  • Best permissively-licensed (Apache 2.0) open-weight frontier MoE from a major lab.
  • 256K context handles enterprise documents and multi-file repos.
  • Best-in-class European-language quality among open models.
  • Aggressive pricing: $0.50/$1.50 undercuts US flagships by 3-5x.
  • Full EU data-sovereignty and self-host story; multi-cloud managed availability.
  • Solid tool use and JSON output for agent loops.

Limitations

  • Loses clearly to dedicated reasoning models (Magistral, DeepSeek R1, GPT-5/Claude thinking) on GPQA Diamond and AIME — no default chain-of-thought.
  • Mistral withheld most standard numeric benchmarks at launch, making rigorous head-to-head hard.
  • AA Intelligence Index of 23 sits below the frontier median; this is a strong generalist, not a US-frontier-class reasoner.
  • 41B active is heavier to self-host than a dense 70B at similar generalist capability.
  • Vision is competent but behind Gemini/GPT on chart and document benchmarks.
  • Throughput (50.7 tps) is modest for interactive UX.

Best use cases

- EU-headquartered enterprises needing a non-US, non-Chinese foundation model with full data sovereignty and a permissive license. - Multilingual customer-facing applications across European markets. - On-prem and air-gapped deployments in regulated industries (defence, public sector, finance) where Apache 2.0 matters. - Long-context document analysis where 256K is genuinely needed. - Multi-step agent workflows with tool use and structured JSON output.

Buyer questions

Is it really open weights?

Yes — genuine Apache 2.0 (verified on the Hugging Face card), permitting commercial use, modification, and redistribution with no revenue threshold. This is unlike Medium 3.5 and Devstral 2, which carry a modified-MIT revenue restriction.

What does it cost to self-host?

FP8 needs a single high-memory node (~8x H100/H200, ~320GB+ VRAM); NVFP4 lowers the bar to H100/A100 nodes. Break-even versus the API arrives only at sustained high volume.

Where is my data?

La Plateforme is EU-hosted by default with US options; 30-day abuse-monitoring retention, no training on inputs unless you opt in, and Zero Data Retention available.

How does it compare to US frontier models?

On generalist tasks it is competitive; on hard reasoning (GPQA, AIME) and English polish it trails GPT-5.x and Claude. AA Index 23 vs ~36 frontier median.

Does it reason?

Not by default — there is no chain-of-thought toggle on Large 3. A reasoning variant was announced as coming; for extended thinking today, route to Magistral or Medium 3.5.

Which clouds is it on?

Amazon Bedrock, Azure AI Foundry, and IBM watsonx, plus self-host and La Plateforme.

Is it compliant for EU enterprise?

SOC 2 Type II, ISO 27001/27701, GDPR-native, EU AI Act aligned (non-high-risk obligations land August 2026).

Comparable models

**Llama 4 Maverick:** Comparable open-weight MoE tier; weaker on European languages and license clarity, stronger US ecosystem and tooling support.
**DeepSeek V3.2:** Cheaper and comparable on English coding; weaker EU-multilingual quality and a less clean permissive-license story; Chinese-origin residency is a non-starter for EU-sovereignty buyers.
**GPT-5.x / Claude Opus 4.7:** Closed weights, 2-5x the price, clearly stronger on reasoning and English polish, but no EU data-sovereignty or self-host story — the inverse trade-off to Large 3.

Model specs

Input price
$0.50 / Mtok
Output price
$1.50 / Mtok
Cached input
$0.05 / Mtok
Batch (in/out)
$0.25 / $0.75
Context window
256K tokens
Max output
33K tokens
Knowledge cutoff
2025-10
Released
2025-12-01
Modalities
text, image → text
Output speed
~50.7 tok/s
License
Open weights (Apache-2.0)
Clouds
Bedrock, Azure AI Foundry, IBM watsonx

Does not train on API inputs by default

Last verified 2026-05-27