Is it really open weights?

Yes — genuine Apache 2.0 (verified on the Hugging Face card), permitting commercial use, modification, and redistribution with no revenue threshold. This is unlike Medium 3.5 and Devstral 2, which carry a modified-MIT revenue restriction.

What does it cost to self-host?

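FP8 needs a single high-memory node: the weights alone are roughly 675 GB at one byte per parameter, which fits 8x H200 or B200 but exceeds the 640 GB of an 8x H100 80GB node. NVFP4 roughly halves the footprint (~340 GB+) and lowers the bar to 8x H100/A100 nodes. Break-even versus the API arrives only at sustained high volume.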
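To make the break-even claim concrete, here is a minimal sketch that divides a monthly node cost by a blended API price. The API prices are the La Plateforme list prices quoted in this review ($0.50 input, $1.50 output per 1M tokens); the node cost and the 25% output share are hypothetical placeholders, not figures from any source, and the sketch ignores whether a single node can actually serve that volume.

```python
def break_even_tokens_per_month(
    node_cost_per_month_usd: float,
    input_price_per_m: float = 0.50,   # USD per 1M input tokens (list price quoted in this review)
    output_price_per_m: float = 1.50,  # USD per 1M output tokens (list price quoted in this review)
    output_share: float = 0.25,        # hypothetical fraction of traffic that is output tokens
) -> float:
    """Monthly token volume at which API spend equals the self-hosting node cost."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    # Blended price in USD per 1M tokens for the assumed input/output mix.
    blended_per_m = (1.0 - output_share) * input_price_per_m + output_share * output_price_per_m
    return node_cost_per_month_usd / blended_per_m * 1_000_000


# Hypothetical example: a node that costs 30,000 USD per month all-in.
tokens = break_even_tokens_per_month(30_000)
print(f"Break-even at about {tokens / 1e9:.0f}B tokens per month")
```

Below that volume the API is cheaper under these assumptions; operations staff, utilization, and redundancy would push the real break-even higher.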

On data residency and retention: La Plateforme is EU-hosted by default, with US options. API inputs are retained for 30 days for abuse monitoring and are not used for training unless you opt in; Zero Data Retention is available.

How does it compare to US frontier models?

On generalist tasks it is competitive; on hard reasoning (GPQA, AIME) and English polish it trails GPT-5.x and Claude. Its Artificial Analysis Index score is 23, against a frontier median of ~36.

Large 3 does not do extended reasoning by default: there is no chain-of-thought toggle. A reasoning variant was announced as coming; for extended thinking today, route to Magistral or Medium 3.5.

Which clouds is it on?

Amazon Bedrock, Azure AI Foundry, and IBM watsonx, plus self-host and La Plateforme.

Is it compliant for EU enterprise?

Mistral holds SOC 2 Type II and ISO 27001/27701, is GDPR-native, and positions itself as EU AI Act aligned (non-high-risk obligations land August 2026).

Mistral Large 3 Review — Benchmarks, Pricing & AI Panel Verdict

Benchmark	Score	Source (date)
MMLU	85.5%	intuitionlabs.ai (2025-12-02)
Artificial Analysis Index	23	artificialanalysis.ai (2026-05-28)

Architecture

Large 3 is a granular sparse Mixture-of-Experts. Mistral's own card states 675B total parameters split as a 673B MoE language model with ~39B active plus a 2.5B vision encoder, for ~41B active per forward pass. The number of experts is not disclosed. It was trained from scratch on 3,000 H200 GPUs. The model ships natively in FP8, with NVFP4 and BF16 variants published; FP8 fits a single B200/H200 node while NVFP4 runs on H100/A100 nodes. Tokenization uses mistral_common (>= 1.8.6). Attention mechanism, layer count, training-token count, and vocab size are undisclosed — typical for Mistral, which discloses parameter counts for open-weight models but withholds deeper internals.
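The hardware guidance above follows from simple weight-size arithmetic, sketched below. The parameter counts come from the text; the bytes-per-parameter values are nominal, the per-GPU memory sizes are approximate figures assumed for illustration, and the check covers weights only (KV cache, activations, and NVFP4 scale factors all need additional headroom).

```python
TOTAL_PARAMS_B = 675   # total parameters in billions (from the model card figures above)
ACTIVE_PARAMS_B = 41   # approximate active parameters per forward pass, in billions

# Nominal storage cost per parameter; NVFP4 block scales add a little on top of 0.5.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}

# Approximate per-GPU memory in GB (assumed values for illustration).
GPU_MEMORY_GB = {"A100-80GB": 80, "H100-80GB": 80, "H200": 141, "B200": 180}


def weight_footprint_gb(params_b: float, fmt: str) -> float:
    """Weights-only footprint in GB: billions of parameters times bytes per parameter."""
    return params_b * BYTES_PER_PARAM[fmt]


def weights_fit_on_node(params_b: float, fmt: str, gpu: str, gpus_per_node: int = 8) -> bool:
    """True if the weights alone fit in the node's combined GPU memory."""
    return weight_footprint_gb(params_b, fmt) <= GPU_MEMORY_GB[gpu] * gpus_per_node


for fmt in BYTES_PER_PARAM:
    size = weight_footprint_gb(TOTAL_PARAMS_B, fmt)
    fitting = [g for g in GPU_MEMORY_GB if weights_fit_on_node(TOTAL_PARAMS_B, fmt, g)]
    print(f"{fmt}: {size:.1f} GB of weights, fits 8x of: {fitting}")

active_pct = ACTIVE_PARAMS_B / TOTAL_PARAMS_B * 100
print(f"Active share per token: {active_pct:.1f}%")
```

Under these assumptions FP8 weights (675 GB) exceed an 8x 80 GB node (640 GB) but fit 8x H200 or B200, while NVFP4 (about 338 GB) fits 8x H100 or A100, which matches the split described above.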

Capabilities

Large 3 is a generalist multimodal MoE tuned for instruction following, code generation, multilingual dialogue, document understanding, and tool use. The granular routing activates ~6% of parameters per token (~41B of 675B), so it draws on the breadth of a 675B pool at roughly the per-token compute cost of a mid-size dense model (cap_coding 7.5, cap_agentic 7.5). Its standout axis is multilingual quality (cap_multilingual 9.0): French, German, Spanish, Italian, Portuguese, and Dutch output reads natively, with competent Chinese, Japanese, Korean, and Arabic across 40+ languages. The 256K context is real and usable (cap_long_context 8.0). Native vision (cap_vision 7.0) handles charts, screenshots, and documents but trails Gemini and GPT-class vision on hard document benchmarks. It has no default chain-of-thought, so it trails dedicated reasoning models on multi-step math and PhD-level science (cap_reasoning 7.0, cap_math 6.5). Tool use, JSON-mode structured output, and function calling are reliable (cap_function_calling 8.0). It has no native real-time retrieval (cap_realtime_data 0.0).

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
MMLU	85.5%	+~1.5pp vs Large 2.1	trails GPT-5.x (~89%)	IntuitionLabs
Artificial Analysis Index	23	new methodology	below frontier median (~36)	Artificial Analysis
LMArena rank	#2 OSS non-reasoning	up	behind larger OSS models on reasoning	Mistral

Mistral published comparison charts at launch but withheld most numeric scores on standard benchmarks (MMLU-Pro, GPQA Diamond, AIME, LiveCodeBench, SWE-bench). Benchmark coverage here is therefore partial: only values with a verifiable source are recorded, and the rest are left blank.

Speed & latency

Artificial Analysis measures Large 3 at 50.7 output tokens/sec with a 1.09s time-to-first-token — squarely in the medium latency tier and notably slower than the throughput-optimized smaller Mistrals. For a 675B MoE this is expected; interactive chat feels responsive but not snappy, and the model is best deployed for backend pipelines, batch document processing, and agent loops rather than sub-second UX.
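As a rough illustration of what those two numbers mean for a single response, the sketch below adds time-to-first-token to the streaming time for the output. It assumes steady throughput at the measured rate and ignores prompt length, queueing, and network overhead, so treat the result as a ballpark rather than a guarantee.

```python
TTFT_S = 1.09                # time-to-first-token in seconds (Artificial Analysis figure above)
OUTPUT_TOKENS_PER_S = 50.7   # output throughput (Artificial Analysis figure above)


def estimated_response_seconds(
    output_tokens: int,
    ttft_s: float = TTFT_S,
    tokens_per_s: float = OUTPUT_TOKENS_PER_S,
) -> float:
    """Approximate wall-clock time to receive a full response of the given length."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tokens_per_s


for n in (50, 500, 2000):
    print(f"{n:>5} output tokens: ~{estimated_response_seconds(n):.1f} s")
```

A 500-token answer comes out at roughly 11 seconds under these assumptions, which is consistent with the advice to favor backend and batch workloads over sub-second UX.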

Pricing analysis

Surface	Cost	Notes
API input	$0.50 / 1M tok	La Plateforme
API output	$1.50 / 1M tok	La Plateforme
Cached input	$0.05 / 1M tok	cache read
Batch (in/out)	$0.25 / $0.75 per 1M tok	~50% async discount
Direct UI	EUR 14.99/mo (~$15)	Le Chat Pro
Free tier	~25 msg/day (Le Chat); daily token quota (La Plateforme)	no card required
Self-host	Apache 2.0	weights on Hugging Face
Cloud	Bedrock, Azure AI Foundry, IBM watsonx	managed
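The per-token rates in the table can be turned into a quick cost estimate, sketched below. The rates are the ones listed above; how cached reads are billed inside batch jobs is not stated in the source, so this sketch assumes the batch discount applies only to uncached input and output tokens.

```python
# USD per 1M tokens, taken from the pricing table above.
PRICE_PER_M = {"input": 0.50, "cached_input": 0.05, "output": 1.50}
BATCH_MULTIPLIER = 0.5  # batch rates in the table are half the standard rates


def api_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    batch: bool = False,
) -> float:
    """Estimate API spend; input_tokens counts uncached input only."""
    multiplier = BATCH_MULTIPLIER if batch else 1.0
    cost = input_tokens / 1e6 * PRICE_PER_M["input"] * multiplier
    cost += output_tokens / 1e6 * PRICE_PER_M["output"] * multiplier
    # Assumption: cached reads are billed at the cache-read rate with no batch discount.
    cost += cached_input_tokens / 1e6 * PRICE_PER_M["cached_input"]
    return cost


# Example: 2M fresh input tokens, 8M cached input tokens, 1M output tokens.
print(f"Standard: ${api_cost_usd(2_000_000, 1_000_000, cached_input_tokens=8_000_000):.2f}")
print(f"Batch:    ${api_cost_usd(2_000_000, 1_000_000, batch=True):.2f}")
```

Keeping cached and uncached input as separate arguments makes the effect of prompt caching visible: the 8M cached tokens in the example cost less than the 2M fresh ones.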

Deployment & access

This is the strongest deployment story in the Mistral lineup. Apache 2.0 weights are published on Hugging Face (FP8/NVFP4/BF16). Self-hosting needs a single high-memory node: 8x H200 or B200 for FP8 (roughly 675 GB of weights), or 8x H100/A100 at NVFP4 (~340 GB+). Managed availability spans Amazon Bedrock, Azure AI Foundry, and IBM watsonx. La Plateforme is EU-hosted by default, which is the data-residency wedge, with US options. For a regulated French, German, or Dutch enterprise, Large 3 is the canonical choice: frontier-adjacent capability that can run on-prem under a permissive license, satisfying data-residency rules and avoiding US CLOUD Act exposure.

Safety & privacy

Mistral is a French company natively subject to GDPR, holds SOC 2 Type II and ISO 27001/27701, and positions itself as aligned with the EU AI Act. API inputs are retained 30 rolling days for abuse monitoring and are not used for training unless the customer opts in; Zero Data Retention is available and removes the 30-day window. EU data residency by default is the central enterprise selling point. Content moderation is not built into the model but is offered as a separate Mistral Moderation API (mistral-moderation-2603, 9 categories plus jailbreak detection). Refusal calibration is moderate and consistent — less aggressively safety-tuned than US frontier labs.

Ecosystem & tooling

SDKs in Python and TypeScript/JavaScript; integrations with LangChain, LlamaIndex, Vercel AI SDK, and Haystack. Powers Le Chat and Mistral AI Studio. Open weights drive a growing self-host and fine-tune community on Hugging Face (FP8/NVFP4/BF16/GGUF derivatives). Popularity is growing rather than dominant — strong in Europe, narrower elsewhere.
