Ministral 3 8B

GALatest Ministral

by Mistral AI · Ministral 3 family · best for fast multilingual edge model with vision

Edge / On-DeviceOpen-WeightsMultimodalCost-Optimized

7.5

AI Panel Score

Value 9.0/10

Ministral 3 8B (release 25.12, shipped 2 December 2025) is Mistral's mid-edge model: a 9B-total dense transformer (8.4B language model + 0.4B vision encoder) under Apache 2.0, with 256K context and native vision. The reasoning variant posts AIME 2025 78.7%, GPQA Diamond 66.8%, MATH 87.6%, LiveCodeBench 61.6%, and MMLU 76.1% — strong for the tier. Symmetric pricing at $0.15/$0.15. The buyer's sentence: the right default for fast, multilingual, vision-capable edge work on consumer GPUs or laptops, with a clean license.

Compare this model All Ministral 3 versions

What's new

Native vision input at the 8B tier via a 0.4B encoder — the original Ministral 8B v1 was text-only.
Three variants — base, instruct, reasoning — all Apache 2.0.
256K context, up from 32K on v1 (corrects a prior 131K figure — the Mistral 3 family is 256K).
Symmetric pricing $0.15/$0.15 for simple cost modelling.
Repositioned explicitly as Mistral's edge / mobile tier with real reasoning (AIME 78.7%).

Benchmarks

Benchmark	Score	Source
MMLU	76.1%	huggingface.co 2025-12-02T00:00:00.000Z
MATH-500	87.6%	huggingface.co 2025-12-02T00:00:00.000Z
AIME 2025	78.7%	huggingface.co 2025-12-02T00:00:00.000Z
GPQA Diamond	66.8%	huggingface.co 2025-12-02T00:00:00.000Z
LiveCodeBench	61.6%	huggingface.co 2025-12-02T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7.5/10

“When the constraint is 'must run on a laptop or small server,' this is the default — native vision, real reasoning, and clean Apache 2.0 in one 8B.”

For any feature constrained to a laptop or small server, Ministral 3 8B is now the default. The combination of native vision, real reasoning (AIME 78.7%), EU-language quality, and a clean Apache 2.0 license is unusual at this size. I would route premium queries to Medium 3.5 or Small 4 and let the 8B handle the long tail of simpler tasks and on-device features. For embedded EU deployments where data must stay on the customer's hardware, this is the model — full on-prem control, no license fee, modest hardware.

Strategic Fit 8Vendor Risk 9Roadmap Confidence 7

Pros

laptop-viable, native vision, clean license

Cons

ceiling-limited
soft vision

Right for: on-device/embedded EU features

Avoid if: the workload needs more than an 8B can give

Domain Strategist7.5/10

“Native vision plus real reasoning at 8B, fully open — it strengthens Mistral's edge-tier story against Llama and Qwen on the axes EU buyers care about.”

Strategically the 8B reinforces Mistral's edge-tier position. Native vision and credible reasoning at 8B, under clean Apache 2.0, beat Llama 4 8B and Qwen 3 8B on the EU-relevant axes (multilingual quality, vision, license clarity). It is the bridge between the on-device 3B and the more capable 14B, giving product teams a clean ladder. The differentiation is the bundle at the size, not a single benchmark. Market timing aligns with edge/on-device demand and the EU AI Act compliance tailwind; the open license maximises adoption.

Competitive Positioning 7.5Differentiation 8Market Timing 7.5

Pros

vision+reasoning+open at 8B
clean ladder

Cons

crowded tier
ceiling-limited

Right for: edge-tier product strategies

Avoid if: you need peak capability

Finance Lead9/10

“$0.15/$0.15 makes monthly forecasting boring in the best way, and self-host on a consumer GPU caps cost at infrastructure for very high volume.”

Symmetric $0.15/$0.15 pricing makes monthly forecasting trivial — request count is the only variable. Self-host under clean Apache 2.0 on a consumer GPU caps cost at infrastructure for very high volume, with no license fee (unlike Medium 3.5 or the 125B Devstral). For any cost-sensitive workload that doesn't demand flagship quality, this is the right starting point, and the reasoning variant avoids escalating to a pricier model for many queries. Strong unit economics for the embedded-AI and high-volume-chat use cases.

Cost Efficiency 9Pricing Transparency 9Value per Dollar 9

Pros

symmetric pricing, cheap self-host, no license fee

Cons

none material at this tier

Right for: cost-floor and embedded workloads

Avoid if: you need ceiling capability

Domain Practitioner7.5/10

“Fast and 'fine' for routine tasks; the reasoning variant gives me a careful answer without paying for a bigger model, and it runs locally.”

The instruct variant is fast and reliable for routine tasks — summarisation, extraction, classification, simple agent steps. The reasoning variant is useful when I want a careful answer without a bigger model. Vision is usable for screenshot triage. Same API shape as the rest of Mistral, so swapping is trivial, and clean Apache 2.0 makes fine-tuning straightforward. Runs locally via vLLM/Ollama/llama.cpp/LM Studio at a 16GB footprint. Not a model for hard problems, but a strong default for the easy 80% with a reasoning escape hatch.

API Ergonomics 8Tool/Agent Support 7Reliability 8

Pros

fast, easy fine-tune, local, reasoning option

Cons

ceiling-limited
soft vision

Right for: routine + embedded tasks

Avoid if: you need agentic depth or strong vision

Power User7/10

“Snappy and capable for routine tasks, occasionally rough on nuance — and EU-language quality feels native, not translated.”

Snappy and capable for routine tasks, occasionally rough on nuanced queries. Vision works on simple things. The standout is European-language quality, which feels native rather than translated. Refusal rate is moderate. With reasoning toggled on it takes longer but is noticeably more careful. Conversational warmth is mid — efficient rather than friendly. A solid "free tier" model behind consumer features, especially when on-device delivery gives instant, private responses.

Output Quality 7Speed 7.5Everyday Usefulness 7

Pros

snappy, native EU languages, reasoning option

Cons

rough on nuance
mid warmth

Right for: consumer free-tier/on-device features

Avoid if: you want flagship nuance

Skeptic7.5/10

“Honest published numbers and a clean license — the only thing to flag is that 8B reasoning headlines come from the reasoning variant at full effort.”

Like the 14B, this is an honest small-model launch with real published benchmarks across variants, so there's little to debunk. The fair caveats: the AIME 78.7% headline is the reasoning variant at full effort (latency/token cost), the instruct variant most apps use is less spectacular, and an 8B's ceiling on hard, broad tasks is real — it won't stand in for the 14B or Small 4 where capability matters. Vision is genuinely soft at this size. The honest claim — "fast multilingual edge model with vision and a reasoning option" — holds; just pick the right variant and don't over-extend it.

Claim Accuracy 8Weakness Severity 7Hype vs Reality 8

Pros

real benchmarks
clean license

Cons

reasoning headline is full-effort
8B ceiling
soft vision

Right for: buyers who pick the right variant

Avoid if: you expect 14B-class breadth

Strengths

Native vision at 8B is a structural advantage over many peers in this size class.
Strong reasoning for the tier (AIME 2025 78.7%, MATH 87.6%).
Clean Apache 2.0 enables free commercial fine-tuning.
Excellent EU-language coverage for the size.
Runs on consumer GPUs and beefier laptops (16GB quantised).
256K context is rare at 8B; symmetric pricing simplifies forecasting.

Limitations

Hard tasks reveal the parameter ceiling vs Ministral 14B or Small 4.
Vision quality is "useful," not "polished" — complex documents/charts are tougher.
English conversational warmth trails flagships.
Three-variant SKU choice can confuse newcomers.
Cross-comparable instruct-variant coverage from Mistral is lighter than reasoning.

Best use cases

Mobile or browser-embedded AI features.
Single-GPU on-prem deployments at branch offices or regulated sites.
Domain fine-tuning baseline under Apache 2.0.
Cost-floor chat at very high volume with occasional reasoning.
Embedded vision tasks (OCR-light, screenshot triage).

Deep dive

The full research notes behind this review — verified against primary sources.

Architecture Capabilities Benchmark analysis Speed & latency Pricing analysis Deployment & access Safety & privacy Ecosystem & tooling

Architecture

Ministral 3 8B is a 9B-total dense transformer: an 8.4B language model plus a 0.4B vision encoder (per the HF card). Not a MoE. Context is 256K. Three variants — base, instruct, reasoning — all Apache 2.0. Available in FP8/BF16/GGUF/AWQ; runs on a single 16GB GPU quantised (consumer cards, beefier laptops). Tokenizer is mistral_common. Layer count, attention type, and training scale are undisclosed.

Capabilities

Ministral 3 8B targets edge and resource-constrained deployment: laptops, single-GPU servers, on-prem appliances. The reasoning variant is genuinely strong for the size: AIME 2025 78.7%, MATH 87.6%, GPQA Diamond 66.8% (cap_math 8.0, cap_reasoning 7.5). MMLU 76.1% and Multilingual MMLU 70.6% show solid knowledge and European-language strength (cap_multilingual 7.5). LiveCodeBench 61.6% is decent coding for 8B (cap_coding 6.0). Native vision via the 0.4B encoder handles screenshot, document, and chart understanding on commodity hardware, though quality is "useful" rather than polished at this size (cap_vision 5.5). The 256K context is rare at 8B (cap_long_context 8.0). Apache 2.0 means unfettered commercial fine-tuning. The right model for "good enough" cases where the constraint is latency, hardware, or cost rather than ceiling capability. No native real-time retrieval (cap_realtime_data 0.0).

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
AIME 2025 (reasoning)	78.7%	new	strong for 8B	HF card
AIME 2024 (reasoning)	86.0%	new	strong for size	HF card
GPQA Diamond	66.8%	new	strong for 8B	HF card
MATH (Maj@1)	87.6%	new	top of class	HF card
LiveCodeBench	61.6%	new	decent for 8B	HF card
MMLU (5-shot)	76.1%	up	strong for 8B	HF card

The HF card publishes real reasoning, instruct, and base suites, so coverage is strong for the tier. Reasoning-variant scores are recorded with the variant noted.

Speed & latency

Mistral does not publish official tps/TTFT (null), but a 9B dense model is fast — solidly in the fast latency tier for the instruct variant, and the smallest viable footprint here (16GB quantised) makes it well-suited to interactive on-device use. The reasoning variant trades speed for accuracy. On a consumer GPU or laptop the instruct variant gives near-instant responses, which is much of the point.

Pricing analysis

Surface	Cost	Notes
API input	$0.15 / 1M tok	La Plateforme
API output	$0.15 / 1M tok	symmetric pricing
Cached input	$0.015 / 1M tok	cache read
Batch (in/out)	$0.075 / $0.075	~50% async discount
Self-host	Apache 2.0	weights on Hugging Face, consumer GPU / laptop
Free tier	La Plateforme quota; self-host	no card for self-host
Cloud	Bedrock, Azure AI Foundry	managed

Deployment & access

Apache 2.0 weights on Hugging Face (FP8/BF16/GGUF/AWQ) — clean license, no carve-out. Runs on a single 16GB GPU quantised, including consumer cards and beefier laptops via llama.cpp/Ollama/LM Studio. Three variants (base/instruct/reasoning). Managed on Bedrock and Azure AI Foundry; La Plateforme EU-hosted by default. The right choice for embedded EU deployments where data must stay on the customer's hardware and the device is modest.

Safety & privacy

Standard Mistral posture: GDPR-native, SOC 2 Type II, ISO 27001/27701, EU AI Act aligned, EU residency by default (and full on-prem control via self-host), 30-day abuse retention on API, no training on inputs unless opt-in, ZDR available. No built-in moderation; separate Mistral Moderation API. Moderate refusal calibration.

Ecosystem & tooling

SDKs in Python and TypeScript/JavaScript; runs via vLLM, Ollama, llama.cpp, and LM Studio, with LangChain integration. Available through La Plateforme, Bedrock, and Azure AI Foundry. Apache 2.0 weights drive a growing edge/on-device community across the three variants. Popularity is growing, strongest in mobile/embedded and branch-office EU deployments.

Buyer questions

Is the license clean?

Yes — genuine Apache 2.0, no revenue carve-out. Fine-tune and self-host freely.

What's the context window?

256K (Mistral 3 family standard) — corrects an earlier 131K figure.

Which variant?

Instruct for routine tasks, reasoning for careful multi-step/math answers, base for fine-tuning. Most apps want instruct.

What hardware?

A single 16GB GPU quantised — consumer cards and beefier laptops work.

Does it have vision?

Yes — a 0.4B encoder, useful for screenshot triage and light OCR, not polished for complex documents.

How does it compare to the 14B?

Cheaper and faster, lower reasoning ceiling; step up to the 14B when hard math/reasoning matters.

Where does my data live?

EU by default on La Plateforme, or fully on your device via self-host.

Comparable models

Llama 4 8B:

Comparable size; weaker EU multilingual and weaker/absent native vision.

Qwen 3 8B:

Comparable size; stronger Chinese, weaker EU-language quality.

Ministral 3 14B (Mistral):

Bigger sibling — meaningfully stronger reasoning, same family API, ~1.3x the price.

Phi-4 mini:

Comparable tier; no native vision, weaker multilingual.

Sources

Primary references used to verify this review.

Model specs

Input price: $0.15 / Mtok
Output price: $0.15 / Mtok
Cached input: $0.01 / Mtok
Batch (in/out): $0.07 / $0.07
Context window: 256K tokens
Max output: 16K tokens
Knowledge cutoff: 2025-09
Released: 2025-12-01
Modalities: text, image → text
Output speed: Not profiled
License: Open weights (Apache-2.0)
Clouds: Bedrock, Azure AI Foundry

Does not train on API inputs by default

Other Ministral 3 versions

Last verified 2026-05-27