Ministral 3 3B

GALatest Ministral

by Mistral AI · Ministral 3 family · best for on-device AI with native vision

Edge / On-DeviceOpen-WeightsMultimodalCost-Optimized

7.1

AI Panel Score

Value 9.0/10

Ministral 3 3B (release 25.12, shipped 2 December 2025) is the smallest member of the Mistral 3 family, built for on-device and severely resource-constrained deployment: a 4B-total dense transformer (3.4B language model + 0.4B vision encoder) under Apache 2.0, with 256K context and — unusually for the size — native vision. The reasoning variant posts AIME 2025 72.1%, MATH 83.0%, GPQA Diamond 53.4%, and MMLU 70.7%. Symmetric pricing at $0.10/$0.10, the cost floor of the lineup. The buyer's sentence: the right tool for embedding AI in phones, browsers, and IoT, with native vision and a clean license — and the wrong tool for anything that needs flagship nuance.

Compare this model All Ministral 3 versions

What's new

Smallest Mistral 3 family member, explicitly targeting on-device and severely resource-constrained deployment.
Native vision input via a 0.4B encoder — rare at 3-4B scale.
Three variants — base, instruct, reasoning — all Apache 2.0.
Symmetric pricing $0.10/$0.10 — the cost floor of the lineup.
256K context retained from the larger Ministral 3 siblings (corrects a prior 131K figure — the Mistral 3 family is 256K).

Benchmarks

Benchmark	Score	Source
MMLU	70.7%	huggingface.co 2025-12-02T00:00:00.000Z
MATH-500	83%	huggingface.co 2025-12-02T00:00:00.000Z
AIME 2025	72.1%	huggingface.co 2025-12-02T00:00:00.000Z
GPQA Diamond	53.4%	huggingface.co 2025-12-02T00:00:00.000Z
LiveCodeBench	54.8%	huggingface.co 2025-12-02T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7/10

“It lets me ship AI features that must run on the customer's device — native vision plus Apache 2.0 means a multimodal feature with no per-request cost or residency headache.”

Ministral 3 3B is the model that lets me ship AI features that have to run on the customer's device. For privacy-sensitive features (mobile keyboards, on-device assistants, EU-residency-strict products), it is the right choice: native vision plus clean Apache 2.0 means I can embed a multimodal feature with no per-request API cost and no data-residency exposure. The ceiling is real, so I would not put it on anything customer-visible that needs reliable nuance — it works best paired with bigger Mistral models for escalation. For embedded, offline, and privacy-first scenarios, strategically valuable.

Strategic Fit 7Vendor Risk 9Roadmap Confidence 7

Pros

on-device, native vision, clean license, no per-request cost

Cons

hard ceiling
soft vision

Right for: embedded/offline/privacy-first features

Avoid if: the feature needs reliable nuance

Domain Strategist7/10

“Native vision at 3B, fully open, is a genuine differentiator for the on-device tier — it expands where 'AI inside the app' is feasible.”

Strategically, the 3B extends Mistral's reach to the on-device tier where most competitors either lack vision or carry restrictive licenses. Native multimodality at 3-4B under clean Apache 2.0 is a real differentiator that expands the feasible surface for "AI inside the app" — mobile, browser, IoT — particularly for EU products that need data to stay on-device. It completes Mistral's ladder (3B on-device, 8B edge, 14B edge-reasoning) into a coherent story. The differentiation is the on-device multimodal + open combination; the limitation is that, as the smallest model, its strategic value is bounded to narrow embedded tasks.

Competitive Positioning 7Differentiation 8Market Timing 7

Pros

on-device multimodal + open
completes the ladder

Cons

bounded to narrow tasks

Right for: on-device product strategies

Avoid if: you need capability beyond embedded tasks

Finance Lead8.5/10

“$0.10/$0.10 is the floor, but the real win is on-device self-host at zero marginal cost for products shipping to millions.”

$0.10/$0.10 is the cost floor of the lineup, and the real win is on-device self-host at effectively zero marginal cost. For any product shipping AI to millions of users where per-call API spend would dominate, this is the right tier, and clean Apache 2.0 means I amortise integration cost with no licensing surprises (unlike Medium 3.5 or the 125B Devstral). Annual forecasting is trivial. Strong unit economics specifically in the embedded-AI use case; for anything needing more capability, the saving evaporates against the ceiling.

Cost Efficiency 9Pricing Transparency 9Value per Dollar 9

Pros

cost floor, zero-marginal on-device, clean license

Cons

value bounded by capability ceiling

Right for: mass-scale embedded AI

Avoid if: capability per call matters more than cost

Domain Practitioner7/10

“Straightforward to embed — same API, well-quantised weights, runs on phones with llama.cpp/MLX — and the instruct variant is reliable for narrow tasks.”

For embedded use the developer experience is straightforward: same API shape as the rest of Mistral, well-quantised weights, runs on phones and Apple silicon via llama.cpp/Ollama/LM Studio/MLX. The reasoning variant occasionally surprises with a careful answer to a hard question, but I would not bet a product on it. Vision is basic. The instruct variant for narrow tasks (classification, extraction, slot filling) is reliable enough. Pleasant to fine-tune under clean Apache 2.0. A strong specialist tool for embedded work, used within its limits.

API Ergonomics 7.5Tool/Agent Support 6.5Reliability 7

Pros

easy embed, runs on-device, easy fine-tune

Cons

basic vision
unreliable on hard tasks

Right for: embedded narrow tasks

Avoid if: you need dependable hard reasoning

Power User6.5/10

“I meet it through features baked into apps — smart compose, suggestions, light summaries — 'good for free, useful, occasionally wrong,' with instant on-device response.”

End users experience Ministral 3 3B through features baked into apps: smart compose, search suggestions, light summaries. Quality is "good for free, useful, occasionally wrong." Latency is the felt advantage — on-device means instant response with no network round-trip. Vision works on simple images. Refusal rate is low because the model rarely tackles nuanced topics. A capable assistant for narrow, low-stakes tasks; noticeably weaker than the 8B on anything requiring nuance or careful reasoning.

Output Quality 6Speed 8.5Everyday Usefulness 6.5

Pros

instant on-device, useful for narrow tasks

Cons

weak on nuance
basic vision

Right for: low-stakes embedded features

Avoid if: you want reliable, nuanced answers

Skeptic7/10

“AIME 72 on a 3B is genuinely surprising — but GPQA 53 and Arena Hard 30 are the honest ceiling, and 'on-device frontier' would be a stretch too far.”

The published numbers are real and AIME 72.1% on a 3B is genuinely impressive, so this isn't an over-claimed launch. The honest counterweights matter: GPQA Diamond 53.4% and Arena Hard 30.5% show the ceiling plainly — this is a narrow-task model, not a small frontier model, and the AIME headline comes from the reasoning variant at full effort. Vision at 3-4B is basic by physics. The three-variant lineup invites misuse if someone grabs it expecting flagship output. The honest claim — "best on-device multimodal small model with a reasoning option" — holds, provided buyers respect the ceiling and pick the right variant.

Claim Accuracy 8Weakness Severity 6Hype vs Reality 7

Pros

real, surprising small-model numbers
clean license

Cons

clear ceiling
basic vision
full-effort reasoning headline

Right for: buyers who respect the ceiling

Avoid if: you expect more than narrow embedded capability

Strengths

One of the few 3-4B-class models with native vision.
Clean Apache 2.0 — free for any commercial use, including embedding in products.
Runs on phones, single-board computers, and browser WebAssembly/MLX with quantisation.
Strong reasoning/math for the size (AIME 2025 72.1%, MATH 83.0%).
256K context retained despite the small parameter count.
Cost floor of the lineup ($0.10/$0.10) and near-zero marginal cost on-device.

Limitations

Quality ceiling clearly below the 8B sibling on anything non-trivial (GPQA 53.4%, Arena Hard 30.5%).
Vision quality is acceptable for simple tasks only — not complex charts/documents.
Hard reasoning is hit-or-miss even with the reasoning variant.
English conversational quality is noticeably below flagships.
Easy to misuse: developers expecting flagship output will be disappointed.

Best use cases

On-device AI features (mobile apps, browser extensions, MLX on Apple silicon).
Embedded systems where memory and compute are tight.
Free-tier features in consumer apps where API cost must be near zero.
Domain-specific fine-tunes for narrow tasks (classification, intent detection, slot filling).
Background AI in productivity tools (suggest, complete, summarise).

Deep dive

The full research notes behind this review — verified against primary sources.

Architecture Capabilities Benchmark analysis Speed & latency Pricing analysis Deployment & access Safety & privacy Ecosystem & tooling

Architecture

Ministral 3 3B is a 4B-total dense transformer: a 3.4B language model plus a 0.4B vision encoder (per the HF card). Not a MoE. Context is 256K. Three variants — base, instruct, reasoning — all Apache 2.0. Available in FP8/BF16/GGUF/AWQ; runs on ~6GB VRAM quantised, including phones, single-board computers, and browser WebAssembly/MLX. Tokenizer is mistral_common. Layer count, attention type, and training scale are undisclosed. The defining design point is on-device viability: small enough to embed, with vision and a long context retained from its larger siblings.

Capabilities

Ministral 3 3B is purpose-built for on-device and embedded use: mobile apps, browser extensions, IoT, single-board computers, laptop background tasks. Despite the size, native vision is supported, which is rare at 3-4B (cap_vision 4.5). It handles routine summarisation, classification, extraction, simple chat, and lightweight reasoning. The reasoning variant is genuinely capable for the scale: AIME 2025 72.1%, MATH 83.0% (cap_math 7.0, cap_reasoning 6.5) — though GPQA Diamond 53.4% and Arena Hard 30.5% show the ceiling. MMLU 70.7% and Multilingual MMLU 65.2% indicate respectable knowledge and European-language coverage for the size (cap_multilingual 6.5). The 256K context is remarkable at this scale (cap_long_context 7.5). Apache 2.0 enables unrestricted commercial fine-tuning and embedding. No native real-time retrieval (cap_realtime_data 0.0).

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
AIME 2025 (reasoning)	72.1%	new	strong for 3B	HF card
AIME 2024 (reasoning)	77.5%	new	strong for size	HF card
MATH (Maj@1)	83.0%	new	strong for 3B	HF card
GPQA Diamond	53.4%	new	ceiling shows here	HF card
LiveCodeBench	54.8%	new	decent for 3B	HF card
MMLU (5-shot)	70.7%	up	respectable for 3B	HF card

The HF card publishes real reasoning, instruct, and base suites, so coverage is strong for the tier. AIME 72.1% on a 3B is striking; GPQA 53.4% and Arena Hard 30.5% are the honest counterweight showing where the size limits it. Reasoning-variant scores noted.

Speed & latency

Mistral does not publish official tps/TTFT (null), but a 4B dense model is very fast, and the smallest footprint here (~6GB quantised) makes it well-suited to instant on-device inference. Fast latency tier. On-device delivery is itself the latency advantage — no network round-trip — which is much of the model's appeal. The reasoning variant trades speed for accuracy.

Pricing analysis

Surface	Cost	Notes
API input	$0.10 / 1M tok	La Plateforme (cost floor of lineup)
API output	$0.10 / 1M tok	symmetric pricing
Cached input	$0.01 / 1M tok	cache read
Batch (in/out)	$0.05 / $0.05	~50% async discount
Self-host / on-device	Apache 2.0	weights on Hugging Face, ~6GB, phone/SBC viable
Free tier	La Plateforme quota; self-host	no card for self-host
Cloud	Bedrock, Azure AI Foundry	managed

Deployment & access

Apache 2.0 weights on Hugging Face (FP8/BF16/GGUF/AWQ) — clean license, no carve-out. Runs on ~6GB VRAM quantised, including phones, single-board computers, and browser WebAssembly/MLX via llama.cpp/Ollama/LM Studio/MLX. Three variants. Also managed on Bedrock and Azure AI Foundry and via La Plateforme (EU-hosted by default), though the model's reason to exist is on-device. For privacy-strict, EU-residency-strict, or offline products, on-device deployment keeps data entirely on the user's hardware with zero marginal API cost.

Safety & privacy

Standard Mistral posture: GDPR-native, SOC 2 Type II, ISO 27001/27701, EU AI Act aligned, EU residency by default (and full on-device control via self-host), 30-day abuse retention on API, no training on inputs unless opt-in, ZDR available. No built-in moderation; separate Mistral Moderation API. Refusal rate is low in practice because the model rarely tackles nuanced/sensitive topics at this scale.

Ecosystem & tooling

SDKs in Python and TypeScript/JavaScript; runs via vLLM, Ollama, llama.cpp, LM Studio, and MLX (Apple silicon), making it well-suited to on-device and offline deployment. Available through La Plateforme, Bedrock, and Azure AI Foundry. Apache 2.0 weights drive a growing on-device/embedded community across the three variants. Popularity is growing, strongest in mobile/IoT and privacy-first EU products.

Buyer questions

Is the license clean?

Yes — genuine Apache 2.0, no carve-out. Embed in commercial products freely.

What's the context window?

256K (Mistral 3 family standard) — corrects an earlier 131K figure.

Can it really run on a phone?

Yes — ~6GB quantised runs on phones, single-board computers, and browser WebAssembly/MLX.

Which variant?

Instruct for narrow tasks, reasoning for light math/multi-step, base for fine-tuning. Most embedded apps want instruct.

Is the AIME 72 real?

Yes, on the reasoning variant at full effort. But GPQA 53.4% shows the model's ceiling on harder, broader reasoning — don't over-extend it.

Does it have vision?

Yes — a 0.4B encoder, basic but functional for simple images; not for complex documents.

Where does my data live?

Entirely on-device if self-hosted, or EU by default on La Plateforme.

Comparable models

Phi-4 mini:

Comparable size; no native vision, weaker multilingual.

Gemma 3 4B:

Comparable size; weaker EU multilingual and weaker vision.

Llama 4 1B/3B:

Comparable size; weaker/absent vision, similar permissive-style licensing.

Ministral 3 8B (Mistral):

Bigger sibling — meaningfully stronger across the board, same family API, ~1.5x the price.

Sources

Primary references used to verify this review.

Model specs

Input price: $0.10 / Mtok
Output price: $0.10 / Mtok
Cached input: $0.01 / Mtok
Batch (in/out): $0.05 / $0.05
Context window: 256K tokens
Max output: 8K tokens
Knowledge cutoff: 2025-09
Released: 2025-12-01
Modalities: text, image → text
Output speed: Not profiled
License: Open weights (Apache-2.0)
Clouds: Bedrock, Azure AI Foundry

Does not train on API inputs by default

Other Ministral 3 versions

Last verified 2026-05-27