Ministral 3 3B

GALatest Ministral

by Mistral AI · Ministral 3 family · best for on-device AI with native vision

Edge / On-DeviceOpen-WeightsMultimodalCost-Optimized
7.1
AI Panel Score
Value 9.0/10

Ministral 3 3B (release 25.12, shipped 2 December 2025) is the smallest member of the Mistral 3 family, built for on-device and severely resource-constrained deployment: a 4B-total dense transformer (3.4B language model + 0.4B vision encoder) under Apache 2.0, with 256K context and — unusually for the size — native vision. The reasoning variant posts AIME 2025 72.1%, MATH 83.0%, GPQA Diamond 53.4%, and MMLU 70.7%. Symmetric pricing at $0.10/$0.10, the cost floor of the lineup. The buyer's sentence: the right tool for embedding AI in phones, browsers, and IoT, with native vision and a clean license — and the wrong tool for anything that needs flagship nuance. - Provider: Mistral AI (Paris, France) - Release: 2025-12-02, status GA - Context: 256,000 tokens; max output 8,192 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~September 2025 - Headline price: $0.10 input / $0.10 output per 1M tokens (symmetric) - Architecture: 4B dense (3.4B LM + 0.4B vision); base/instruct/reasoning variants, all Apache 2.0

What's new

  • Smallest Mistral 3 family member, explicitly targeting on-device and severely resource-constrained deployment.
  • Native vision input via a 0.4B encoder — rare at 3-4B scale.
  • Three variants — base, instruct, reasoning — all Apache 2.0.
  • Symmetric pricing $0.10/$0.10 — the cost floor of the lineup.
  • 256K context retained from the larger Ministral 3 siblings (corrects a prior 131K figure — the Mistral 3 family is 256K).

Benchmarks

BenchmarkScoreSource
MMLU70.7%huggingface.co 2025-12-02T00:00:00.000Z
MATH-50083%huggingface.co 2025-12-02T00:00:00.000Z
AIME 202572.1%huggingface.co 2025-12-02T00:00:00.000Z
GPQA Diamond53.4%huggingface.co 2025-12-02T00:00:00.000Z
LiveCodeBench54.8%huggingface.co 2025-12-02T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7/10
It lets me ship AI features that must run on the customer's device — native vision plus Apache 2.0 means a multimodal feature with no per-request cost or residency headache.

Ministral 3 3B is the model that lets me ship AI features that have to run on the customer's device. For privacy-sensitive features (mobile keyboards, on-device assistants, EU-residency-strict products), it is the right choice: native vision plus clean Apache 2.0 means I can embed a multimodal feature with no per-request API cost and no data-residency exposure. The ceiling is real, so I would not put it on anything customer-visible that needs reliable nuance — it works best paired with bigger Mistral models for escalation. For embedded, offline, and privacy-first scenarios, strategically valuable.

Strategic Fit 7Vendor Risk 9Roadmap Confidence 7
Pros
  • on-device, native vision, clean license, no per-request cost
Cons
  • hard ceiling
  • soft vision
Right for: embedded/offline/privacy-first features
Avoid if: the feature needs reliable nuance
Domain Strategist7/10
Native vision at 3B, fully open, is a genuine differentiator for the on-device tier — it expands where 'AI inside the app' is feasible.

Strategically, the 3B extends Mistral's reach to the on-device tier where most competitors either lack vision or carry restrictive licenses. Native multimodality at 3-4B under clean Apache 2.0 is a real differentiator that expands the feasible surface for "AI inside the app" — mobile, browser, IoT — particularly for EU products that need data to stay on-device. It completes Mistral's ladder (3B on-device, 8B edge, 14B edge-reasoning) into a coherent story. The differentiation is the on-device multimodal + open combination; the limitation is that, as the smallest model, its strategic value is bounded to narrow embedded tasks.

Competitive Positioning 7Differentiation 8Market Timing 7
Pros
  • on-device multimodal + open
  • completes the ladder
Cons
  • bounded to narrow tasks
Right for: on-device product strategies
Avoid if: you need capability beyond embedded tasks
Finance Lead8.5/10
$0.10/$0.10 is the floor, but the real win is on-device self-host at zero marginal cost for products shipping to millions.

$0.10/$0.10 is the cost floor of the lineup, and the real win is on-device self-host at effectively zero marginal cost. For any product shipping AI to millions of users where per-call API spend would dominate, this is the right tier, and clean Apache 2.0 means I amortise integration cost with no licensing surprises (unlike Medium 3.5 or the 125B Devstral). Annual forecasting is trivial. Strong unit economics specifically in the embedded-AI use case; for anything needing more capability, the saving evaporates against the ceiling.

Cost Efficiency 9Pricing Transparency 9Value per Dollar 9
Pros
  • cost floor, zero-marginal on-device, clean license
Cons
  • value bounded by capability ceiling
Right for: mass-scale embedded AI
Avoid if: capability per call matters more than cost
Domain Practitioner7/10
Straightforward to embed — same API, well-quantised weights, runs on phones with llama.cpp/MLX — and the instruct variant is reliable for narrow tasks.

For embedded use the developer experience is straightforward: same API shape as the rest of Mistral, well-quantised weights, runs on phones and Apple silicon via llama.cpp/Ollama/LM Studio/MLX. The reasoning variant occasionally surprises with a careful answer to a hard question, but I would not bet a product on it. Vision is basic. The instruct variant for narrow tasks (classification, extraction, slot filling) is reliable enough. Pleasant to fine-tune under clean Apache 2.0. A strong specialist tool for embedded work, used within its limits.

API Ergonomics 7.5Tool/Agent Support 6.5Reliability 7
Pros
  • easy embed, runs on-device, easy fine-tune
Cons
  • basic vision
  • unreliable on hard tasks
Right for: embedded narrow tasks
Avoid if: you need dependable hard reasoning
Power User6.5/10
I meet it through features baked into apps — smart compose, suggestions, light summaries — 'good for free, useful, occasionally wrong,' with instant on-device response.

End users experience Ministral 3 3B through features baked into apps: smart compose, search suggestions, light summaries. Quality is "good for free, useful, occasionally wrong." Latency is the felt advantage — on-device means instant response with no network round-trip. Vision works on simple images. Refusal rate is low because the model rarely tackles nuanced topics. A capable assistant for narrow, low-stakes tasks; noticeably weaker than the 8B on anything requiring nuance or careful reasoning.

Output Quality 6Speed 8.5Everyday Usefulness 6.5
Pros
  • instant on-device, useful for narrow tasks
Cons
  • weak on nuance
  • basic vision
Right for: low-stakes embedded features
Avoid if: you want reliable, nuanced answers
Skeptic7/10
AIME 72 on a 3B is genuinely surprising — but GPQA 53 and Arena Hard 30 are the honest ceiling, and 'on-device frontier' would be a stretch too far.

The published numbers are real and AIME 72.1% on a 3B is genuinely impressive, so this isn't an over-claimed launch. The honest counterweights matter: GPQA Diamond 53.4% and Arena Hard 30.5% show the ceiling plainly — this is a narrow-task model, not a small frontier model, and the AIME headline comes from the reasoning variant at full effort. Vision at 3-4B is basic by physics. The three-variant lineup invites misuse if someone grabs it expecting flagship output. The honest claim — "best on-device multimodal small model with a reasoning option" — holds, provided buyers respect the ceiling and pick the right variant.

Claim Accuracy 8Weakness Severity 6Hype vs Reality 7
Pros
  • real, surprising small-model numbers
  • clean license
Cons
  • clear ceiling
  • basic vision
  • full-effort reasoning headline
Right for: buyers who respect the ceiling
Avoid if: you expect more than narrow embedded capability

Strengths

  • One of the few 3-4B-class models with native vision.
  • Clean Apache 2.0 — free for any commercial use, including embedding in products.
  • Runs on phones, single-board computers, and browser WebAssembly/MLX with quantisation.
  • Strong reasoning/math for the size (AIME 2025 72.1%, MATH 83.0%).
  • 256K context retained despite the small parameter count.
  • Cost floor of the lineup ($0.10/$0.10) and near-zero marginal cost on-device.

Limitations

  • Quality ceiling clearly below the 8B sibling on anything non-trivial (GPQA 53.4%, Arena Hard 30.5%).
  • Vision quality is acceptable for simple tasks only — not complex charts/documents.
  • Hard reasoning is hit-or-miss even with the reasoning variant.
  • English conversational quality is noticeably below flagships.
  • Easy to misuse: developers expecting flagship output will be disappointed.

Best use cases

- On-device AI features (mobile apps, browser extensions, MLX on Apple silicon). - Embedded systems where memory and compute are tight. - Free-tier features in consumer apps where API cost must be near zero. - Domain-specific fine-tunes for narrow tasks (classification, intent detection, slot filling). - Background AI in productivity tools (suggest, complete, summarise).

Buyer questions

Is the license clean?

Yes — genuine Apache 2.0, no carve-out. Embed in commercial products freely.

What's the context window?

256K (Mistral 3 family standard) — corrects an earlier 131K figure.

Can it really run on a phone?

Yes — ~6GB quantised runs on phones, single-board computers, and browser WebAssembly/MLX.

Which variant?

Instruct for narrow tasks, reasoning for light math/multi-step, base for fine-tuning. Most embedded apps want instruct.

Is the AIME 72 real?

Yes, on the reasoning variant at full effort. But GPQA 53.4% shows the model's ceiling on harder, broader reasoning — don't over-extend it.

Does it have vision?

Yes — a 0.4B encoder, basic but functional for simple images; not for complex documents.

Where does my data live?

Entirely on-device if self-hosted, or EU by default on La Plateforme.

Comparable models

**Phi-4 mini:** Comparable size; no native vision, weaker multilingual.
**Gemma 3 4B:** Comparable size; weaker EU multilingual and weaker vision.
**Llama 4 1B/3B:** Comparable size; weaker/absent vision, similar permissive-style licensing.
**Ministral 3 8B (Mistral):** Bigger sibling — meaningfully stronger across the board, same family API, ~1.5x the price.

Model specs

Input price
$0.10 / Mtok
Output price
$0.10 / Mtok
Cached input
$0.01 / Mtok
Batch (in/out)
$0.05 / $0.05
Context window
256K tokens
Max output
8K tokens
Knowledge cutoff
2025-09
Released
2025-12-01
Modalities
text, image → text
Output speed
Not profiled
License
Open weights (Apache-2.0)
Clouds
Bedrock, Azure AI Foundry

Does not train on API inputs by default

Other Ministral 3 versions

Last verified 2026-05-27