by Mistral AI · Ministral 3 family · best for on-device AI with native vision
Ministral 3 3B (release 25.12, shipped 2 December 2025) is the smallest member of the Mistral 3 family, built for on-device and severely resource-constrained deployment. It is a 4B-total dense transformer (3.4B language model + 0.4B vision encoder) released under Apache 2.0, with a 256K context window and, unusually for its size, native vision. The reasoning variant posts AIME 2025 72.1%, MATH-500 83.0%, GPQA Diamond 53.4%, and MMLU 70.7%. Pricing is symmetric at $0.10 input / $0.10 output per 1M tokens, the cost floor of the lineup. The buyer's sentence: the right tool for embedding AI in phones, browsers, and IoT, with native vision and a clean license, and the wrong tool for anything that needs flagship nuance.

- Provider: Mistral AI (Paris, France)
- Release: 2025-12-02, status GA
- Context: 256,000 tokens; max output 8,192
- Modalities: text + image in, text out (native multimodal)
- Knowledge cutoff: ~September 2025
- Headline price: $0.10 input / $0.10 output per 1M tokens (symmetric)
- Architecture: 4B dense (3.4B LM + 0.4B vision); base/instruct/reasoning variants, all Apache 2.0
| Benchmark | Score | Source |
|---|---|---|
| MMLU | 70.7% | huggingface.co, 2025-12-02 |
| MATH-500 | 83.0% | huggingface.co, 2025-12-02 |
| AIME 2025 | 72.1% | huggingface.co, 2025-12-02 |
| GPQA Diamond | 53.4% | huggingface.co, 2025-12-02 |
| LiveCodeBench | 54.8% | huggingface.co, 2025-12-02 |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“It lets me ship AI features that must run on the customer's device — native vision plus Apache 2.0 means a multimodal feature with no per-request cost or residency headache.”
Ministral 3 3B is the model that lets me ship AI features that have to run on the customer's device. For privacy-sensitive features (mobile keyboards, on-device assistants, EU-residency-strict products), it is the right choice: native vision plus clean Apache 2.0 means I can embed a multimodal feature with no per-request API cost and no data-residency exposure. The ceiling is real, so I would not put it on anything customer-visible that needs reliable nuance — it works best paired with bigger Mistral models for escalation. For embedded, offline, and privacy-first scenarios, strategically valuable.
“Native vision at 3B, fully open, is a genuine differentiator for the on-device tier — it expands where 'AI inside the app' is feasible.”
Strategically, the 3B extends Mistral's reach to the on-device tier where most competitors either lack vision or carry restrictive licenses. Native multimodality at 3-4B under clean Apache 2.0 is a real differentiator that expands the feasible surface for "AI inside the app" — mobile, browser, IoT — particularly for EU products that need data to stay on-device. It completes Mistral's ladder (3B on-device, 8B edge, 14B edge-reasoning) into a coherent story. The differentiation is the on-device multimodal + open combination; the limitation is that, as the smallest model, its strategic value is bounded to narrow embedded tasks.
“$0.10/$0.10 is the floor, but the real win is on-device self-host at zero marginal cost for products shipping to millions.”
$0.10/$0.10 is the cost floor of the lineup, and the real win is on-device self-host at effectively zero marginal cost. For any product shipping AI to millions of users where per-call API spend would dominate, this is the right tier, and clean Apache 2.0 means I amortise integration cost with no licensing surprises (unlike Medium 3.5 or the 125B Devstral). Annual forecasting is trivial. Strong unit economics specifically in the embedded-AI use case; for anything needing more capability, the saving evaporates against the ceiling.
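To make the unit-economics argument concrete, here is a minimal sketch of the hosted-API arithmetic at the headline $0.10/$0.10 per 1M tokens. The function names and the usage figures in the example (users, calls per day, tokens per call) are illustrative assumptions, not measurements from this review.

```python
# Headline prices from this review, in USD per 1M tokens (symmetric).
PRICE_PER_MILLION_INPUT = 0.10
PRICE_PER_MILLION_OUTPUT = 0.10


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one hosted API call at the headline prices."""
    total = (input_tokens * PRICE_PER_MILLION_INPUT
             + output_tokens * PRICE_PER_MILLION_OUTPUT)
    return total / 1_000_000


def monthly_api_cost_usd(users: int,
                         calls_per_user_per_day: int,
                         input_tokens_per_call: int,
                         output_tokens_per_call: int,
                         days: int = 30) -> float:
    """Monthly hosted-API spend for a hypothetical usage profile."""
    calls = users * calls_per_user_per_day * days
    return calls * api_cost_usd(input_tokens_per_call, output_tokens_per_call)


if __name__ == "__main__":
    # Hypothetical profile: 1M users, 20 calls/day, 400 tokens in, 100 tokens out.
    spend = monthly_api_cost_usd(1_000_000, 20, 400, 100)
    print(f"Estimated monthly API spend: ${spend:,.2f}")
```

Under that assumed profile the hosted bill comes to roughly $30,000 a month even at the floor price, which is the spend that on-device self-hosting removes. The sketch deliberately leaves out integration, distribution, and device-side costs.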
“Straightforward to embed — same API, well-quantised weights, runs on phones with llama.cpp/MLX — and the instruct variant is reliable for narrow tasks.”
For embedded use the developer experience is straightforward: same API shape as the rest of Mistral, well-quantised weights, runs on phones and Apple silicon via llama.cpp/Ollama/LM Studio/MLX. The reasoning variant occasionally surprises with a careful answer to a hard question, but I would not bet a product on it. Vision is basic. The instruct variant for narrow tasks (classification, extraction, slot filling) is reliable enough. Pleasant to fine-tune under clean Apache 2.0. A strong specialist tool for embedded work, used within its limits.
“I meet it through features baked into apps — smart compose, suggestions, light summaries — 'good for free, useful, occasionally wrong,' with instant on-device response.”
End users experience Ministral 3 3B through features baked into apps: smart compose, search suggestions, light summaries. Quality is "good for free, useful, occasionally wrong." Latency is the felt advantage — on-device means instant response with no network round-trip. Vision works on simple images. Refusal rate is low because the model rarely tackles nuanced topics. A capable assistant for narrow, low-stakes tasks; noticeably weaker than the 8B on anything requiring nuance or careful reasoning.
“AIME 72 on a 3B is genuinely surprising — but GPQA 53 and Arena Hard 30 are the honest ceiling, and 'on-device frontier' would be a stretch too far.”
The published numbers are real and AIME 72.1% on a 3B is genuinely impressive, so this isn't an over-claimed launch. The honest counterweights matter: GPQA Diamond 53.4% and Arena Hard 30.5% show the ceiling plainly — this is a narrow-task model, not a small frontier model, and the AIME headline comes from the reasoning variant at full effort. Vision at 3-4B is basic by physics. The three-variant lineup invites misuse if someone grabs it expecting flagship output. The honest claim — "best on-device multimodal small model with a reasoning option" — holds, provided buyers respect the ceiling and pick the right variant.
- On-device AI features (mobile apps, browser extensions, MLX on Apple silicon).
- Embedded systems where memory and compute are tight.
- Free-tier features in consumer apps where API cost must be near zero.
- Domain-specific fine-tunes for narrow tasks (classification, intent detection, slot filling).
- Background AI in productivity tools (suggest, complete, summarise).
Yes — genuine Apache 2.0, no carve-out. Embed in commercial products freely.
256K (Mistral 3 family standard) — corrects an earlier 131K figure.
Yes — ~6GB quantised runs on phones, single-board computers, and browser WebAssembly/MLX.
Instruct for narrow tasks, reasoning for light math/multi-step, base for fine-tuning. Most embedded apps want instruct.
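The variant guidance above can be written down as a small selection helper. This is only a sketch: the task labels and the `pick_variant` function are hypothetical names introduced for illustration, not part of any Mistral SDK.

```python
from enum import Enum


class Variant(Enum):
    BASE = "base"
    INSTRUCT = "instruct"
    REASONING = "reasoning"


# Hypothetical task labels for work the review routes to the reasoning variant.
REASONING_TASKS = {"light_math", "multi_step"}


def pick_variant(task: str, fine_tuning: bool = False) -> Variant:
    """Map a task label to a variant, following the guidance in this review."""
    if fine_tuning:
        # Base is the starting point for fine-tuning.
        return Variant.BASE
    if task in REASONING_TASKS:
        return Variant.REASONING
    # Default: narrow tasks such as classification, extraction, slot filling.
    return Variant.INSTRUCT


if __name__ == "__main__":
    print(pick_variant("classification").value)
    print(pick_variant("light_math").value)
    print(pick_variant("classification", fine_tuning=True).value)
```

Defaulting to instruct mirrors the point that most embedded apps want that variant; anything unlabelled falls through to it instead of silently picking the slower reasoning path.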
Yes, on the reasoning variant at full effort. But GPQA 53.4% shows the model's ceiling on harder, broader reasoning — don't over-extend it.
Yes — a 0.4B encoder, basic but functional for simple images; not for complex documents.
Entirely on-device if self-hosted, or EU by default on La Plateforme.
Does not train on API inputs by default
Last verified 2026-05-27