by Mistral AI · Ministral 3 family · best for fast multilingual edge model with vision
Ministral 3 8B (release 25.12, shipped 2 December 2025) is Mistral's mid-edge model: a 9B-total dense transformer (8.4B language model + 0.4B vision encoder) under Apache 2.0, with 256K context and native vision. The reasoning variant posts AIME 2025 78.7%, GPQA Diamond 66.8%, MATH-500 87.6%, LiveCodeBench 61.6%, and MMLU 76.1%, which is strong for the tier. Pricing is symmetric at $0.15 input / $0.15 output per 1M tokens. The buyer's sentence: the right default for fast, multilingual, vision-capable edge work on consumer GPUs or laptops, with a clean license.

- Provider: Mistral AI (Paris, France)
- Release: 2025-12-02, status GA
- Context: 256,000 tokens; max output 16,384
- Modalities: text + image in, text out (native multimodal)
- Knowledge cutoff: ~September 2025
- Headline price: $0.15 input / $0.15 output per 1M tokens (symmetric)
- Architecture: 9B dense (8.4B LM + 0.4B vision); base/instruct/reasoning variants, all Apache 2.0
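The context and max-output figures above can be turned into a quick budget check. This is a minimal sketch, assuming the prompt and the generated output share the same 256,000-token window, which is common but not confirmed here for this model. The function name `output_budget` is hypothetical and only for illustration.

```python
CONTEXT_WINDOW = 256_000  # tokens, from the spec list above
MAX_OUTPUT = 16_384       # tokens, from the spec list above


def output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens can be requested for a given prompt size.

    Assumption: prompt and output tokens count against the same context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Never exceed the per-response cap, never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))
```

Under that assumption, a 250,000-token prompt leaves room for 6,000 output tokens, while any prompt up to 239,616 tokens can still use the full 16,384-token output cap.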
| Benchmark | Score | Source |
|---|---|---|
| MMLU | 76.1% | huggingface.co, 2025-12-02 |
| MATH-500 | 87.6% | huggingface.co, 2025-12-02 |
| AIME 2025 | 78.7% | huggingface.co, 2025-12-02 |
| GPQA Diamond | 66.8% | huggingface.co, 2025-12-02 |
| LiveCodeBench | 61.6% | huggingface.co, 2025-12-02 |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“When the constraint is 'must run on a laptop or small server,' this is the default — native vision, real reasoning, and clean Apache 2.0 in one 8B.”
For any feature constrained to a laptop or small server, Ministral 3 8B is now the default. The combination of native vision, real reasoning (AIME 78.7%), EU-language quality, and a clean Apache 2.0 license is unusual at this size. I would route premium queries to Medium 3.5 or Small 4 and let the 8B handle the long tail of simpler tasks and on-device features. For embedded EU deployments where data must stay on the customer's hardware, this is the model — full on-prem control, no license fee, modest hardware.
“Native vision plus real reasoning at 8B, fully open — it strengthens Mistral's edge-tier story against Llama and Qwen on the axes EU buyers care about.”
Strategically the 8B reinforces Mistral's edge-tier position. Native vision and credible reasoning at 8B, under clean Apache 2.0, beat Llama 4 8B and Qwen 3 8B on the EU-relevant axes (multilingual quality, vision, license clarity). It is the bridge between the on-device 3B and the more capable 14B, giving product teams a clean ladder. The differentiation is the bundle at the size, not a single benchmark. Market timing aligns with edge/on-device demand and the EU AI Act compliance tailwind; the open license maximises adoption.
“$0.15/$0.15 makes monthly forecasting boring in the best way, and self-host on a consumer GPU caps cost at infrastructure for very high volume.”
Symmetric $0.15/$0.15 pricing makes monthly forecasting trivial: with one rate for input and output, total token volume is the only variable. Self-hosting under clean Apache 2.0 on a consumer GPU caps cost at infrastructure for very high volume, with no license fee (unlike Medium 3.5 or the 125B Devstral). For any cost-sensitive workload that doesn't demand flagship quality, this is the right starting point, and the reasoning variant avoids escalating to a pricier model for many queries. The unit economics are strong for the embedded-AI and high-volume-chat use cases.
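The forecasting point can be made concrete with a few lines of arithmetic. This is a rough sketch that uses only the $0.15 per 1M tokens rate quoted above. The function name and the example volumes are hypothetical, and real invoices may include items this ignores.

```python
PRICE_PER_MILLION_USD = 0.15  # same rate for input and output tokens


def monthly_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API spend when input and output tokens cost the same."""
    total_tokens = input_tokens + output_tokens
    return total_tokens * PRICE_PER_MILLION_USD / 1_000_000


# Hypothetical month: 2B input tokens and 500M output tokens.
estimate = monthly_cost_usd(2_000_000_000, 500_000_000)
print(f"${estimate:,.2f}")
```

Because the two rates are equal, swapping the input and output volumes gives the same estimate. Only the combined token count matters.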
“Fast and 'fine' for routine tasks; the reasoning variant gives me a careful answer without paying for a bigger model, and it runs locally.”
The instruct variant is fast and reliable for routine tasks — summarisation, extraction, classification, simple agent steps. The reasoning variant is useful when I want a careful answer without a bigger model. Vision is usable for screenshot triage. Same API shape as the rest of Mistral, so swapping is trivial, and clean Apache 2.0 makes fine-tuning straightforward. Runs locally via vLLM/Ollama/llama.cpp/LM Studio at a 16GB footprint. Not a model for hard problems, but a strong default for the easy 80% with a reasoning escape hatch.
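For a sense of where a 16GB figure sits, here is a back-of-the-envelope estimate of weight memory at different precisions. It is a sketch under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), uses the 8.4B + 0.4B parameter split quoted in this review, and the helper name `weight_memory_gb` is hypothetical.

```python
TOTAL_PARAMS = 8.4e9 + 0.4e9  # language model + vision encoder, per the review


def weight_memory_gb(bits_per_param: float, params: float = TOTAL_PARAMS) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params * bits_per_param / 8 / 1e9


# Compare common precisions; real footprints are higher once cache is included.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(bits):.1f} GB")
```

On these assumptions the weights come to about 17.6 GB at 16-bit, 8.8 GB at 8-bit, and 4.4 GB at 4-bit, which is consistent with the note elsewhere in this review that a single 16GB GPU calls for a quantised build.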
“Snappy and capable for routine tasks, occasionally rough on nuance — and EU-language quality feels native, not translated.”
Snappy and capable for routine tasks, occasionally rough on nuanced queries. Vision works on simple things. The standout is European-language quality, which feels native rather than translated. Refusal rate is moderate. With reasoning toggled on it takes longer but is noticeably more careful. Conversational warmth is mid — efficient rather than friendly. A solid "free tier" model behind consumer features, especially when on-device delivery gives instant, private responses.
“Honest published numbers and a clean license — the only thing to flag is that 8B reasoning headlines come from the reasoning variant at full effort.”
Like the 14B, this is an honest small-model launch with real published benchmarks across variants, so there's little to debunk. The fair caveats: the AIME 78.7% headline comes from the reasoning variant at full effort, which costs extra latency and tokens; the instruct variant most apps use is less spectacular; and an 8B's ceiling on hard, broad tasks is real, so it won't stand in for the 14B or Small 4 where capability matters. Vision is genuinely soft at this size. The honest claim, "fast multilingual edge model with vision and a reasoning option", holds; just pick the right variant and don't over-extend it.
- Mobile or browser-embedded AI features.
- Single-GPU on-prem deployments at branch offices or regulated sites.
- Domain fine-tuning baseline under Apache 2.0.
- Cost-floor chat at very high volume with occasional reasoning.
- Embedded vision tasks (OCR-light, screenshot triage).
Yes — genuine Apache 2.0, no revenue carve-out. Fine-tune and self-host freely.
256K (Mistral 3 family standard) — corrects an earlier 131K figure.
Instruct for routine tasks, reasoning for careful multi-step/math answers, base for fine-tuning. Most apps want instruct.
A single 16GB GPU quantised — consumer cards and beefier laptops work.
Yes — a 0.4B encoder, useful for screenshot triage and light OCR, not polished for complex documents.
Cheaper and faster, lower reasoning ceiling; step up to the 14B when hard math/reasoning matters.
EU by default on La Plateforme, or fully on your device via self-host.
Does not train on API inputs by default
Last verified 2026-05-27