Ministral 3 8B

GALatest Ministral

by Mistral AI · Ministral 3 family · best for fast multilingual edge model with vision

Edge / On-DeviceOpen-WeightsMultimodalCost-Optimized
7.5
AI Panel Score
Value 9.0/10

Ministral 3 8B (release 25.12, shipped 2 December 2025) is Mistral's mid-edge model: a 9B-total dense transformer (8.4B language model + 0.4B vision encoder) under Apache 2.0, with 256K context and native vision. The reasoning variant posts AIME 2025 78.7%, GPQA Diamond 66.8%, MATH 87.6%, LiveCodeBench 61.6%, and MMLU 76.1% — strong for the tier. Symmetric pricing at $0.15/$0.15. The buyer's sentence: the right default for fast, multilingual, vision-capable edge work on consumer GPUs or laptops, with a clean license. - Provider: Mistral AI (Paris, France) - Release: 2025-12-02, status GA - Context: 256,000 tokens; max output 16,384 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~September 2025 - Headline price: $0.15 input / $0.15 output per 1M tokens (symmetric) - Architecture: 9B dense (8.4B LM + 0.4B vision); base/instruct/reasoning variants, all Apache 2.0

What's new

  • Native vision input at the 8B tier via a 0.4B encoder — the original Ministral 8B v1 was text-only.
  • Three variants — base, instruct, reasoning — all Apache 2.0.
  • 256K context, up from 32K on v1 (corrects a prior 131K figure — the Mistral 3 family is 256K).
  • Symmetric pricing $0.15/$0.15 for simple cost modelling.
  • Repositioned explicitly as Mistral's edge / mobile tier with real reasoning (AIME 78.7%).

Benchmarks

BenchmarkScoreSource
MMLU76.1%huggingface.co 2025-12-02T00:00:00.000Z
MATH-50087.6%huggingface.co 2025-12-02T00:00:00.000Z
AIME 202578.7%huggingface.co 2025-12-02T00:00:00.000Z
GPQA Diamond66.8%huggingface.co 2025-12-02T00:00:00.000Z
LiveCodeBench61.6%huggingface.co 2025-12-02T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker7.5/10
When the constraint is 'must run on a laptop or small server,' this is the default — native vision, real reasoning, and clean Apache 2.0 in one 8B.

For any feature constrained to a laptop or small server, Ministral 3 8B is now the default. The combination of native vision, real reasoning (AIME 78.7%), EU-language quality, and a clean Apache 2.0 license is unusual at this size. I would route premium queries to Medium 3.5 or Small 4 and let the 8B handle the long tail of simpler tasks and on-device features. For embedded EU deployments where data must stay on the customer's hardware, this is the model — full on-prem control, no license fee, modest hardware.

Strategic Fit 8Vendor Risk 9Roadmap Confidence 7
Pros
  • laptop-viable, native vision, clean license
Cons
  • ceiling-limited
  • soft vision
Right for: on-device/embedded EU features
Avoid if: the workload needs more than an 8B can give
Domain Strategist7.5/10
Native vision plus real reasoning at 8B, fully open — it strengthens Mistral's edge-tier story against Llama and Qwen on the axes EU buyers care about.

Strategically the 8B reinforces Mistral's edge-tier position. Native vision and credible reasoning at 8B, under clean Apache 2.0, beat Llama 4 8B and Qwen 3 8B on the EU-relevant axes (multilingual quality, vision, license clarity). It is the bridge between the on-device 3B and the more capable 14B, giving product teams a clean ladder. The differentiation is the bundle at the size, not a single benchmark. Market timing aligns with edge/on-device demand and the EU AI Act compliance tailwind; the open license maximises adoption.

Competitive Positioning 7.5Differentiation 8Market Timing 7.5
Pros
  • vision+reasoning+open at 8B
  • clean ladder
Cons
  • crowded tier
  • ceiling-limited
Right for: edge-tier product strategies
Avoid if: you need peak capability
Finance Lead9/10
$0.15/$0.15 makes monthly forecasting boring in the best way, and self-host on a consumer GPU caps cost at infrastructure for very high volume.

Symmetric $0.15/$0.15 pricing makes monthly forecasting trivial — request count is the only variable. Self-host under clean Apache 2.0 on a consumer GPU caps cost at infrastructure for very high volume, with no license fee (unlike Medium 3.5 or the 125B Devstral). For any cost-sensitive workload that doesn't demand flagship quality, this is the right starting point, and the reasoning variant avoids escalating to a pricier model for many queries. Strong unit economics for the embedded-AI and high-volume-chat use cases.

Cost Efficiency 9Pricing Transparency 9Value per Dollar 9
Pros
  • symmetric pricing, cheap self-host, no license fee
Cons
  • none material at this tier
Right for: cost-floor and embedded workloads
Avoid if: you need ceiling capability
Domain Practitioner7.5/10
Fast and 'fine' for routine tasks; the reasoning variant gives me a careful answer without paying for a bigger model, and it runs locally.

The instruct variant is fast and reliable for routine tasks — summarisation, extraction, classification, simple agent steps. The reasoning variant is useful when I want a careful answer without a bigger model. Vision is usable for screenshot triage. Same API shape as the rest of Mistral, so swapping is trivial, and clean Apache 2.0 makes fine-tuning straightforward. Runs locally via vLLM/Ollama/llama.cpp/LM Studio at a 16GB footprint. Not a model for hard problems, but a strong default for the easy 80% with a reasoning escape hatch.

API Ergonomics 8Tool/Agent Support 7Reliability 8
Pros
  • fast, easy fine-tune, local, reasoning option
Cons
  • ceiling-limited
  • soft vision
Right for: routine + embedded tasks
Avoid if: you need agentic depth or strong vision
Power User7/10
Snappy and capable for routine tasks, occasionally rough on nuance — and EU-language quality feels native, not translated.

Snappy and capable for routine tasks, occasionally rough on nuanced queries. Vision works on simple things. The standout is European-language quality, which feels native rather than translated. Refusal rate is moderate. With reasoning toggled on it takes longer but is noticeably more careful. Conversational warmth is mid — efficient rather than friendly. A solid "free tier" model behind consumer features, especially when on-device delivery gives instant, private responses.

Output Quality 7Speed 7.5Everyday Usefulness 7
Pros
  • snappy, native EU languages, reasoning option
Cons
  • rough on nuance
  • mid warmth
Right for: consumer free-tier/on-device features
Avoid if: you want flagship nuance
Skeptic7.5/10
Honest published numbers and a clean license — the only thing to flag is that 8B reasoning headlines come from the reasoning variant at full effort.

Like the 14B, this is an honest small-model launch with real published benchmarks across variants, so there's little to debunk. The fair caveats: the AIME 78.7% headline is the reasoning variant at full effort (latency/token cost), the instruct variant most apps use is less spectacular, and an 8B's ceiling on hard, broad tasks is real — it won't stand in for the 14B or Small 4 where capability matters. Vision is genuinely soft at this size. The honest claim — "fast multilingual edge model with vision and a reasoning option" — holds; just pick the right variant and don't over-extend it.

Claim Accuracy 8Weakness Severity 7Hype vs Reality 8
Pros
  • real benchmarks
  • clean license
Cons
  • reasoning headline is full-effort
  • 8B ceiling
  • soft vision
Right for: buyers who pick the right variant
Avoid if: you expect 14B-class breadth

Strengths

  • Native vision at 8B is a structural advantage over many peers in this size class.
  • Strong reasoning for the tier (AIME 2025 78.7%, MATH 87.6%).
  • Clean Apache 2.0 enables free commercial fine-tuning.
  • Excellent EU-language coverage for the size.
  • Runs on consumer GPUs and beefier laptops (16GB quantised).
  • 256K context is rare at 8B; symmetric pricing simplifies forecasting.

Limitations

  • Hard tasks reveal the parameter ceiling vs Ministral 14B or Small 4.
  • Vision quality is "useful," not "polished" — complex documents/charts are tougher.
  • English conversational warmth trails flagships.
  • Three-variant SKU choice can confuse newcomers.
  • Cross-comparable instruct-variant coverage from Mistral is lighter than reasoning.

Best use cases

- Mobile or browser-embedded AI features. - Single-GPU on-prem deployments at branch offices or regulated sites. - Domain fine-tuning baseline under Apache 2.0. - Cost-floor chat at very high volume with occasional reasoning. - Embedded vision tasks (OCR-light, screenshot triage).

Buyer questions

Is the license clean?

Yes — genuine Apache 2.0, no revenue carve-out. Fine-tune and self-host freely.

What's the context window?

256K (Mistral 3 family standard) — corrects an earlier 131K figure.

Which variant?

Instruct for routine tasks, reasoning for careful multi-step/math answers, base for fine-tuning. Most apps want instruct.

What hardware?

A single 16GB GPU quantised — consumer cards and beefier laptops work.

Does it have vision?

Yes — a 0.4B encoder, useful for screenshot triage and light OCR, not polished for complex documents.

How does it compare to the 14B?

Cheaper and faster, lower reasoning ceiling; step up to the 14B when hard math/reasoning matters.

Where does my data live?

EU by default on La Plateforme, or fully on your device via self-host.

Comparable models

**Llama 4 8B:** Comparable size; weaker EU multilingual and weaker/absent native vision.
**Qwen 3 8B:** Comparable size; stronger Chinese, weaker EU-language quality.
**Ministral 3 14B (Mistral):** Bigger sibling — meaningfully stronger reasoning, same family API, ~1.3x the price.
**Phi-4 mini:** Comparable tier; no native vision, weaker multilingual.

Model specs

Input price
$0.15 / Mtok
Output price
$0.15 / Mtok
Cached input
$0.01 / Mtok
Batch (in/out)
$0.07 / $0.07
Context window
256K tokens
Max output
16K tokens
Knowledge cutoff
2025-09
Released
2025-12-01
Modalities
text, image → text
Output speed
Not profiled
License
Open weights (Apache-2.0)
Clouds
Bedrock, Azure AI Foundry

Does not train on API inputs by default

Other Ministral 3 versions

Last verified 2026-05-27