Ministral 3 14B

GALatest Ministral

by Mistral AI · Ministral 3 family · best for best small open model for on-device reasoning

Edge / On-DeviceOpen-WeightsReasoningMultimodalCost-Optimized
7.9
AI Panel Score
Value 9.0/10

Ministral 3 14B (release 25.12, shipped 2 December 2025) is the top of Mistral's edge tier and the strongest small open model in the lineup: a 14B dense transformer (13.5B language model + 0.4B vision encoder) under Apache 2.0, with a 256K context and native vision. Its reasoning variant posts a remarkable AIME 2025 85.0% — best-in-class for sub-20B — alongside GPQA Diamond 71.2%, MATH 90.4%, LiveCodeBench 64.6%, and MMLU 79.4%. Symmetric pricing at $0.20/$0.20. The buyer's sentence: the best cost-to-performance small open model for single-GPU and on-device reasoning, with native vision rare at this size. - Provider: Mistral AI (Paris, France) - Release: 2025-12-02, status GA - Context: 256,000 tokens; max output 16,384 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~September 2025 - Headline price: $0.20 input / $0.20 output per 1M tokens (symmetric) - Architecture: 14B dense (13.5B LM + 0.4B vision); base/instruct/reasoning variants, all Apache 2.0

What's new

  • Three variants — base, instruct, reasoning — all Apache 2.0.
  • Native vision input via a 0.4B encoder; the prior Ministral 8B v1 was text-only.
  • Reasoning variant achieves AIME 2025 85.0% — best-in-class for sub-20B and within striking distance of much larger reasoning models.
  • 256K context, up from 32K on the v1 Ministral line (corrects a prior 131K figure — the Mistral 3 family is 256K).
  • Symmetric pricing ($0.20 in / $0.20 out) simplifies cost modelling for high-volume workloads.

Benchmarks

BenchmarkScoreSource
MMLU79.4%huggingface.co 2025-12-02T00:00:00.000Z
MATH-50090.4%huggingface.co 2025-12-02T00:00:00.000Z
AIME 202585%huggingface.co 2025-12-02T00:00:00.000Z
GPQA Diamond71.2%huggingface.co 2025-12-02T00:00:00.000Z
LiveCodeBench64.6%huggingface.co 2025-12-02T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker8/10
85% AIME on a 14B that runs on one GPU under Apache 2.0 — it changes my routing strategy and my data-residency posture at once.

Ministral 3 14B is my default for any feature that must run on a single GPU per region or on a customer's premises. Native vision plus 256K context is a strong package at this size, and the reasoning variant changes routing: I can do AIME-class math on a 14B rather than calling Magistral, saving real cost. Clean Apache 2.0 means I fine-tune on private data without license concerns and keep data fully on-prem — a sharper sovereignty story than the API-only or modified-MIT models. For EU edge-deployment scenarios, the obvious starting point.

Strategic Fit 8Vendor Risk 9Roadmap Confidence 8
Pros
  • single-GPU reasoning, native vision, clean Apache 2.0
Cons
  • lower ceiling than Small 4/Medium 3.5
  • SKU sprawl
Right for: EU edge/on-prem with reasoning
Avoid if: you need flagship-class quality
Domain Strategist8/10
It resets expectations for what a 14B delivers — best-in-class small-model reasoning plus vision, fully open, is a genuine category-mover.

Strategically, Ministral 3 14B pressures the entire sub-20B open tier. Best-in-class small-model math (AIME 85), native vision, 256K context, and clean Apache 2.0 form a combination Qwen and Llama don't match across all axes. It is the model that makes "on-device frontier-ish reasoning" credible, which matters for privacy-strict EU products and embedded use cases. The differentiation is the bundle, not any single axis. Market timing is good, riding the 2026 push toward edge and cost-controlled inference, and the open license maximises adoption surface.

Competitive Positioning 8Differentiation 8Market Timing 8
Pros
  • category-moving small-model reasoning
  • clean license
Cons
  • crowded tier
  • ceiling-limited
Right for: edge/on-device strategies
Avoid if: peak capability is the requirement
Finance Lead9/10
Symmetric $0.20/$0.20 means I forecast by request count alone, and the reasoning variant lets me skip a pricier model for many AIME-style queries.

Symmetric pricing is finance-friendly: I forecast monthly spend by request count without modelling input/output ratios. The reasoning variant avoids escalating to a more expensive model for many AIME-style queries, which compounds the saving. Self-host under clean Apache 2.0 on one GPU per region converts opex to capex with no license fee — unlike Medium 3.5 or the 125B Devstral. Best price-per-intelligence in the small-multimodal-with-reasoning tier. For high-volume workloads with occasional hard queries, excellent unit economics.

Cost Efficiency 9Pricing Transparency 9Value per Dollar 9
Pros
  • symmetric pricing, reasoning avoids escalation, clean self-host
Cons
  • none material at this tier
Right for: high-volume with occasional reasoning
Avoid if: you need ceiling capability regardless of cost
Domain Practitioner8/10
The reasoning variant is the surprise — 85% AIME on a 14B is genuinely useful for embedded reasoning steps — and fine-tuning on Apache 2.0 weights is painless.

The instruct variant is the boring-good workhorse for sub-flagship workloads; the reasoning variant is the surprise, with 85% AIME useful for embedded reasoning steps inside a pipeline. Same API shape as the rest of Mistral, so swapping models is painless, and fine-tuning on clean Apache 2.0 weights is straightforward. Vision is "fine for the size" — good enough for OCR-light tasks. Runs locally via vLLM/Ollama/llama.cpp/LM Studio. Good developer experience for the size class; the main friction is choosing among three variants.

API Ergonomics 8Tool/Agent Support 7Reliability 8
Pros
  • useful embedded reasoning, easy fine-tune, runs anywhere
Cons
  • variant choice
  • soft vision
Right for: embedded reasoning and fine-tunes
Avoid if: you need flagship vision/agentic depth
Power User7/10
Surprisingly capable for a 14B — the reasoning variant is noticeably more careful, the instruct variant is snappy, and EU languages feel native.

Surprisingly capable for the size. In casual chat the reasoning variant takes longer but produces noticeably more careful answers; the instruct variant is snappy. European-language quality is strong and feels native rather than translated. Vision works on simple images and charts. Conversational warmth is below the flagships — efficient and capable rather than friendly. For an end-user-facing product this is the "free tier that still actually works," especially when it can run on-device for instant, private responses.

Output Quality 7Speed 7.5Everyday Usefulness 7
Pros
  • capable for size, careful reasoning, native EU languages
Cons
  • not warm
  • soft vision
Right for: capable free-tier/on-device features
Avoid if: you want flagship polish
Skeptic7.5/10
The AIME 85 is real and impressive — just remember it's the reasoning variant at full effort, and the ceiling is still a 14B's ceiling.

This is one of the more honest small-model launches — the HF card publishes real reasoning, instruct, and base suites, so there's little to debunk. The fair caveats: the headline AIME 85 is the reasoning variant at full effort (more latency and tokens), the instruct variant most apps will use is less spectacular, and a 14B's ceiling on genuinely hard, broad tasks is real — it won't replace Small 4 or Medium 3.5 where capability matters. The three-variant lineup also invites misuse if someone grabs base or instruct expecting reasoning numbers. The honest claim — "best small open model for on-device reasoning" — largely holds.

Claim Accuracy 8Weakness Severity 7Hype vs Reality 8
Pros
  • real published benchmarks
  • clean license
Cons
  • headline is the reasoning variant
  • 14B ceiling
Right for: buyers who pick the right variant
Avoid if: you expect flagship breadth from 14B

Strengths

  • Reasoning variant has best-in-class small-model math (AIME 2025 85.0%).
  • Native vision at the 14B tier is rare.
  • Clean Apache 2.0 makes commercial fine-tuning and on-prem trivial.
  • 256K context is generous for the size class.
  • Symmetric $0.20/$0.20 pricing simplifies cost modelling.
  • Runs on a single 24GB GPU.

Limitations

  • Lower ceiling than Medium 3.5 or Small 4 on the hardest tasks.
  • Three variants add SKU-selection complexity (most apps want instruct only).
  • Vision is good for the size but not flagship-class.
  • The sub-30B class is crowded (Qwen 3 14B, Llama 4 14B).
  • Mistral published fewer cross-comparable benchmarks for the instruct variant than for reasoning.

Best use cases

- Edge / on-prem deployment with a single-GPU constraint. - Laptop-local or branch-office agent applications needing reasoning. - Fine-tuning targets for domain-specific apps (legal, medical, code) under Apache 2.0. - Cost-floor inference for high-volume chat with occasional reasoning. - Embedded reasoning steps inside larger agent pipelines (route AIME-class math here instead of Magistral).

Buyer questions

Is the license clean?

Yes — genuine Apache 2.0 on the HF card, no revenue carve-out. Fine-tune and self-host freely.

What's the context window?

256K (the Mistral 3 family standard) — corrects an earlier 131K figure.

Which variant do I use?

Instruct for general chat, reasoning for AIME-class math and careful multi-step answers, base for fine-tuning. Most apps want instruct.

Can it really do AIME 85?

Yes, on the reasoning variant at full effort — budget for extra latency and output tokens on hard queries.

What hardware?

A single 24GB GPU (one H200 at FP8, or consumer 24GB cards quantised).

Does it have vision?

Yes — a 0.4B encoder, good for screenshots/charts/light OCR, not flagship-class.

Where does my data live?

EU by default on La Plateforme, or fully on your hardware via self-host.

Comparable models

**Qwen 3 14B:** Comparable size; weaker AIME (~73.7%), stronger Chinese, similar permissive licensing.
**Llama 4 14B:** Comparable size; weaker EU multilingual and weaker/absent native vision.
**Mistral Small 4:** Bigger (119B MoE, 6.5B active); much stronger on the hardest tasks at a similar API price, same family API.
**Ministral 3 8B (Mistral):** Smaller sibling — cheaper and faster, lower reasoning ceiling.

Model specs

Input price
$0.20 / Mtok
Output price
$0.20 / Mtok
Cached input
$0.02 / Mtok
Batch (in/out)
$0.10 / $0.10
Context window
256K tokens
Max output
16K tokens
Knowledge cutoff
2025-09
Released
2025-12-01
Modalities
text, image → text
Output speed
Not profiled
License
Open weights (Apache-2.0)
Clouds
Bedrock, Azure AI Foundry

Does not train on API inputs by default

Other Ministral 3 versions

Last verified 2026-05-27