Ministral 3 14B

GALatest Ministral

by Mistral AI · Ministral 3 family · best for best small open model for on-device reasoning

Edge / On-DeviceOpen-WeightsReasoningMultimodalCost-Optimized

7.9

AI Panel Score

Value 9.0/10

Ministral 3 14B (release 25.12, shipped 2 December 2025) is the top of Mistral's edge tier and the strongest small open model in the lineup: a 14B dense transformer (13.5B language model + 0.4B vision encoder) under Apache 2.0, with a 256K context and native vision. Its reasoning variant posts a remarkable AIME 2025 85.0% — best-in-class for sub-20B — alongside GPQA Diamond 71.2%, MATH 90.4%, LiveCodeBench 64.6%, and MMLU 79.4%. Symmetric pricing at $0.20/$0.20. The buyer's sentence: the best cost-to-performance small open model for single-GPU and on-device reasoning, with native vision rare at this size.

Compare this model All Ministral 3 versions

What's new

Three variants — base, instruct, reasoning — all Apache 2.0.
Native vision input via a 0.4B encoder; the prior Ministral 8B v1 was text-only.
Reasoning variant achieves AIME 2025 85.0% — best-in-class for sub-20B and within striking distance of much larger reasoning models.
256K context, up from 32K on the v1 Ministral line (corrects a prior 131K figure — the Mistral 3 family is 256K).
Symmetric pricing ($0.20 in / $0.20 out) simplifies cost modelling for high-volume workloads.

Benchmarks

Benchmark	Score	Source
MMLU	79.4%	huggingface.co 2025-12-02T00:00:00.000Z
MATH-500	90.4%	huggingface.co 2025-12-02T00:00:00.000Z
AIME 2025	85%	huggingface.co 2025-12-02T00:00:00.000Z
GPQA Diamond	71.2%	huggingface.co 2025-12-02T00:00:00.000Z
LiveCodeBench	64.6%	huggingface.co 2025-12-02T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker8/10

“85% AIME on a 14B that runs on one GPU under Apache 2.0 — it changes my routing strategy and my data-residency posture at once.”

Ministral 3 14B is my default for any feature that must run on a single GPU per region or on a customer's premises. Native vision plus 256K context is a strong package at this size, and the reasoning variant changes routing: I can do AIME-class math on a 14B rather than calling Magistral, saving real cost. Clean Apache 2.0 means I fine-tune on private data without license concerns and keep data fully on-prem — a sharper sovereignty story than the API-only or modified-MIT models. For EU edge-deployment scenarios, the obvious starting point.

Strategic Fit 8Vendor Risk 9Roadmap Confidence 8

Pros

single-GPU reasoning, native vision, clean Apache 2.0

Cons

lower ceiling than Small 4/Medium 3.5
SKU sprawl

Right for: EU edge/on-prem with reasoning

Avoid if: you need flagship-class quality

Domain Strategist8/10

“It resets expectations for what a 14B delivers — best-in-class small-model reasoning plus vision, fully open, is a genuine category-mover.”

Strategically, Ministral 3 14B pressures the entire sub-20B open tier. Best-in-class small-model math (AIME 85), native vision, 256K context, and clean Apache 2.0 form a combination Qwen and Llama don't match across all axes. It is the model that makes "on-device frontier-ish reasoning" credible, which matters for privacy-strict EU products and embedded use cases. The differentiation is the bundle, not any single axis. Market timing is good, riding the 2026 push toward edge and cost-controlled inference, and the open license maximises adoption surface.

Competitive Positioning 8Differentiation 8Market Timing 8

Pros

category-moving small-model reasoning
clean license

Cons

crowded tier
ceiling-limited

Right for: edge/on-device strategies

Avoid if: peak capability is the requirement

Finance Lead9/10

“Symmetric $0.20/$0.20 means I forecast by request count alone, and the reasoning variant lets me skip a pricier model for many AIME-style queries.”

Symmetric pricing is finance-friendly: I forecast monthly spend by request count without modelling input/output ratios. The reasoning variant avoids escalating to a more expensive model for many AIME-style queries, which compounds the saving. Self-host under clean Apache 2.0 on one GPU per region converts opex to capex with no license fee — unlike Medium 3.5 or the 125B Devstral. Best price-per-intelligence in the small-multimodal-with-reasoning tier. For high-volume workloads with occasional hard queries, excellent unit economics.

Cost Efficiency 9Pricing Transparency 9Value per Dollar 9

Pros

symmetric pricing, reasoning avoids escalation, clean self-host

Cons

none material at this tier

Right for: high-volume with occasional reasoning

Avoid if: you need ceiling capability regardless of cost

Domain Practitioner8/10

“The reasoning variant is the surprise — 85% AIME on a 14B is genuinely useful for embedded reasoning steps — and fine-tuning on Apache 2.0 weights is painless.”

The instruct variant is the boring-good workhorse for sub-flagship workloads; the reasoning variant is the surprise, with 85% AIME useful for embedded reasoning steps inside a pipeline. Same API shape as the rest of Mistral, so swapping models is painless, and fine-tuning on clean Apache 2.0 weights is straightforward. Vision is "fine for the size" — good enough for OCR-light tasks. Runs locally via vLLM/Ollama/llama.cpp/LM Studio. Good developer experience for the size class; the main friction is choosing among three variants.

API Ergonomics 8Tool/Agent Support 7Reliability 8

Pros

useful embedded reasoning, easy fine-tune, runs anywhere

Cons

variant choice
soft vision

Right for: embedded reasoning and fine-tunes

Avoid if: you need flagship vision/agentic depth

Power User7/10

“Surprisingly capable for a 14B — the reasoning variant is noticeably more careful, the instruct variant is snappy, and EU languages feel native.”

Surprisingly capable for the size. In casual chat the reasoning variant takes longer but produces noticeably more careful answers; the instruct variant is snappy. European-language quality is strong and feels native rather than translated. Vision works on simple images and charts. Conversational warmth is below the flagships — efficient and capable rather than friendly. For an end-user-facing product this is the "free tier that still actually works," especially when it can run on-device for instant, private responses.

Output Quality 7Speed 7.5Everyday Usefulness 7

Pros

capable for size, careful reasoning, native EU languages

Cons

not warm
soft vision

Right for: capable free-tier/on-device features

Avoid if: you want flagship polish

Skeptic7.5/10

“The AIME 85 is real and impressive — just remember it's the reasoning variant at full effort, and the ceiling is still a 14B's ceiling.”

This is one of the more honest small-model launches — the HF card publishes real reasoning, instruct, and base suites, so there's little to debunk. The fair caveats: the headline AIME 85 is the reasoning variant at full effort (more latency and tokens), the instruct variant most apps will use is less spectacular, and a 14B's ceiling on genuinely hard, broad tasks is real — it won't replace Small 4 or Medium 3.5 where capability matters. The three-variant lineup also invites misuse if someone grabs base or instruct expecting reasoning numbers. The honest claim — "best small open model for on-device reasoning" — largely holds.

Claim Accuracy 8Weakness Severity 7Hype vs Reality 8

Pros

real published benchmarks
clean license

Cons

headline is the reasoning variant
14B ceiling

Right for: buyers who pick the right variant

Avoid if: you expect flagship breadth from 14B

Strengths

Reasoning variant has best-in-class small-model math (AIME 2025 85.0%).
Native vision at the 14B tier is rare.
Clean Apache 2.0 makes commercial fine-tuning and on-prem trivial.
256K context is generous for the size class.
Symmetric $0.20/$0.20 pricing simplifies cost modelling.
Runs on a single 24GB GPU.

Limitations

Lower ceiling than Medium 3.5 or Small 4 on the hardest tasks.
Three variants add SKU-selection complexity (most apps want instruct only).
Vision is good for the size but not flagship-class.
The sub-30B class is crowded (Qwen 3 14B, Llama 4 14B).
Mistral published fewer cross-comparable benchmarks for the instruct variant than for reasoning.

Best use cases

Edge / on-prem deployment with a single-GPU constraint.
Laptop-local or branch-office agent applications needing reasoning.
Fine-tuning targets for domain-specific apps (legal, medical, code) under Apache 2.0.
Cost-floor inference for high-volume chat with occasional reasoning.
Embedded reasoning steps inside larger agent pipelines (route AIME-class math here instead of Magistral).

Deep dive

The full research notes behind this review — verified against primary sources.

Architecture Capabilities Benchmark analysis Speed & latency Pricing analysis Deployment & access Safety & privacy Ecosystem & tooling

Architecture

Ministral 3 14B is a 14B dense transformer: a 13.5B language model plus a 0.4B vision encoder (per the HF card). It is not a Mixture-of-Experts, unlike Large 3 and Small 4. Context is 256K. It ships in three variants — base (for fine-tuning), instruct (general chat), and reasoning (extended chain-of-thought) — all under Apache 2.0. Available in FP8/BF16/GGUF/AWQ; runs on a single 24GB GPU (one H200 at FP8, or consumer 24GB cards quantised). Tokenizer is mistral_common. Layer count, attention type, and training scale are undisclosed.

Capabilities

Ministral 3 14B is the best cost-to-performance edge model in the lineup, and the reasoning variant is the headline. AIME 2025 85.0% beats Qwen-14B (73.7%) by ~11pp and lands near reasoning models 5-10x its size (cap_math 8.5, cap_reasoning 8.0). GPQA Diamond 71.2%, MATH 90.4%, and LiveCodeBench 64.6% confirm strong reasoning and decent coding for the size (cap_coding 6.5). MMLU 79.4% and Multilingual MMLU 74.2% show solid knowledge and European-language strength (cap_multilingual 8.0). Native vision via the 0.4B encoder handles screenshots, charts, and document images on commodity hardware (cap_vision 6.0). The 256K context is generous for the class (cap_long_context 8.0). Apache 2.0 enables unrestricted commercial fine-tuning. No native real-time retrieval (cap_realtime_data 0.0).

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
AIME 2025 (reasoning)	85.0%	new	beats Qwen-14B (~73.7%) by ~11pp	HF card
AIME 2024 (reasoning)	89.8%	new	strong for size	HF card
GPQA Diamond	71.2%	new	strong for sub-20B	HF card
MATH (Maj@1)	90.4%	new	top of class	HF card
LiveCodeBench	64.6%	new	strong for 14B	HF card
MMLU (5-shot)	79.4%	up	strong for 14B	HF card
Multilingual MMLU	74.2%	up	best small open EU model	HF card

Ministral 3 14B has the best benchmark coverage in this Mistral set — the HF card publishes reasoning, instruct, and base suites with real numbers. Coverage here is strong; the reasoning-variant scores are recorded with the reasoning variant noted.

Speed & latency

Mistral does not publish official tps/TTFT (null), but a 14B dense model on a single 24GB GPU is fast — well into the fast latency tier for the instruct variant. The reasoning variant trades speed for accuracy: visible chain-of-thought increases latency and output token count. On commodity hardware (single A100/H200 or consumer 24GB cards) the instruct variant is snappy enough for interactive use.

Pricing analysis

Surface	Cost	Notes
API input	$0.20 / 1M tok	La Plateforme
API output	$0.20 / 1M tok	symmetric pricing
Cached input	$0.02 / 1M tok	cache read
Batch (in/out)	$0.10 / $0.10	~50% async discount
Self-host	Apache 2.0	weights on Hugging Face, single 24GB GPU
Free tier	La Plateforme quota; self-host	no card for self-host
Cloud	Bedrock, Azure AI Foundry	managed

Deployment & access

Apache 2.0 weights on Hugging Face (FP8/BF16/GGUF/AWQ) — clean license, no carve-out. Runs on a single 24GB GPU (one H200 at FP8 or consumer 24GB cards quantised). Three variants (base/instruct/reasoning) cover fine-tuning, chat, and reasoning. Managed on Bedrock and Azure AI Foundry; La Plateforme EU-hosted by default. For EU edge-deployment scenarios needing reasoning and vision on a single GPU per region or on-premises, this is the obvious starting point.

Safety & privacy

Standard Mistral posture: GDPR-native, SOC 2 Type II, ISO 27001/27701, EU AI Act aligned, EU residency by default (and full on-prem control via self-host), 30-day abuse retention on API, no training on inputs unless opt-in, ZDR available. No built-in moderation; separate Mistral Moderation API. Moderate refusal calibration.

Ecosystem & tooling

SDKs in Python and TypeScript/JavaScript; runs via vLLM, Ollama, llama.cpp, and LM Studio, and integrates with LangChain. Available through La Plateforme, Bedrock, and Azure AI Foundry. Apache 2.0 weights drive a growing edge/fine-tune community across the three variants. Popularity is growing, strongest in edge and on-device EU deployments.

Buyer questions

Is the license clean?

Yes — genuine Apache 2.0 on the HF card, no revenue carve-out. Fine-tune and self-host freely.

What's the context window?

256K (the Mistral 3 family standard) — corrects an earlier 131K figure.

Which variant do I use?

Instruct for general chat, reasoning for AIME-class math and careful multi-step answers, base for fine-tuning. Most apps want instruct.

Can it really do AIME 85?

Yes, on the reasoning variant at full effort — budget for extra latency and output tokens on hard queries.

What hardware?

A single 24GB GPU (one H200 at FP8, or consumer 24GB cards quantised).

Does it have vision?

Yes — a 0.4B encoder, good for screenshots/charts/light OCR, not flagship-class.

Where does my data live?

EU by default on La Plateforme, or fully on your hardware via self-host.

Comparable models

Qwen 3 14B:

Comparable size; weaker AIME (~73.7%), stronger Chinese, similar permissive licensing.

Llama 4 14B:

Comparable size; weaker EU multilingual and weaker/absent native vision.

Small 4: — Mistral

Bigger (119B MoE, 6.5B active); much stronger on the hardest tasks at a similar API price, same family API.

Ministral 3 8B (Mistral):

Smaller sibling — cheaper and faster, lower reasoning ceiling.

Sources

Primary references used to verify this review.

Model specs

Input price: $0.20 / Mtok
Output price: $0.20 / Mtok
Cached input: $0.02 / Mtok
Batch (in/out): $0.10 / $0.10
Context window: 256K tokens
Max output: 16K tokens
Knowledge cutoff: 2025-09
Released: 2025-12-01
Modalities: text, image → text
Output speed: Not profiled
License: Open weights (Apache-2.0)
Clouds: Bedrock, Azure AI Foundry

Does not train on API inputs by default

Other Ministral 3 versions

Last verified 2026-05-27