by Mistral AI · Ministral 3 family · best small open model for on-device reasoning
Ministral 3 14B (release 25.12, shipped 2 December 2025) is the top of Mistral's edge tier and the strongest small open model in the lineup: a 14B dense transformer (13.5B language model + 0.4B vision encoder) under Apache 2.0, with a 256K context and native vision. Its reasoning variant scores 85.0% on AIME 2025, best-in-class for sub-20B models, alongside GPQA Diamond 71.2%, MATH-500 90.4%, LiveCodeBench 64.6%, and MMLU 79.4%. Pricing is symmetric at $0.20 input / $0.20 output per 1M tokens. The buyer's sentence: the best cost-to-performance small open model for single-GPU and on-device reasoning, with native vision that is rare at this size.

- Provider: Mistral AI (Paris, France)
- Release: 2025-12-02, status GA
- Context: 256,000 tokens; max output 16,384
- Modalities: text + image in, text out (native multimodal)
- Knowledge cutoff: ~September 2025
- Headline price: $0.20 input / $0.20 output per 1M tokens (symmetric)
- Architecture: 14B dense (13.5B LM + 0.4B vision); base/instruct/reasoning variants, all Apache 2.0
| Benchmark | Score | Source |
|---|---|---|
| MMLU | 79.4% | huggingface.co (2025-12-02) |
| MATH-500 | 90.4% | huggingface.co (2025-12-02) |
| AIME 2025 | 85.0% | huggingface.co (2025-12-02) |
| GPQA Diamond | 71.2% | huggingface.co (2025-12-02) |
| LiveCodeBench | 64.6% | huggingface.co (2025-12-02) |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“85% AIME on a 14B that runs on one GPU under Apache 2.0 — it changes my routing strategy and my data-residency posture at once.”
Ministral 3 14B is my default for any feature that must run on a single GPU per region or on a customer's premises. Native vision plus 256K context is a strong package at this size, and the reasoning variant changes routing: I can do AIME-class math on a 14B rather than calling Magistral, saving real cost. Clean Apache 2.0 means I fine-tune on private data without license concerns and keep data fully on-prem — a sharper sovereignty story than the API-only or modified-MIT models. For EU edge-deployment scenarios, the obvious starting point.
“It resets expectations for what a 14B delivers — best-in-class small-model reasoning plus vision, fully open, is a genuine category-mover.”
Strategically, Ministral 3 14B pressures the entire sub-20B open tier. Best-in-class small-model math (AIME 85), native vision, 256K context, and clean Apache 2.0 form a combination Qwen and Llama don't match across all axes. It is the model that makes "on-device frontier-ish reasoning" credible, which matters for privacy-strict EU products and embedded use cases. The differentiation is the bundle, not any single axis. Market timing is good, riding the 2026 push toward edge and cost-controlled inference, and the open license maximises adoption surface.
“Symmetric $0.20/$0.20 means I forecast by request count alone, and the reasoning variant lets me skip a pricier model for many AIME-style queries.”
Symmetric pricing is finance-friendly: I forecast monthly spend by request count without modelling input/output ratios. The reasoning variant avoids escalating to a more expensive model for many AIME-style queries, which compounds the saving. Self-host under clean Apache 2.0 on one GPU per region converts opex to capex with no license fee — unlike Medium 3.5 or the 125B Devstral. Best price-per-intelligence in the small-multimodal-with-reasoning tier. For high-volume workloads with occasional hard queries, excellent unit economics.
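To make the forecasting point concrete, here is a minimal sketch of the arithmetic, assuming the $0.20 per 1M token rate quoted in this review. The function name, the request volume, and the average tokens per request are illustrative assumptions, not measured figures.

```python
# Rate quoted in this review: the same price for input and output tokens.
PRICE_PER_MILLION_USD = 0.20


def monthly_cost_usd(requests: int, avg_tokens_per_request: int) -> float:
    """Estimate monthly spend from request count and average total tokens.

    Because input and output cost the same, only the combined token count
    per request matters, not how it splits between prompt and completion.
    """
    total_tokens = requests * avg_tokens_per_request
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD


if __name__ == "__main__":
    # Hypothetical workload: 1M requests per month, 2,000 tokens each.
    cost = monthly_cost_usd(requests=1_000_000, avg_tokens_per_request=2_000)
    print(f"Estimated monthly spend: ${cost:,.2f}")
```

With asymmetric pricing the same estimate would need a separate input/output ratio per workload, which is the modelling step the symmetric rate removes.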
“The reasoning variant is the surprise — 85% AIME on a 14B is genuinely useful for embedded reasoning steps — and fine-tuning on Apache 2.0 weights is painless.”
The instruct variant is the boring-good workhorse for sub-flagship workloads; the reasoning variant is the surprise, with 85% AIME useful for embedded reasoning steps inside a pipeline. Same API shape as the rest of Mistral, so swapping models is painless, and fine-tuning on clean Apache 2.0 weights is straightforward. Vision is "fine for the size" — good enough for OCR-light tasks. Runs locally via vLLM/Ollama/llama.cpp/LM Studio. Good developer experience for the size class; the main friction is choosing among three variants.
“Surprisingly capable for a 14B — the reasoning variant is noticeably more careful, the instruct variant is snappy, and EU languages feel native.”
Surprisingly capable for the size. In casual chat the reasoning variant takes longer but produces noticeably more careful answers; the instruct variant is snappy. European-language quality is strong and feels native rather than translated. Vision works on simple images and charts. Conversational warmth is below the flagships — efficient and capable rather than friendly. For an end-user-facing product this is the "free tier that still actually works," especially when it can run on-device for instant, private responses.
“The AIME 85 is real and impressive — just remember it's the reasoning variant at full effort, and the ceiling is still a 14B's ceiling.”
This is one of the more honest small-model launches — the HF card publishes real reasoning, instruct, and base suites, so there's little to debunk. The fair caveats: the headline AIME 85 is the reasoning variant at full effort (more latency and tokens), the instruct variant most apps will use is less spectacular, and a 14B's ceiling on genuinely hard, broad tasks is real — it won't replace Small 4 or Medium 3.5 where capability matters. The three-variant lineup also invites misuse if someone grabs base or instruct expecting reasoning numbers. The honest claim — "best small open model for on-device reasoning" — largely holds.
- Edge / on-prem deployment with a single-GPU constraint.
- Laptop-local or branch-office agent applications needing reasoning.
- Fine-tuning targets for domain-specific apps (legal, medical, code) under Apache 2.0.
- Cost-floor inference for high-volume chat with occasional reasoning.
- Embedded reasoning steps inside larger agent pipelines (route AIME-class math here instead of Magistral).
Yes — genuine Apache 2.0 on the HF card, no revenue carve-out. Fine-tune and self-host freely.
256K (the Mistral 3 family standard) — corrects an earlier 131K figure.
Instruct for general chat, reasoning for AIME-class math and careful multi-step answers, base for fine-tuning. Most apps want instruct.
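The variant guidance can be sketched as a small lookup. The variant names come from this review; the task labels and the function name are hypothetical, chosen only for illustration.

```python
# Task labels are hypothetical; the variant names are the three in this review.
VARIANT_BY_TASK = {
    "chat": "instruct",         # general chat
    "math": "reasoning",        # AIME-class math
    "multi_step": "reasoning",  # careful multi-step answers
    "fine_tune": "base",        # starting point for fine-tuning
}


def pick_variant(task: str) -> str:
    """Return the suggested variant, defaulting to instruct for unknown tasks."""
    return VARIANT_BY_TASK.get(task, "instruct")


if __name__ == "__main__":
    for task in ("chat", "math", "fine_tune", "summarise"):
        print(f"{task}: {pick_variant(task)}")
```

Defaulting to instruct mirrors the advice that most apps want instruct, and it keeps the slower reasoning variant for queries that are explicitly flagged as hard.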
Yes, on the reasoning variant at full effort — budget for extra latency and output tokens on hard queries.
A single GPU with 24GB of VRAM: FP8 weights on a data-centre card such as an H200, or a quantised build on a consumer 24GB card.
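A rough back-of-envelope check on the 24GB figure, assuming the parameter counts quoted in this review and counting weights only. KV cache, activations, and runtime overhead are deliberately excluded, so real headroom is smaller than these numbers suggest; the precision labels and function name are illustrative.

```python
# Parameter counts quoted in this review: language model + vision encoder.
PARAMS = 13.5e9 + 0.4e9

# Bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

GPU_MEMORY_GB = 24  # the single-GPU budget discussed above


def weight_gb(precision: str, params: float = PARAMS) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_gb(precision)
        verdict = "fits" if gb < GPU_MEMORY_GB else "does not fit"
        print(f"{precision}: ~{gb:.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions the 16-bit weights alone exceed 24GB, while FP8 leaves roughly 10GB for the KV cache. How much of the 256K context that covers depends on the serving stack.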
Yes — a 0.4B encoder, good for screenshots/charts/light OCR, not flagship-class.
EU by default on La Plateforme, or fully on your hardware via self-host.
Does not train on API inputs by default
Last verified 2026-05-27