Mistral Small 4

GALatest Small

by Mistral AI · Mistral Small family · best for best price-to-capability open multimodal model

Cost-OptimizedOpen-WeightsReasoningMultimodalLong-Context
8.5
AI Panel Score
Value 9.5/10

Mistral Small 4 (release 26.03, shipped 16 March 2026) is the cost-disruption release of the 2026 lineup: a 119B-parameter Mixture-of-Experts with only 6.5B active per token (128 experts, 4 active), priced at $0.15/$0.60 per 1M tokens, under genuine Apache 2.0. Despite the "Small" label it scores 78.0% on MMLU-Pro and 71.2% on GPQA Diamond — within reach of Medium 3.1 and Large 3 — while running on a single GPU. It unifies chat, reasoning, and coding in one efficient model. The buyer's sentence: the best price-to-capability open multimodal model in 2026, with Apache 2.0 as a structural moat. - Provider: Mistral AI (Paris, France) - Release: 2026-03-16, status GA - Context: 256,000 tokens; max output 16,384 - Modalities: text + image in, text out (native multimodal) - Knowledge cutoff: ~January 2026 - Headline price: $0.15 input / $0.60 output per 1M tokens - Architecture: MoE, 119B total / 6.5B active, 128 experts (4 active), MLA attention

What's new

  • First Mistral Small to use Mixture-of-Experts: 119B total but only 6.5B active per token, versus the dense 24B of the Small 3.x line. Despite 5x the total parameters, only 6.5B are active (vs 24B for Small 3), so a workflow handling 100 req/s on Small 3 handles ~300 on Small 4 on the same hardware.
  • Unified hybrid: chat, reasoning, and coding in one model with configurable reasoning depth.
  • Context expanded from 128K (Small 3.2) to 256K.
  • Output throughput ~180 tps — well above the price-tier median.
  • Native multimodal vision input via the MoE's image path.

Benchmarks

BenchmarkScoreSource
MMLU-Pro78%venturebeat.com 2026-03-16T00:00:00.000Z
GPQA Diamond71.2%huggingface.co 2026-03-16T00:00:00.000Z
Artificial Analysis Index28artificialanalysis.ai 2026-05-28T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker8.5/10
It re-anchors the 'cheap and good' tier: single-GPU production, Apache 2.0, and one endpoint for chat and reasoning. My new EU default unless workload justifies more.

Small 4 changes the cost curve. A 119B MoE with 6.5B active runs in production on a single A100 and absorbs meaningful traffic without flagship spend. Apache 2.0 means I fine-tune and self-host with zero license friction — a genuinely cleaner story than Medium 3.5's modified-MIT. The unified reasoning toggle collapses architecture: one endpoint covers chat and lightweight reasoning. For any EU SaaS shipping multilingual features, Small 4 is now my default; I only step up to Medium 3.5 or Large 3 when a workload specifically justifies it. The vision encoder is the one soft spot.

Strategic Fit 9Vendor Risk 9Roadmap Confidence 8
Pros
  • single-GPU, clean Apache 2.0, reasoning toggle, low price
Cons
  • modest vision
  • young tooling
Right for: EU multilingual SaaS defaults
Avoid if: you need top vision or peak reasoning
Domain Strategist8.5/10
Small 4 is Mistral's most disruptive product on a per-dollar basis — it makes 'good enough' frontier capability nearly free to self-host.

Strategically this is the model that pressures the entire cost-optimised tier. By delivering MMLU-Pro 78 and GPQA 71 at $0.15/$0.60 and single-GPU self-host under Apache 2.0, Mistral undercuts both closed small models (Haiku, GPT-5 mini) and forces open competitors (Qwen, Llama) to answer on price-per-intelligence. The AA Index 28 exceeding the flagship's 23 is a clean talking point. The wedge is the combination — cheap + open + reasoning + vision + EU residency — which no single competitor matches on all axes. Market timing aligns with the 2026 push toward agentic, cost-controlled deployments.

Competitive Positioning 9Differentiation 9Market Timing 8
Pros
  • unmatched price-per-intelligence combo
  • clean license
Cons
  • moat is on cost, not peak quality
Right for: cost-led, open-weight strategies
Avoid if: your buyers pay for frontier ceiling
Finance Lead9.5/10
At $0.15/$0.60 with reasoning, vision, and 256K context, the value-per-dollar crushes anything closed at this capability — and self-host makes it nearly free at scale.

This is the cost story of the lineup. $0.15/$0.60 with a ~50% batch discount delivers reasoning-capable multimodal at a price closed competitors can't match at the same capability. The deeper lever is Apache 2.0 self-host on a single GPU, which converts opex to capex for steady-state workloads with no license fee — unlike Medium 3.5. Annual forecasts become trivially predictable. For any high-volume chat or agent product chasing a unit-economics target, Small 4 is the obvious starting point and frequently the finish line.

Cost Efficiency 10Pricing Transparency 9Value per Dollar 10
Pros
  • lowest cost for the capability
  • free-and-clear self-host
Cons
  • none material on cost
Right for: unit-economics-driven products
Avoid if: you need ceiling capability regardless of price
Domain Practitioner9/10
Same API as the rest of the family, a genuinely useful reasoning dial, real 256K context, and Apache 2.0 so I ship private fine-tunes without a second thought.

Ergonomically Small 4 is identical to the rest of Mistral, so swapping with Medium 3.5 is a model-name change. The reasoning-effort dial is the standout: lean fast for simple tasks, crank it up for hard ones, no routing to a different model. 256K context is real. Apache 2.0 lets me ship private fine-tunes with confidence and the MLA attention keeps long-context serving affordable. Negatives: vision is "fine" not "great," and structured-output stability under heavy reasoning is still being shaken out. Best price-to-capability open-weight multimodal small model in the market.

API Ergonomics 9Tool/Agent Support 8Reliability 8
Pros
  • reasoning dial, real long context, clean fine-tune license
Cons
  • soft vision
  • reasoning-mode output stability
Right for: cost-sensitive builders
Avoid if: vision is central to your product
Power User7.5/10
180 tps makes it feel snappier than Medium, and for everyday tasks I rarely notice it isn't a flagship.

In Le Chat the small tier feels fast — ~180 tps is noticeably snappier than Medium. For summarising, drafting, simple coding, and multilingual help the quality is good enough that the gap to a flagship rarely shows. Vision works on screenshots and receipts well enough. With reasoning toggled on, responses take longer but become markedly more careful. Conversational warmth is below Claude or GPT-5 — efficient rather than friendly. Strong everyday utility at a price low enough that product teams can give it away in a free tier.

Output Quality 7.5Speed 8.5Everyday Usefulness 8
Pros
  • fast, capable enough, strong EU languages
Cons
  • not the warmest
  • soft vision
Right for: free-tier and everyday product features
Avoid if: you want flagship polish
Skeptic7.5/10
The price and the Apache license are real wins — but 'matches GPT-OSS 120B on AIME' is a curated comparison, and the vision encoder is genuinely weak.

Small 4 is the rare Mistral launch where the value claim mostly survives scrutiny: the price is real, the license is genuinely Apache 2.0, and AA Index 28 is independently measured. Where I push back is the framing. Mistral benchmarks against GPT-OSS 120B (a deliberately chosen open peer) rather than the strongest closed small models, and headlines AIME at high reasoning effort, which masks the latency and token-count cost of that mode. The vision encoder is small and the vision quality shows it. The honest claim is "best open price-to-capability multimodal," not "matches the frontier" — but for once the gap between marketing and reality is narrow.

Claim Accuracy 8Weakness Severity 7Hype vs Reality 8
Pros
  • claims largely hold
  • clean license
Cons
  • curated peer comparison
  • weak vision
Right for: buyers who value honest price-per-intelligence
Avoid if: you need strong vision or take "matches frontier" literally

Strengths

  • 5-7x cheaper than peer multimodal reasoning models, with frontier-adjacent MMLU-Pro/GPQA.
  • Genuine Apache 2.0 open weights — self-hostable on a single GPU, no revenue carve-out.
  • 256K context at the small tier is rare.
  • Configurable reasoning effort — one SKU covers chat through hard math.
  • Native vision at this price is unusual.
  • ~180 tps throughput; AA Index 28 beats the flagship Large 3's 23.

Limitations

  • Mistral published an unconventional benchmark mix (MMLU-Pro, GPQA, AIME) and skipped plain MMLU and a standardized LiveCodeBench number, complicating head-to-head.
  • High-reasoning mode dramatically increases latency and output token count.
  • Small vision encoder — vision is acceptable, not Pixtral-class.
  • English conversational polish trails Claude Haiku and GPT-5 mini.
  • New enough that community fine-tunes and production tooling are still catching up.

Best use cases

- High-volume agent and tool-use loops where per-call cost dominates. - Edge / self-hosted deployment on single-GPU servers or strong workstations. - Multilingual chat in European languages at consumer-grade pricing. - Long-context document Q&A on a budget. - Workloads needing both chat and lightweight reasoning without paying for two SKUs.

Buyer questions

Is the license really clean?

Yes — genuine Apache 2.0 on the Hugging Face card, no revenue threshold (unlike Medium 3.5). Fine-tune, self-host, and redistribute freely.

Can it really run on one GPU?

Yes — sparse 6.5B-active means ~80GB at FP8 (one A100/H100), less at NVFP4, despite 119B total parameters.

How does it compare to Medium 3.1?

Close on MMLU-Pro and reasoning at a fraction of the cost; Medium 3.1 has a more polished overall profile but no reasoning toggle and no self-host.

Does the reasoning mode cost more?

Not per token, but high reasoning effort produces far more output tokens and higher latency — budget for that on hard queries.

Is the vision good?

Acceptable for screenshots/receipts/simple charts; not Pixtral-class for complex documents.

Where does my data live?

EU by default on La Plateforme; or fully on your hardware via self-host.

What clouds host it?

Bedrock, Azure AI Foundry, and Vertex AI, plus self-host and La Plateforme.

Comparable models

**Claude Haiku 4.5:** Closed weights, similar everyday capability, ~2-3x the price, no self-host.
**GPT-5 mini:** Closed weights, broader ecosystem, several times the price; no open-weight option.
**Qwen 3 30B-A3B:** Comparable MoE small tier and price; weaker EU-language quality.
**Llama 4 Scout:** Open weights but dense; weaker multilingual and a less clean small-MoE efficiency story.

Model specs

Input price
$0.15 / Mtok
Output price
$0.60 / Mtok
Cached input
Batch (in/out)
$0.07 / $0.30
Context window
256K tokens
Max output
16K tokens
Knowledge cutoff
2026-01
Released
2026-03-15
Modalities
text, image → text
Output speed
~180 tok/s
License
Open weights (Apache-2.0)
Clouds
Bedrock, Azure AI Foundry, Vertex AI

Does not train on API inputs by default

Last verified 2026-05-27