GPT-5.4 mini

GA · Latest mini

by OpenAI · GPT-5 family · best price-to-capability in the OpenAI lineup

Coding · Agentic · Cost-Optimized · Multimodal
AI Panel Score: 9.0 · Value: 9.8/10

GPT-5.4 mini is OpenAI's high-volume workhorse, released 2026-03-17 alongside GPT-5.4 nano: the "most capable small model" OpenAI had shipped, built explicitly for the subagent era. It nearly matches full GPT-5.4 on coding (SWE-bench Pro 54.4% vs 57.7% for base) at roughly one-third the cost, with a full tool surface, including computer use, that is rare at this price. The one-sentence buyer's take: for cost- and latency-sensitive workloads that still need strong reasoning and tool use, this is the default mini-tier choice in the OpenAI lineup as of 2026-05-28.

- Provider: OpenAI
- Release: 2026-03-17
- Status: GA
- Context: 400,000 tokens
- Max output: 128,000 tokens
- Modalities: text + image in, text out
- Knowledge cutoff: 2025-08
- Headline price: $0.75 in / $4.50 out per 1M tokens

What's new

  • GPT-5.4 mini is the strongest mini-tier OpenAI model to date for coding, computer use, and subagents. It narrows the SWE-bench Pro gap to GPT-5.4 base to ~3 points at roughly one-third the cost, runs more than 2x faster than GPT-5 mini, and adds full reasoning-token support with the same effort tiers as the base model. The tool surface includes apply_patch, hosted shell, computer use, and MCP via the Responses API. Cached input is $0.075/1M (90% discount). Versus its predecessor GPT-5 mini it is a clear generational jump across coding, reasoning, multimodal understanding, and tool use.

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker · 9/10
The model that quietly carries production — mini-first with selective escalation is the right pattern for most architectures.

GPT-5.4 mini is the highest-leverage SKU for most architectures: mini as the default, GPT-5.4 base or GPT-5.5 as escalation. The capability gap to base is real but narrow on most tasks; the price gap is dramatic. Computer use at this tier enables RPA replacement at a price where the ROI math works. Lock-in concerns match the rest of the family. Strategic recommendation: run mini-first, escalate selectively, instrument the routing carefully. Roadmap confidence is high, with the caveat that a future GPT-5.5 mini will reset the tier.

Strategic Fit 9 · Vendor Risk 7 · Roadmap Confidence 9
Pros
  • carries production
  • computer use at tier
  • cheap
Cons
  • lock-in
  • behind base on hardest tasks
Right for: mini-first routed architectures
Avoid if: every task is frontier-hard
Domain Strategist · 9/10
OpenAI's answer to Haiku and DeepSeek pressure — a small model with real agentic teeth at a price that wins the volume market.

GPT-5.4 mini is OpenAI's response to pressure from Claude Haiku 4.5 and DeepSeek V3.2 in the small-model segment, and it lands as a strong volume-market play. The differentiation is the combination of computer use, full tooling, and near-base coding at one-third the cost; few rivals offer agentic capability this complete at this price. Market timing is good: shipping mini and nano twelve days after the base model captured the cost-sensitive segment before competitors could respond. The competitive moat is the tooling-plus-price combination plus promotion-path compatibility with the full-size models.

Competitive Positioning 9 · Differentiation 9 · Market Timing 9
Pros
  • agentic teeth at low cost
  • tooling parity
  • promotion path
Cons
  • 400K context caps some workloads
Right for: high-volume agentic products
Avoid if: you need 1M-token windows
Finance Lead · 10/10
The SKU that makes the lineup work financially — $0.075 cached input takes prefix-heavy traffic near free; track escalation rate.

This is the SKU that makes the OpenAI lineup work financially. $0.75/$4.50 is approachable, the $0.075 cached rate (-90%) takes prefix-heavy workloads near free, and Batch halves both sides to $0.375/$2.25. For a well-architected RAG or subagent app, effective cost can be 10–20x cheaper than GPT-5.5 base on equivalent traffic. The only budget trap is escalation discipline — if too much traffic falls through to GPT-5.4 base, the savings evaporate. Track the escalation rate as a first-class metric. Value-per-dollar is the best in the family.

Cost Efficiency 10 · Pricing Transparency 9 · Value per Dollar 10
Pros
  • near-free cached traffic
  • deep Batch discount
  • approachable list
Cons
  • escalation misroutes erase savings
Right for: cost-aware RAG/subagent apps
Avoid if: you can't instrument routing
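
For teams that do instrument routing, a minimal sketch of the escalation math might look like the following. It assumes every request tries mini first, that an escalated request also pays for a base call, and that a base call costs about 3x a mini call (the review's rough ratio, not a published per-request figure). All function names are illustrative.

```python
# Assumption: base costs roughly 3x mini per request, per the review's estimate.
BASE_COST_MULTIPLE = 3.0


def escalation_rate(total_requests: int, escalated_requests: int) -> float:
    """Share of requests that fell through from mini to the base model."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= escalated_requests <= total_requests:
        raise ValueError("escalated_requests must be between 0 and total_requests")
    return escalated_requests / total_requests


def blended_cost_multiple(rate: float, base_multiple: float = BASE_COST_MULTIPLE) -> float:
    """Average cost per request, expressed as a multiple of one mini call.

    Every request pays for one mini attempt (1.0); the escalated share
    additionally pays for a base call (base_multiple).
    """
    return 1.0 + rate * base_multiple


def break_even_rate(base_multiple: float = BASE_COST_MULTIPLE) -> float:
    """Escalation rate at which mini-first costs the same as sending everything to base."""
    return (base_multiple - 1.0) / base_multiple


if __name__ == "__main__":
    rate = escalation_rate(total_requests=1000, escalated_requests=50)
    print(f"escalation rate: {rate:.1%}")
    print(f"blended cost vs all-mini: {blended_cost_multiple(rate):.2f}x")
    print(f"break-even escalation rate: {break_even_rate():.1%}")
```

Under these assumptions, a 5% escalation rate costs about 1.15x an all-mini baseline, and the savings versus running everything on base disappear at roughly a two-thirds escalation rate.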
Domain Practitioner · 9/10
Same Responses API, same tool semantics, same reasoning dial as base — promote prompts up and down the tier ladder freely.

Developer happiness is high. Same Responses API surface as base, same tool semantics, same reasoning-effort dial — SDK behavior is identical, so prompts promote up and down the tier ladder with a model-name change. apply_patch and hosted shell are supported, so real coding agents run on mini for the bulk of work. The main gotcha is the 400K context — base gives 1.05M, and crossing that line in routing logic is the most common source of bugs. Latency is good at default reasoning.

API Ergonomics 9 · Tool/Agent Support 9 · Reliability 9
Pros
  • full tool surface
  • promotion-path parity
  • fast
Cons
  • 400K context cliff
  • no fine-tuning
Right for: subagent and coding-agent builders
Avoid if: you routinely need >400K context
Power User · 8/10
Most users never know they're on mini — it's high quality for everyday tasks, with the gap to 5.5 only on the hardest work.

Most users will not know they are talking to mini — it powers many ChatGPT free-tier interactions and a large share of vertical app backends. Conversation quality is high, refusal rate is in line with the family, and latency is good. The visible gap to GPT-5.5 surfaces only on the hardest tasks: deep research, complex coding, multi-step planning. For everything else the experience is largely indistinguishable. Image discussion and tool-augmented answers work well.

Output Quality 8 · Speed 9 · Everyday Usefulness 8
Pros
  • high everyday quality
  • fast
  • good tool answers
Cons
  • gap shows on hardest tasks
Right for: everyday assistant backends
Avoid if: users hammer frontier-hard queries
Skeptic · 8.5/10
'Nearly matches full GPT-5.4' is mostly true on coding — but the 400K context and thin public benchmarks hide where mini actually breaks.

The adversarial read: the "nearly matches base" pitch holds on SWE-bench Pro (54.4% vs 57.7%) but rests on a thin public benchmark set. GPQA is reported loosely as "high 80s," and most standard evals (MMLU, AIME, LiveCodeBench, Tau-bench) have no published mini figure at all. The 400K context is under 40% of base's 1.05M window, so long-document and large-repo workloads quietly degrade or force escalation, which erodes the cost story. Computer use at this tier is real, but its reliability relative to the base model is unproven. The value case is genuine; the marketing just papers over the context ceiling and the sparse eval coverage.

Claim Accuracy 8 · Weakness Severity 6 · Hype vs Reality 8
Pros
  • real value
  • honest coding parity
Cons
  • thin benchmarks
  • 400K context ceiling
Right for: skeptics who validate on their own tasks
Avoid if: you need verified frontier-grade reasoning

Strengths

  • Best price-to-capability ratio in the OpenAI lineup at $0.75 / $4.50.
  • 90% cached-input discount makes prefix-heavy workloads extremely cheap.
  • Full tool surface including computer use — rare at this tier.
  • Strong fit for subagent architectures where many small calls add up.
  • 400K context is sufficient for the vast majority of production use cases.
  • More than 2x faster than GPT-5 mini.

Limitations

  • Real gap versus GPT-5.4 base on the hardest reasoning and coding tasks.
  • 400K context vs 1.05M on base — long-document workloads may need to chunk or escalate.
  • No fine-tuning support.
  • Text-only output; image/audio/video routed to other models.
  • No GPT-5.5 mini exists yet, so escalation jumps to GPT-5.4 base or GPT-5.5.

Best use cases

- Subagent fleets in agentic architectures: runs the leaves, escalates to GPT-5.4/GPT-5.5 at the root.
- High-volume RAG pipelines where cached input dominates spend.
- Bulk classification, extraction, and summarization at production scale.
- Latency-sensitive chat backends where GPT-5.5 cost is unjustified.
- Computer-use automation where per-task cost matters more than per-task ceiling.

Buyer questions

How close is mini to full GPT-5.4?

On coding, very close — SWE-bench Pro 54.4% vs 57.7% — at roughly one-third the cost. On the hardest reasoning tasks the gap is wider.

Does it support computer use?

Yes, which is unusual at this price tier and a key differentiator versus nano.

What's the context limit?

400K tokens, versus 1.05M on the full-size GPT-5.4. Plan routing around it.
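
One way to plan routing around the limit is a pre-flight token check. The sketch below is an illustration under stated assumptions: the model identifier strings are hypothetical, token counts are supplied by the caller's own tokenizer, and the prompt plus reserved output tokens are assumed to share one context window.

```python
# Context sizes as quoted in this review.
MINI_CONTEXT_TOKENS = 400_000
BASE_CONTEXT_TOKENS = 1_050_000

# Hypothetical identifiers; check the provider's model list for the real names.
MINI_MODEL = "gpt-5.4-mini"
BASE_MODEL = "gpt-5.4"


def pick_model(prompt_tokens: int, reserved_output_tokens: int,
               safety_margin: int = 2_000) -> str:
    """Choose the cheapest tier whose context window fits the request.

    safety_margin leaves headroom for tokenizer drift and tool-call overhead.
    """
    if prompt_tokens < 0 or reserved_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    needed = prompt_tokens + reserved_output_tokens + safety_margin
    if needed <= MINI_CONTEXT_TOKENS:
        return MINI_MODEL
    if needed <= BASE_CONTEXT_TOKENS:
        return BASE_MODEL
    raise ValueError("request exceeds both context windows; chunk the input first")


if __name__ == "__main__":
    print(pick_model(prompt_tokens=100_000, reserved_output_tokens=8_000))
    print(pick_model(prompt_tokens=395_000, reserved_output_tokens=8_000))
```

Checking the sum of prompt, reserved output, and margin, not the prompt alone, is a deliberate choice: a request that fits on input can still overflow once the reply is generated.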

How cheap can it get?

Cached input is $0.075 (-90%) and Batch is $0.375/$2.25 (-50%). Prefix-heavy traffic lands near free.
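
To make those discounts concrete, here is a small cost estimator built only from the list prices quoted in this review. It is a sketch, not a billing tool: it assumes the cached and Batch discounts are not combined, and the `cached_fraction` parameter is an illustrative stand-in for whatever cache hit share your traffic actually sees.

```python
# List prices quoted in this review, in USD per 1M tokens.
INPUT_PER_MTOK = 0.75
CACHED_INPUT_PER_MTOK = 0.075
OUTPUT_PER_MTOK = 4.50
BATCH_DISCOUNT = 0.5  # Batch halves both the input and output price.


def request_cost(input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Estimate the USD cost of one request.

    cached_fraction is the share of input tokens billed at the cached rate.
    Assumption: when batch is True, the cached rate is ignored.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if batch:
        input_cost = input_tokens * INPUT_PER_MTOK * BATCH_DISCOUNT
        output_cost = output_tokens * OUTPUT_PER_MTOK * BATCH_DISCOUNT
    else:
        cached_tokens = input_tokens * cached_fraction
        fresh_tokens = input_tokens - cached_tokens
        input_cost = (fresh_tokens * INPUT_PER_MTOK
                      + cached_tokens * CACHED_INPUT_PER_MTOK)
        output_cost = output_tokens * OUTPUT_PER_MTOK
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    uncached = request_cost(20_000, 500)
    cached = request_cost(20_000, 500, cached_fraction=0.9)
    print(f"uncached: ${uncached:.5f}  90% cached: ${cached:.5f}")
```

With a 20,000-token prompt that is 90% cached and a 500-token reply, the estimate is roughly $0.005 per request versus about $0.017 uncached. Output tokens still bill at the full rate, so "near free" applies to the input side of the bill.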

What do I escalate to?

GPT-5.4 base or GPT-5.5 — there is no GPT-5.5 mini yet.

Is my data used for training?

No. API inputs are not used for training by default, and enterprise opt-out and zero-retention options exist.

Comparable models

**OpenAI GPT-5.4 base** — same family, ~3x cost, ~3-point SWE-bench Pro gap, 1.05M context, the escalation target above mini.
**OpenAI GPT-5.4 nano** — same family, ~4x cheaper, narrower tool surface (no computer use/tool search), lower ceiling.
**Anthropic Claude Haiku 4.5** — direct peer at this tier, often preferred for writing tone; comparable pricing.
**Google Gemini 3 Flash Lite** — comparable pricing, weaker agentic coding.

Model specs

- Input price: $0.75 / Mtok
- Output price: $4.50 / Mtok
- Cached input: $0.075 / Mtok
- Batch (in/out): $0.375 / $2.25 per Mtok
- Context window: 400K tokens
- Max output: 128K tokens
- Knowledge cutoff: 2025-08
- Released: 2026-03-17
- Modalities: text, image → text
- Output speed: ~180 tok/s
- License: Proprietary
- Clouds: Azure OpenAI, Azure AI Foundry

Does not train on API inputs by default

Last verified 2026-05-27