by OpenAI · GPT-5 family · best for price-to-capability in the OpenAI lineup
GPT-5.4 mini is OpenAI's high-volume workhorse, released 2026-03-17 alongside GPT-5.4 nano and billed as the "most capable small model" OpenAI had shipped, built explicitly for the subagent era. It nearly matches full GPT-5.4 on coding (SWE-bench Pro 54.4% vs 57.7% for base) at roughly one-third the cost, with a full tool surface, including computer use, that is rare at this price. The one-sentence buyer's take: for cost- and latency-sensitive workloads that still need strong reasoning and tool use, this is the default mini-tier choice in the OpenAI lineup as of 2026-05-28.

- Provider: OpenAI
- Release: 2026-03-17
- Status: GA
- Context: 400,000 tokens
- Max output: 128,000 tokens
- Modalities: text + image in, text out
- Knowledge cutoff: 2025-08
- Headline price: $0.75 in / $4.50 out per 1M tokens
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The model that quietly carries production — mini-first with selective escalation is the right pattern for most architectures.”
GPT-5.4 mini is the highest-leverage SKU for most architectures: mini as the default, GPT-5.4 base or GPT-5.5 as escalation. The capability gap to base is real but narrow on most tasks; the price gap is dramatic. Computer use at this tier enables RPA replacement at a price where the ROI math works. Lock-in concerns match the rest of the family. Strategic recommendation: run mini-first, escalate selectively, instrument the routing carefully. Roadmap confidence is high, with the caveat that a future GPT-5.5 mini will reset the tier.
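The advice to instrument the routing can be made concrete with a small counter. The sketch below is illustrative only: `RoutingStats` and `blended_cost_multiple` are hypothetical names, and the 3x base multiple is an assumption derived from the "roughly one-third the cost" figure rather than a published price. It also assumes an escalated request is billed only at the base rate, ignoring any discarded mini attempt.

```python
from dataclasses import dataclass


@dataclass
class RoutingStats:
    """Counts how many requests stayed on mini versus escalated."""
    mini_calls: int = 0
    escalated_calls: int = 0

    def record(self, escalated: bool) -> None:
        # Call once per request, after the routing decision is final.
        if escalated:
            self.escalated_calls += 1
        else:
            self.mini_calls += 1

    @property
    def escalation_rate(self) -> float:
        total = self.mini_calls + self.escalated_calls
        return self.escalated_calls / total if total else 0.0


def blended_cost_multiple(escalation_rate: float, base_multiple: float = 3.0) -> float:
    """Average cost per request relative to an all-mini baseline.

    base_multiple is an assumed ratio (base cost / mini cost), not a quoted price.
    """
    return (1.0 - escalation_rate) + escalation_rate * base_multiple


stats = RoutingStats()
for escalated in (False, False, False, True):
    stats.record(escalated)

print(stats.escalation_rate)                         # 0.25
print(blended_cost_multiple(stats.escalation_rate))  # 1.5
```

Under these assumptions, a 25% escalation rate already makes the average request 1.5 times as expensive as an all-mini fleet, which is why the rate is worth tracking alongside latency and error metrics.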
“OpenAI's answer to Haiku and DeepSeek pressure — a small model with real agentic teeth at a price that wins the volume market.”
GPT-5.4 mini is OpenAI's response to pressure from Claude Haiku 4.5 and DeepSeek V3.2 in the small-model segment, and it lands as a strong volume-market play. Its differentiation is the combination of computer use, full tooling, and near-base coding at one-third the cost; few rivals offer agentic capability this complete at this price. Market timing is good: shipping mini and nano twelve days after base captured the cost-sensitive segment before competitors could respond. The competitive moat is the tooling-plus-price combination, together with promotion-path compatibility with the full-size models.
“The SKU that makes the lineup work financially — $0.075 cached input takes prefix-heavy traffic near free; track escalation rate.”
This is the SKU that makes the OpenAI lineup work financially. $0.75/$4.50 is approachable, the $0.075 cached rate (-90%) takes prefix-heavy workloads near free, and Batch halves both sides to $0.375/$2.25. For a well-architected RAG or subagent app, effective cost can be 10–20x cheaper than GPT-5.5 base on equivalent traffic. The only budget trap is escalation discipline — if too much traffic falls through to GPT-5.4 base, the savings evaporate. Track the escalation rate as a first-class metric. Value-per-dollar is the best in the family.
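The pricing arithmetic above can be checked with a few lines of code. This is a minimal sketch using only the rates quoted in this review; `request_cost` is a hypothetical helper, and the choice to model Batch pricing without stacking the cached-input discount is an assumption, since the review does not say how the two combine.

```python
# Per-1M-token rates quoted in the review (USD).
PRICE_IN = 0.75          # uncached input
PRICE_CACHED_IN = 0.075  # cached input (-90%)
PRICE_OUT = 4.50         # output
BATCH_MULTIPLIER = 0.5   # Batch halves both sides


def request_cost(input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Estimated cost in USD for one request."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if batch:
        # Assumption: Batch is modeled at half the list rates, with no cache discount.
        total = (input_tokens * PRICE_IN + output_tokens * PRICE_OUT) * BATCH_MULTIPLIER
    else:
        cached = input_tokens * cached_fraction
        fresh = input_tokens - cached
        total = fresh * PRICE_IN + cached * PRICE_CACHED_IN + output_tokens * PRICE_OUT
    return total / 1_000_000


# A RAG-style call: 20,000 input tokens, 500 output tokens.
uncached = request_cost(20_000, 500)
mostly_cached = request_cost(20_000, 500, cached_fraction=0.9)
print(f"uncached: ${uncached:.5f}, 90% cached: ${mostly_cached:.5f}")
```

For that example call, the estimate drops from about $0.0173 to about $0.0051 once 90% of the input is served from cache, with output tokens becoming the largest remaining cost.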
“Same Responses API, same tool semantics, same reasoning dial as base — promote prompts up and down the tier ladder freely.”
Developer happiness is high. Mini exposes the same Responses API surface as base, the same tool semantics, and the same reasoning-effort dial. SDK behavior is identical, so prompts promote up and down the tier ladder with a model-name change. apply_patch and hosted shell are supported, so real coding agents can run on mini for the bulk of their work. The main gotcha is the 400K context window: base gives 1.05M, so any request that exceeds 400K has to be routed to base, and mishandling that boundary in routing logic is the most common source of bugs. Latency is good at default reasoning.
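A context-aware router can make the 400K boundary explicit instead of leaving it to fail at request time. The following is a sketch under stated assumptions: `pick_model` is a hypothetical helper, the model identifier strings are placeholders rather than confirmed API names, and it assumes the requested output budget counts against the same window as the prompt.

```python
MINI_CONTEXT = 400_000    # mini context window quoted in the review
BASE_CONTEXT = 1_050_000  # full-size GPT-5.4 window quoted in the review


def pick_model(prompt_tokens: int, max_output_tokens: int,
               mini: str = "gpt-5.4-mini", base: str = "gpt-5.4") -> str:
    """Return the cheapest tier whose window fits the request.

    The default model names are placeholders; pass the real identifiers in use.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: prompt and reserved output share one context window.
    needed = prompt_tokens + max_output_tokens
    if needed <= MINI_CONTEXT:
        return mini
    if needed <= BASE_CONTEXT:
        return base
    raise ValueError(f"request needs {needed} tokens, more than any tier provides")


print(pick_model(300_000, 50_000))  # gpt-5.4-mini
print(pick_model(390_000, 20_000))  # gpt-5.4
```

Counting the reserved output alongside the prompt is the conservative choice here: a 390K-token prompt looks like it fits mini until the output budget is added.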
“Most users never know they're on mini — it's high quality for everyday tasks, with the gap to 5.5 only on the hardest work.”
Most users will not know they are talking to mini — it powers many ChatGPT free-tier interactions and a large share of vertical app backends. Conversation quality is high, refusal rate is in line with the family, and latency is good. The visible gap to GPT-5.5 surfaces only on the hardest tasks: deep research, complex coding, multi-step planning. For everything else the experience is largely indistinguishable. Image discussion and tool-augmented answers work well.
“'Nearly matches full GPT-5.4' is mostly true on coding — but the 400K context and thin public benchmarks hide where mini actually breaks.”
The adversarial read: the "nearly matches base" pitch holds on SWE-bench Pro (54.4% vs 57.7%) but rests on a thin public benchmark set. GPQA is reported loosely as "high 80s," and most standard evals (MMLU, AIME, LiveCodeBench, Tau-bench) have no published mini figure at all. The 400K context is well under half of base's 1.05M window (about 38%), so long-document and large-repo workloads quietly degrade or force escalation, which erodes the cost story. Computer use at this tier is real but not yet proven to match base-model reliability. The value case is genuine; the marketing just papers over the context ceiling and the sparse eval coverage.
- Subagent fleets in agentic architectures: mini runs the leaves and escalates to GPT-5.4/GPT-5.5 at the root.
- High-volume RAG pipelines where cached input dominates spend.
- Bulk classification, extraction, and summarization at production scale.
- Latency-sensitive chat backends where GPT-5.5 cost is unjustified.
- Computer-use automation where per-task cost matters more than per-task ceiling.
On coding, mini is very close to full GPT-5.4 (SWE-bench Pro 54.4% vs 57.7%) at roughly one-third the cost. On the hardest reasoning tasks the gap is wider.
Yes, mini supports computer use, which is unusual at this price tier and a key differentiator versus nano.
The context window is 400K tokens, versus 1.05M on the full-size GPT-5.4. Plan routing around it.
Cached input is $0.075 (-90%) and Batch is $0.375/$2.25 (-50%). Prefix-heavy traffic lands near free.
Escalate to GPT-5.4 base or GPT-5.5; there is no GPT-5.5 mini yet.
No, OpenAI does not train on API inputs by default; enterprise opt-out and zero-retention exist.
Does not train on API inputs by default
Last verified 2026-05-27