GPT-5.4 mini

GALatest mini

by OpenAI · GPT-5 family · best for best price-to-capability in the OpenAI lineup

CodingagenticCost-OptimizedMultimodal

9.0

AI Panel Score

Value 9.8/10

GPT-5.4 mini is OpenAI's high-volume workhorse, released 2026-03-17 alongside GPT-5.4 nano — the "most capable small model" OpenAI had shipped, built explicitly for the subagent era. It nearly matches full GPT-5.4 on coding (SWE-bench Pro 54.4% vs base 57.7%) at roughly one-third the cost, with a full tool surface including computer use that is rare at this price. The one-sentence buyer's take: for cost- and latency-sensitive workloads that still need strong reasoning and tool use, this is the default mini-tier choice in the OpenAI lineup as of 2026-05-28.

Compare this model All GPT-5 versions

What's new

GPT-5.4 mini is the strongest mini-tier OpenAI model to date for coding, computer use, and subagents. It narrows the SWE-bench Pro gap to GPT-5.4 base to ~3 points at roughly one-third the cost, runs more than 2x faster than GPT-5 mini, and adds full reasoning-token support with the same effort tiers as the base model. The tool surface includes apply_patch, hosted shell, computer use, and MCP via the Responses API. Cached input is $0.075/1M (90% discount). Versus its predecessor GPT-5 mini it is a clear generational jump across coding, reasoning, multimodal understanding, and tool use.

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker9/10

“The model that quietly carries production — mini-first with selective escalation is the right pattern for most architectures.”

GPT-5.4 mini is the highest-leverage SKU for most architectures: mini as the default, GPT-5.4 base or GPT-5.5 as escalation. The capability gap to base is real but narrow on most tasks; the price gap is dramatic. Computer use at this tier enables RPA replacement at a price where the ROI math works. Lock-in concerns match the rest of the family. Strategic recommendation: run mini-first, escalate selectively, instrument the routing carefully. Roadmap confidence is high, with the caveat that a future GPT-5.5 mini will reset the tier.

Strategic Fit 9Vendor Risk 7Roadmap Confidence 9

Pros

carries production
computer use at tier
cheap

Cons

lock-in
behind base on hardest tasks

Right for: mini-first routed architectures

Avoid if: every task is frontier-hard

Domain Strategist9/10

“OpenAI's answer to Haiku and DeepSeek pressure — a small model with real agentic teeth at a price that wins the volume market.”

GPT-5.4 mini was OpenAI catching up to Claude Haiku 4.5 and DeepSeek V3.2 pressure in the small-model segment, and it lands as a strong volume-market play. Differentiation is the combination of computer use, full tooling, and near-base coding at one-third the cost — few rivals offer agentic capability this complete at this price. Market timing is good: shipping mini/nano twelve days after the base captured the cost-sensitive segment before competitors could respond. The competitive moat is the tooling-plus-price combination plus promotion-path compatibility with the full-size models.

Competitive Positioning 9Differentiation 9Market Timing 9

Pros

agentic teeth at low cost
tooling parity
promotion path

Cons

400K context caps some workloads

Right for: high-volume agentic products

Avoid if: you need 1M-token windows

Finance Lead10/10

“The SKU that makes the lineup work financially — $0.075 cached input takes prefix-heavy traffic near free; track escalation rate.”

This is the SKU that makes the OpenAI lineup work financially. $0.75/$4.50 is approachable, the $0.075 cached rate (-90%) takes prefix-heavy workloads near free, and Batch halves both sides to $0.375/$2.25. For a well-architected RAG or subagent app, effective cost can be 10–20x cheaper than GPT-5.5 base on equivalent traffic. The only budget trap is escalation discipline — if too much traffic falls through to GPT-5.4 base, the savings evaporate. Track the escalation rate as a first-class metric. Value-per-dollar is the best in the family.

Cost Efficiency 10Pricing Transparency 9Value per Dollar 10

Pros

near-free cached traffic
deep Batch discount
approachable list

Cons

escalation misroutes erase savings

Right for: cost-aware RAG/subagent apps

Avoid if: you can't instrument routing

Domain Practitioner9/10

“Same Responses API, same tool semantics, same reasoning dial as base — promote prompts up and down the tier ladder freely.”

Developer happiness is high. Same Responses API surface as base, same tool semantics, same reasoning-effort dial — SDK behavior is identical, so prompts promote up and down the tier ladder with a model-name change. apply_patch and hosted shell are supported, so real coding agents run on mini for the bulk of work. The main gotcha is the 400K context — base gives 1.05M, and crossing that line in routing logic is the most common source of bugs. Latency is good at default reasoning.

API Ergonomics 9Tool/Agent Support 9Reliability 9

Pros

full tool surface
promotion-path parity
fast

Cons

400K context cliff
no fine-tuning

Right for: subagent and coding-agent builders

Avoid if: you routinely need >400K context

Power User8/10

“Most users never know they're on mini — it's high quality for everyday tasks, with the gap to 5.5 only on the hardest work.”

Most users will not know they are talking to mini — it powers many ChatGPT free-tier interactions and a large share of vertical app backends. Conversation quality is high, refusal rate is in line with the family, and latency is good. The visible gap to GPT-5.5 surfaces only on the hardest tasks: deep research, complex coding, multi-step planning. For everything else the experience is largely indistinguishable. Image discussion and tool-augmented answers work well.

Output Quality 8Speed 9Everyday Usefulness 8

Pros

high everyday quality
fast
good tool answers

Cons

gap shows on hardest tasks

Right for: everyday assistant backends

Avoid if: users hammer frontier-hard queries

Skeptic8.5/10

“'Nearly matches full GPT-5.4' is mostly true on coding — but the 400K context and thin public benchmarks hide where mini actually breaks.”

The adversarial read: the "nearly matches base" pitch holds on SWE-bench Pro (54.4% vs 57.7%) but rests on a thin public benchmark set — GPQA is reported loosely as "high 80s," and most standard evals (MMLU, AIME, LiveCodeBench, Tau-bench) have no published mini figure at all. The 400K context is half of base's window, so long-document and large-repo workloads quietly degrade or force escalation, which erodes the cost story. Computer use at this tier is real but unproven at base-model reliability. The value case is genuine; the marketing just papers over the context ceiling and the sparse eval coverage.

Claim Accuracy 8Weakness Severity 6Hype vs Reality 8

Pros

real value
honest coding parity

Cons

thin benchmarks
400K context ceiling

Right for: skeptics who validate on their own tasks

Avoid if: you need verified frontier-grade reasoning

Strengths

Best price-to-capability ratio in the OpenAI lineup at $0.75 / $4.50.
90% cached-input discount makes prefix-heavy workloads extremely cheap.
Full tool surface including computer use — rare at this tier.
Strong fit for subagent architectures where many small calls add up.
400K context is sufficient for the vast majority of production use cases.
More than 2x faster than GPT-5 mini.

Limitations

Real gap versus GPT-5.4 base on the hardest reasoning and coding tasks.
400K context vs 1.05M on base — long-document workloads may need to chunk or escalate.
No fine-tuning support.
Text-only output; image/audio/video routed to other models.
No GPT-5.5 mini exists yet, so escalation jumps to GPT-5.4 base or GPT-5.5.

Best use cases

Subagent fleets in agentic architectures — runs the leaves, escalates to GPT-5.4/GPT-5.5 at the root.
High-volume RAG pipelines where cached input dominates spend.
Bulk classification, extraction, and summarization at production scale.
Latency-sensitive chat backends where GPT-5.5 cost is unjustified.
Computer-use automation where per-task cost matters more than per-task ceiling.

Deep dive

The full research notes behind this review — verified against primary sources.

Architecture Capabilities Benchmark analysis Speed & latency Pricing analysis Deployment & access Safety & privacy Ecosystem & tooling

Architecture

OpenAI discloses no parameter counts or structure for mini — all null. It is a unified reasoning model with text + image input and text-only output, o200k_base tokenizer, and a 400K context window (smaller than the 1.05M of the full-size GPT-5.4). The positioning disclosure is the only architectural signal: mini is tuned to "nearly match full GPT-5.4" on coding at a fraction of the cost.

Capabilities

GPT-5.4 mini punches above its tier, justifying its 8.3 coding and 8.4 agentic scores — SWE-bench Pro 54.4% (vs base 57.7%) and GPQA Diamond in the high-80s, with computer use supported, which is unusual at this price. Reasoning (8.2) and math (8.1) are strong for the tier. Long-context (7.6) reflects the 400K window — ample for most repos and documents but a fraction of base's 1.05M. Function calling (8.8) and instruction-following (8.6) are reliable enough to power subagent fleets in production. Multilingual (8.2), vision (7.8), and document/OCR (7.6) handle image input but not generation. Real-time data (6.5) is web-search-dependent. The headline pitch — "nearly matches full GPT-5.4 on coding at one-third the cost" — genuinely lands.

Benchmark analysis

Benchmark	Score	vs Predecessor (GPT-5 mini)	vs Base (GPT-5.4)	Source
SWE-bench Pro	54.4%	up from 45.7%	-3.3pp vs base (57.7%)	OpenRouter
GPQA Diamond	high 80s	up from 81.6%	near base	thenewstack
Computer use	strong	up	leader at mini tier	9to5mac

Per-benchmark public coverage at mini tier is limited; SWE-bench Pro is not the same key as SWE-bench Verified, so the standard front-matter benchmark keys remain null. Aggregate positioning is "near GPT-5.4 base at one-third the cost." research_confidence is high on pricing/specs, medium on benchmark granularity.

Speed & latency

GPT-5.4 mini runs more than 2x faster than GPT-5 mini; estimated steady-state output is roughly 150–200 tokens/sec at low/default reasoning effort with sub-second time-to-first-token, making it genuinely interactive. latency_tier is fast. At higher reasoning effort the TTFT rises with the reasoning pass, but mini is designed to be used at low/default effort where it is quick. For bulk work, Batch removes latency from the equation.

Pricing analysis

Surface	Cost	Notes
API input	$0.75 / 1M tok
API output	$4.50 / 1M tok
Cached input	$0.075 / 1M tok	90% discount
Batch (in/out)	$0.375 / $2.25	50% off, 24h SLA
Flex (in/out)	$0.375 / $2.25	variable latency
Priority (in/out)	$1.50 / $9.00	2x list, low queue
Direct UI	n/a	not separately user-selectable
Free tier	none (API)
Rate limits	~30,000 RPM / ~150M TPM	tier-dependent

Reasoning-token note: hidden reasoning tokens bill as output. Keep mini at low/default effort to preserve its cost advantage.

Deployment & access

API-only via the Responses API; not open-weights, license Proprietary, not self-hostable. Cloud-managed via Azure OpenAI and Azure AI Foundry; OpenRouter proxies it. Data residency covers US and EU. The full tool surface including computer use is native — the same SDK and tool semantics as the full-size models, so prompts promote up and down the tier ladder with a model-name change.

Safety & privacy

Governed by OpenAI's Preparedness Framework. No training on API inputs by default; opt-out and zero-retention available for enterprise. Compliance covers SOC2, GDPR, CCPA, and HIPAA (BAA). Content moderation is built in. Refusal calibration is in line with the GPT-5.4 family.

Ecosystem & tooling

First-party SDKs in Python, TypeScript, Java, Go, and .NET, plus the OpenAI Agents SDK. Framework integrations span LangChain, LlamaIndex, Vercel AI SDK, and Pydantic AI. It powers a large share of ChatGPT free-tier traffic and Codex subagents. Popularity tier: mainstream.

Buyer questions

How close is mini to full GPT-5.4?

On coding, very close — SWE-bench Pro 54.4% vs 57.7% — at roughly one-third the cost. On the hardest reasoning tasks the gap is wider.

Does it support computer use?

Yes, which is unusual at this price tier and a key differentiator versus nano.

What's the context limit?

400K tokens, versus 1.05M on the full-size GPT-5.4. Plan routing around it.

How cheap can it get?

Cached input is $0.075 (-90%) and Batch is $0.375/$2.25 (-50%). Prefix-heavy traffic lands near free.

What do I escalate to?

GPT-5.4 base or GPT-5.5 — there is no GPT-5.5 mini yet.

Is my data used for training?

No, not by API default; enterprise opt-out and zero-retention exist.

Comparable models

GPT-5.4 base — OpenAI

same family, ~3x cost, ~3-point SWE-bench Pro gap, 1.05M context, the escalation target above mini.

GPT-5.4 nano — OpenAI

same family, ~4x cheaper, narrower tool surface (no computer use/tool search), lower ceiling.

Claude Haiku 4.6 — Anthropic

direct peer at this tier, often preferred for writing tone; comparable pricing.

Gemini 3 Flash Lite — Google

comparable pricing, weaker agentic coding.

Sources

Primary references used to verify this review.

Model specs

Input price: $0.75 / Mtok
Output price: $4.50 / Mtok
Cached input: $0.07 / Mtok
Batch (in/out): $0.38 / $2.25
Context window: 400K tokens
Max output: 128K tokens
Knowledge cutoff: 2025-08
Released: 2026-03-16
Modalities: text, image → text
Output speed: ~180 tok/s
License: Proprietary
Clouds: Azure OpenAI, Azure AI Foundry

Does not train on API inputs by default

Other GPT-5 versions

Last verified 2026-05-27