by OpenAI · GPT-5 family · best for default production workhorse
GPT-5.4 is OpenAI's mainstream workhorse, released 2026-03-05: the cost-effective default that sits between the cheaper GPT-5.4 mini and the costlier GPT-5.5 flagship. It was the first OpenAI GA model with production-grade computer use (clearing human-expert level on OSWorld) and ships the full Responses API tool surface (apply_patch, hosted shell, tool search, skills). The one-sentence buyer's take: for most production builds this is the right default at half the price of GPT-5.5, with GPT-5.5 reserved as the targeted upgrade for the hardest agentic work.

- Provider: OpenAI
- Release: 2026-03-05
- Status: GA
- Context: 1,050,000 tokens (input pricing rises past 272K)
- Max output: 128,000 tokens
- Modalities: text + image in, text out
- Knowledge cutoff: 2025-08
- Headline price: $2.50 in / $15.00 out per 1M tokens

| Benchmark | Score | Source |
|---|---|---|
| Humanity's Last Exam | 39.8% | llm-stats.com, 2026-03-05 |
| MMMU | 81.2% | llm-stats.com, 2026-03-05 |
| LMArena Elo | 1484 | presenc.ai, 2026-05-01 |
| GPQA Diamond | 92.8% | llm-stats.com, 2026-03-05 |
| Terminal-Bench | 75.1% | llm-stats.com, 2026-03-05 |
| MRCR Long Context | 36.6% | nipralo.com, 2026-03-05 |
| Artificial Analysis Index | 57 | artificialanalysis.ai, 2026-03-05 |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The model most production architectures should standardize on — half the GPT-5.5 cost, inside the same envelope on non-frontier work.”
GPT-5.4 is the steady-state default. It is half the price of GPT-5.5 with capability that lands inside the same envelope on most non-frontier workloads, and production-grade computer use unlocks real RPA-replacement cases. The lock-in story is identical to the rest of OpenAI — Responses API, apply_patch, skills semantics — but at this price/quality point you accept it. The 272K pricing cliff is a real architectural constraint to design around with chunking. Treat GPT-5.4 as the default and GPT-5.5 as the targeted upgrade; roadmap confidence is high given the active cadence.
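The chunking idea above can be sketched as a greedy packer that groups documents into requests whose estimated input stays under the 272K standard tier. This is a minimal sketch under stated assumptions, not a production design: `pack_under_limit` and the `reserve` headroom are hypothetical names, and the per-document token counts are assumed to come from whatever tokenizer matches the model.

```python
from typing import List

STANDARD_TIER_LIMIT = 272_000  # input-token threshold quoted in this review


def pack_under_limit(doc_tokens: List[int], reserve: int = 8_000,
                     limit: int = STANDARD_TIER_LIMIT) -> List[List[int]]:
    """Greedily group document indices so each group's tokens fit the budget.

    `reserve` is hypothetical headroom for instructions and tool schemas.
    A single document larger than the budget ends up alone in its group
    and still has to be split upstream.
    """
    budget = limit - reserve
    if budget <= 0:
        raise ValueError("reserve must be smaller than limit")

    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, tokens in enumerate(doc_tokens):
        # Start a new request when adding this document would cross the budget.
        if current and used + tokens > budget:
            groups.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        groups.append(current)
    return groups
```

With the default reserve, four documents of 100K, 100K, 100K, and 50K tokens pack into two requests, `[[0, 1], [2, 3]]`, each under the threshold.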
“The volume sweet spot — OpenAI's distribution plus a price that makes agentic features affordable at scale beats rivals on TCO.”
In market terms, GPT-5.4 wins the high-volume agentic and coding segment on total cost of ownership: production computer use and the full tool surface at $2.50/$15 is a strong value position against Claude Sonnet 4.6 and Gemini 3 Flash. Differentiation is the tooling-plus-price combination and ChatGPT distribution, not raw benchmark leadership (GPT-5.5 and Claude Opus lead there). Market timing is excellent: it became the default the moment GPT-5.5's price doubled, so cost-aware teams gravitate here. The main risk is internal cannibalization if a future GPT-5.5 mini arrives.
“Where the unit economics finally work — half of GPT-5.5, 90% cache discount, and Batch take effective spend an order of magnitude below list.”
At $2.50 in / $15.00 out per 1M tokens, GPT-5.4 is half the price of GPT-5.5; cached input drops to $0.25 (-90%) and Batch to $1.25/$7.50 (-50%). For backend pipelines with stable prompts, prefix caching plus Batch can land effective spend an order of magnitude below list. The 272K cliff is the budget trap: segment workloads so long-context calls are deliberate. Predictability is good because tiering is transparent and well-documented. This is the model to standardize on for cost-aware production traffic, with GPT-5.5 reserved for measured escalation. Value-per-dollar is the best among OpenAI's full-size models.
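To make the discount arithmetic concrete, here is a small sketch that prices a workload at list, with a cache-hit share, and through Batch, using only the per-1M-token prices quoted in this review. The function names and the `cache_hit_ratio` parameter are hypothetical. The review does not say whether the cache and Batch discounts stack, so the two are modelled separately rather than combined.

```python
# Per-1M-token prices quoted in this review (USD).
INPUT_LIST = 2.50
INPUT_CACHED = 0.25
OUTPUT_LIST = 15.00
BATCH_INPUT = 1.25
BATCH_OUTPUT = 7.50


def realtime_cost(input_tokens: int, output_tokens: int,
                  cache_hit_ratio: float = 0.0) -> float:
    """USD cost of realtime traffic where a share of input tokens is cached."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    total = (uncached * INPUT_LIST
             + cached * INPUT_CACHED
             + output_tokens * OUTPUT_LIST)
    return total / 1_000_000


def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of the same traffic at the quoted Batch prices (no caching)."""
    return (input_tokens * BATCH_INPUT + output_tokens * BATCH_OUTPUT) / 1_000_000
```

For an input-heavy workload of 100M input tokens and 2M output tokens, this gives $280.00 at list, $77.50 with a 90% cache-hit share, and $140.00 through Batch. Output tokens are never cached, so the savings depend heavily on how input-dominated the traffic is.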
“The model most developers actually ship on — mature tool calling, reliable structured output, apply_patch that makes code agents tractable.”
This is the workhorse developers ship on. Tool calling is mature, structured outputs are reliable, and the Responses API is the right primary surface. apply_patch makes code-edit agents tractable; computer use plus hosted shell means serious automation in a single SDK call. Versus GPT-5.5, the gap on hard coding tasks is real, but most day-to-day tasks land identically. The reasoning-effort dial is the key DX upgrade: tune compute per request, not per model. Minor friction: no fine-tuning, and a 272K context cliff to watch. SDK coverage across Python/TS/Java/Go/.NET is excellent.
“On ChatGPT Plus this handles most queries fast and well — the gap to 5.5 only shows on the hardest agent and code tasks.”
For ChatGPT Plus users, GPT-5.4 is the workhorse that handles most queries fast. Latency at default reasoning is good, refusals are reasonable, and conversation quality is high. The gap versus GPT-5.5 shows up only on the hardest agentic and code tasks; for everyday drafting, research, image discussion, and light coding it is essentially indistinguishable from the flagship. The knowledge cutoff (2025-08) occasionally surfaces in time-sensitive questions, mitigated by web search.
“A great value model, but 'first to beat human experts on OSWorld' is a narrow benchmark — and 1M context is mostly a spec, not a capability.”
The adversarial read: GPT-5.4 is genuinely the value pick, but two marketing claims deserve scrutiny. The "beats human experts on OSWorld" headline rests on one benchmark with a constrained task set; real-world computer use is far messier. And the 1.05M context is largely a spec: at 1M tokens retention is only 36.6%, so anything beyond the 272K standard tier is both expensive and degraded. The public benchmark trail is also thinner than GPT-5.5's (no separate MMLU-Pro, AIME, or SWE-bench Verified figures). None of this undermines the value case; it just means the headline numbers oversell the long-context and computer-use stories.
- Default production SKU for agents, coding, and analytical workloads where GPT-5.5 cost is unjustified.
- Computer-use automation (RPA, browser agents) where OSWorld-class reliability matters.
- Long-context document analysis under 272K tokens.
- High-volume backend pipelines paired with Batch + cached input for unit economics.
- Tiered routing: GPT-5.4 mini for cheap traffic, GPT-5.4 for default, GPT-5.5 for escalation.
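The tiered-routing pattern in the last bullet could look roughly like the sketch below. Everything here is illustrative: the `Task` fields and the `pick_tier` rules are hypothetical routing signals, and the returned strings are the product names used in this review, not confirmed API model identifiers.

```python
from dataclasses import dataclass


@dataclass
class Task:
    simple_high_volume: bool = False   # hypothetical flag: cheap, routine traffic
    default_tier_failed: bool = False  # hypothetical flag: the default attempt fell short


def pick_tier(task: Task) -> str:
    """Return the tier name for a task; escalation takes priority over cheap routing."""
    if task.default_tier_failed:
        return "GPT-5.5"       # targeted escalation for the hardest work
    if task.simple_high_volume:
        return "GPT-5.4 mini"  # cheap traffic
    return "GPT-5.4"           # default for everything else
```

Driving escalation from an observed failure, rather than from a guess about difficulty, keeps GPT-5.5 spend tied to cases where the default demonstrably fell short.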
Default to GPT-5.4 — it is half the price and comparable on most non-frontier work. Escalate to GPT-5.5 only for the hardest agentic/coding tasks.
The standard tier to 272K is genuinely usable; beyond that, costs rise 2x/1.5x and retention drops to ~36.6%. Treat 1M as overflow, not a workhorse window.
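A rough way to see the cliff is to price one request on either side of the threshold. The sketch below makes two assumptions the review does not confirm: that "2x/1.5x" means 2x on input and 1.5x on output, and that the higher rates apply to the whole request once input exceeds 272K tokens. `request_cost` is a hypothetical helper, not part of any SDK.

```python
STANDARD_TIER_LIMIT = 272_000   # input-token threshold quoted in this review
INPUT_PER_M = 2.50              # USD per 1M input tokens (list)
OUTPUT_PER_M = 15.00            # USD per 1M output tokens (list)
LONG_INPUT_MULTIPLIER = 2.0     # assumption: the "2x" applies to input
LONG_OUTPUT_MULTIPLIER = 1.5    # assumption: the "1.5x" applies to output


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, assuming whole-request long-context pricing."""
    long_context = input_tokens > STANDARD_TIER_LIMIT
    in_rate = INPUT_PER_M * (LONG_INPUT_MULTIPLIER if long_context else 1.0)
    out_rate = OUTPUT_PER_M * (LONG_OUTPUT_MULTIPLIER if long_context else 1.0)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

Under these assumptions, a request with 272,000 input tokens and 4,000 output tokens costs about $0.74, while one more input token pushes it to about $1.45. That jump is why long-context calls are better made deliberately than by accident.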
No — image input only; generation routes to gpt-image-2.
Cached input is $0.25 (-90%) and Batch is $1.25/$7.50 (-50%). A well-cached pipeline lands far below list.
Not as of 2026-05-28.
No, not by API default; enterprise opt-out and zero-retention exist.
Does not train on API inputs by default
Last verified 2026-05-27