by OpenAI · GPT-5 family · best for cheapest viable frontier-family worker tier
GPT-5.4 nano is OpenAI's lightweight workhorse, released 2026-03-17 — the cheapest model in the GPT-5.4 frontier family, built for the highest-volume, lowest-latency workloads. Despite being four tiers below the flagship, it beats the previous-generation GPT-5 mini on SWE-bench Pro (52.4% vs 45.7%) and matches it on GPQA Diamond (82.8%). The one-sentence buyer's take: indispensable as the bottom tier of a routed architecture — classification, extraction, triage, and routing at a price that competes with embedding pipelines. - Provider: OpenAI - Release: 2026-03-17 - Status: GA - Context: 400,000 tokens - Max output: 128,000 tokens - Modalities: text + image in, text out - Knowledge cutoff: 2025-08 - Headline price: $0.20 in / $1.25 out per 1M tokens
| Benchmark | Score | Source |
|---|---|---|
| GPQA Diamond | 82.8% | aicostcheck.com 2026-03-17T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The bottom of a clean tier ladder — it makes per-request escalation routing economically obvious at a price that rivals embeddings.”
nano belongs at the bottom of the tier ladder, and that's exactly where it adds value. At $0.20/$1.25 it changes the economics of high-volume backend pipelines — classification, routing, simple extraction — at a price competing with embedding models for many tasks. Treat it as a router and triage worker, not an answer-quality model. The lack of computer use and tool search means it cannot anchor a real agent loop, which is fine because it shouldn't try. Strategic value: it makes per-request escalation routing economically obvious.
“A volume-floor play — nano's market is the millions-of-rows operational layer where price, not capability, decides the winner.”
Strategically, nano competes in the operational-throughput floor against Gemini 3 Flash Lite and Claude Haiku-tier models. Its wedge is the combination of a frontier-family coding step (beats GPT-5 mini) at the lowest family price, plus tool-surface and SDK parity that make it a frictionless worker tier in an OpenAI stack. Differentiation here is price-per-capability, not features. Market timing was good — shipping with mini caught the cost-sensitive segment. The risk is commoditization: at this tier, every provider is racing the same price curve down.
“Where the cost story gets ridiculous in the right way — prefix-cached classification at scale rounds to single-digit dollars.”
nano is where unit economics gets surreal. $0.20/$1.25, $0.02 cached, $0.10/$0.625 on Batch. For a prefix-cached classification pipeline at scale, effective spend rounds to single-digit dollars for millions of decisions. Tier-1 rate limits cap raw throughput until you climb the tier ladder, so capacity planning matters. The discipline question is the usual one: is nano running work that needs nano, or work that should have escalated? Track misroutes and fix the router, not the model. Value-per-dollar is essentially maxed.
“The cheapest call in a tiered system — same SDK, same reasoning dial, 128K output. Just don't push it up to substitute for mini.”
nano suits a specific role: the cheapest possible call in a tiered system. Same Responses API, same SDK, same reasoning dial — promotion to mini or base is a model-name change. The 400K context is comfortable and the 128K output ceiling is unusual and useful at this tier. Tool calling is reliable for the supported surface. The limit: do not push nano to high reasoning to substitute for mini — the dollar gap closes and quality doesn't catch up. Use it as the worker tier, not a quality target.
“Users rarely see it labeled — it's plumbing. Fast for simple tasks, visibly thin on anything needing depth.”
End users essentially never see nano labeled — it powers backend functionality (chat triage, content moderation, summary generation) more than front-line conversation. Where it surfaces in product UIs, the experience is good for simple tasks and visibly weaker on anything needing depth. Refusal patterns align with the family. Latency is the user-visible win: nano feels fast. The right framing for user-facing apps: nano for the fast initial response, then a background mini call for the deeper answer if needed.
“Beating GPT-5 mini on one benchmark is a low bar — nano's real ceiling shows the moment a task needs reasoning, not routing.”
The adversarial read: "beats GPT-5 mini on SWE-bench Pro" is a low bar — GPT-5 mini is a 2025 model. nano's published evidence is two figures (SWE-bench Pro, GPQA), and everything else is `null`, so the "near-mini quality" claim is asserted more than demonstrated. Its real ceiling shows immediately on anything needing reasoning rather than routing, and the missing computer use/tool search means it cannot pretend to be an agent. The value story is genuine and the price is honest — but treat any capability claim beyond classification/extraction/triage with suspicion until you test it.
- High-volume classification, extraction, and routing pipelines. - First-pass agent triage — nano decides whether to escalate to mini or base. - Lightweight chat where latency and cost dominate. - Rewriting, query expansion, and content tagging where Batch + cached input drives cost near free. - The edge of a tiered system: nano → mini → base → flagship routing.
nano is ~4x cheaper but lacks computer use and tool search and has a lower reasoning ceiling. Use nano for triage/classification, mini for agentic work.
Yes, on every published benchmark, but it's also 4x more expensive. For the very highest volumes, the older nano may still pencil out.
Tier-1 defaults are 500 RPM / 200K TPM. Climb the usage tiers for higher ceilings.
$0.02 cached input and $0.10/$0.625 on Batch mean prefix-heavy jobs round to near-zero at scale.
Not well — no computer use or tool search. Use it as a triage/worker tier and escalate.
No, not by API default; enterprise opt-out and zero-retention exist.
Does not train on API inputs by default
Last verified 2026-05-27