How much more does GPT-5.5 cost than GPT-5.4?

Exactly 2x on input ($5 vs $2.50) and 2x on output ($30 vs $15). Cached input and Batch each cut that substantially, so effective cost depends heavily on your caching ratio.
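To make the caching dependence concrete, here is a minimal sketch of the blended GPT-5.5 input rate as a function of cache hit ratio. It uses only the list prices quoted in this review; the function name and the idea of a single session-wide `cache_ratio` are illustrative assumptions, not an OpenAI billing formula.

```python
# Rates are the USD-per-1M-token list prices quoted in this review.
INPUT_RATE = 5.00         # standard GPT-5.5 input
CACHED_INPUT_RATE = 0.50  # prefix-cached GPT-5.5 input


def blended_input_rate(cache_ratio: float) -> float:
    """Effective input price per 1M tokens when `cache_ratio` of input tokens hit the cache."""
    if not 0.0 <= cache_ratio <= 1.0:
        raise ValueError("cache_ratio must be between 0 and 1")
    return (1.0 - cache_ratio) * INPUT_RATE + cache_ratio * CACHED_INPUT_RATE


for ratio in (0.0, 0.5, 0.9):
    print(f"{ratio:.0%} cached -> ${blended_input_rate(ratio):.2f} per 1M input tokens")
```

With half the input tokens cached, the blended rate works out to $2.75 per 1M, close to GPT-5.4's uncached $2.50 list price.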

Does it generate images?

No. GPT-5.5 takes image input but outputs text only; image generation routes to gpt-image-2 and video to Sora-class models.

What is the reasoning-token billing gotcha?

Hidden reasoning tokens count as output tokens even though they are not returned. At high/xhigh effort a short answer can bill several times its visible length.

Is my data used for training?

No, not by API default. Opt-out and zero-retention options exist for enterprise.

Which clouds host it?

Azure OpenAI and Azure AI Foundry, plus the first-party OpenAI API. OpenRouter proxies it.

What happens past 272K input tokens?

The entire session is billed at 2x input / 1.5x output. Chunk deliberately if you cross that line.
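One way to chunk deliberately is to group documents so that no single request crosses the threshold. The sketch below is a simple greedy packer over pre-computed token counts; the helper name is hypothetical, and it assumes you already know each document's token count and will leave your own headroom for instructions and tool output.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens, the break-point cited in this review


def pack_requests(doc_tokens: list[int], budget: int = LONG_CONTEXT_THRESHOLD) -> list[list[int]]:
    """Greedily group document indices so each group's token total stays within budget."""
    groups: list[list[int]] = []
    current: list[int] = []
    used = 0
    for index, tokens in enumerate(doc_tokens):
        if tokens > budget:
            raise ValueError(f"document {index} alone exceeds the budget; split it first")
        if used + tokens > budget:
            # Close the current group and start a new one with this document.
            groups.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        groups.append(current)
    return groups


print(pack_requests([120_000, 100_000, 90_000, 200_000, 50_000]))
```

Greedy packing keeps documents in order, which matters when later documents refer to earlier ones; it does not minimize the number of requests.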

GPT-5.5 Review — Benchmarks, Pricing & AI Panel Verdict

Benchmark	Score	Source
Humanity's Last Exam	41.4%	llm-stats.com, 2026-04-23
MMMU	81.2%	llm-stats.com, 2026-04-23
MMLU-Pro	92.4%	tokenmix.ai, 2026-04-23
TAU-bench	98%	openai.com, 2026-04-23
LMArena Elo	1476	presenc.ai, 2026-05-01
GPQA Diamond	93.6%	llm-stats.com, 2026-04-23
Terminal-Bench	82.7%	tech-insider.org, 2026-04-23
MRCR Long Context	74%	nipralo.com, 2026-04-23
SWE-bench Verified	88.7%	openai.com, 2026-04-23
Artificial Analysis Index	60	artificialanalysis.ai, 2026-05-01

Architecture

OpenAI does not disclose parameter counts, layer counts, or whether GPT-5.5 is dense or mixture-of-experts, so all three are left unspecified here. What is public: it is a unified reasoning model (one SKU, configurable reasoning effort) with native text and image input and text-only output. It uses the o200k_base tokenizer. The 1.05M-token context window has a 272K-token break-point above which input and output rates increase. The "fully retrained base since GPT-4.5" framing and the omnimodal architecture are the only architectural claims OpenAI makes; everything quantitative is undisclosed.

Capabilities

GPT-5.5 is OpenAI's strongest single model for agentic work, which justifies its 9.7 agentic and 9.6 coding scores: it leads SWE-bench Verified and Terminal-Bench at release and chains tool calls with markedly fewer dropped steps than GPT-5.4. Reasoning (9.6) and math (9.4) are anchored by GPQA Diamond 93.6% and MMLU-Pro 92.4%; the xhigh effort tier handles multi-minute proofs. Long-context (8.8) reflects the 74.0% retention at 1M tokens, strong but not perfect at the extreme. Multilingual (9.0) is competitive with the prior generation. Vision (8.5) and document/OCR (8.3) reflect improved image understanding, but GPT-5.5 does not generate images; that routes to gpt-image-2. Instruction-following (9.3) and function-calling (9.5) are first-class via the Responses API. Safety calibration (8.8) reflects fewer refusals on legitimate professional queries. Real-time data (7.0) depends on the native web-search tool rather than a live training feed.

Benchmark analysis

Benchmark	Score	vs Predecessor (GPT-5.4)	vs Top Competitor	Source
MMLU-Pro	92.4%	up	peer with Claude Opus 4.7	tokenmix
GPQA Diamond	93.6%	+0.8pp (92.8%)	top tier	llm-stats
SWE-bench Verified	88.7%	up from low 80s	leader at release	OpenAI
SWE-bench Pro	58.6%	+0.9pp (57.7%)	leader at release	tech-insider
Terminal-Bench 2.0	82.7%	+7.6pp (75.1%)	leader	tech-insider
ARC-AGI-2	85.0%	+11.7pp (73.3%)	leader	llm-stats
MMMU-Pro	81.2%	flat	competitive	llm-stats
HLE (no tools)	41.4%	+1.6pp (39.8%)	competitive with Gemini 3 Pro	llm-stats
Tau2-Bench Telecom	98.0%	new SOTA	leader	OpenAI
1M-token long-context	74.0%	up from 36.6%	leader	nipralo
GDPval (44 occupations)	84.9%	up	leader	tech-insider
LMArena Elo	1476 (high: 1484)	up	trails Claude Opus 4.7 (1492)	presenc
Artificial Analysis Index	60 (#1)	up from 57	#1 at release	Artificial Analysis

AIME 2025, MATH-500, HumanEval, LiveCodeBench, Aider Polyglot, IFEval, BBH, and SimpleQA have no separately published GPT-5.5 figure as of 2026-05-28, so they are omitted from the table.

Speed & latency

At xhigh reasoning effort, GPT-5.5 generates ~68.7 tokens/sec on OpenAI's API with a time-to-first-token (TTFT) near 64 seconds; the TTFT reflects the heavy reasoning pass, not the streaming rate. At lower effort tiers it is materially faster and interactive. Crucially, OpenAI claims GPT-5.5 matches GPT-5.4 per-token latency in real-world serving while being smarter and using fewer tokens per task, because Codex rewrote the load-balancing layer for a 20%+ throughput gain. Net: the model is slow for deep-reasoning runs but competitive for chat at default effort.
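As a rough back-of-the-envelope check, end-to-end latency at xhigh effort can be approximated as TTFT plus visible output length divided by streaming rate. The sketch below uses the two figures quoted above as defaults; the function name is hypothetical, and real latency will vary with load, prompt size, and effort tier.

```python
def estimated_latency_seconds(
    visible_output_tokens: int,
    ttft_s: float = 64.0,        # time-to-first-token at xhigh, per this review
    tokens_per_s: float = 68.7,  # streaming rate at xhigh, per this review
) -> float:
    """Approximate wall-clock time: wait for the first token, then stream the rest."""
    if visible_output_tokens < 0:
        raise ValueError("visible_output_tokens must be non-negative")
    return ttft_s + visible_output_tokens / tokens_per_s


for tokens in (200, 1_000, 4_000):
    print(f"{tokens:>5} tokens -> ~{estimated_latency_seconds(tokens):.1f} s")
```

For short answers the TTFT dominates, which is why the streaming rate alone understates how slow deep-reasoning runs feel.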

Pricing analysis

Surface	Cost	Notes
API input (standard)	$5.00 / 1M tok	up to 272K input
API output (standard)	$30.00 / 1M tok	up to 272K input
Cached input	$0.50 / 1M tok	90% discount on prefix-cached tokens
API input (> 272K)	$10.00 / 1M tok	2x overage rate
API output (> 272K)	$45.00 / 1M tok	1.5x overage rate
Batch (in/out)	$2.50 / $15.00	50% off, 24h SLA
Flex (in/out)	$2.50 / $15.00	variable latency
Priority (in/out)	$12.50 / $75.00	2.5x list, low queue
Direct UI	$20/mo (Plus), $100–$200/mo (Pro)	default flagship in ChatGPT
Free tier	none	free ChatGPT has no GPT-5.5 access
Rate limits	15,000 RPM / 40M TPM	Tier 5

Reasoning-token note: hidden reasoning tokens are billed as output tokens but not returned in the response. A short visible answer can consume several times its length in billed reasoning tokens — budget accordingly.
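To budget for this, a per-request estimate can add reasoning tokens to the visible output and switch to the overage rates once input exceeds 272K tokens. The sketch below uses the standard and overage rates from the table above; the function name is illustrative, and it deliberately ignores cached-input, Batch, Flex, and Priority pricing.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens

# USD per 1M tokens, from the pricing table in this review.
STANDARD_RATES = (5.00, 30.00)   # (input, output) up to 272K input
OVERAGE_RATES = (10.00, 45.00)   # (input, output) above 272K input


def request_cost_usd(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int) -> float:
    """Estimate one request's cost, billing hidden reasoning tokens as output."""
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    input_rate, output_rate = OVERAGE_RATES if long_context else STANDARD_RATES
    billed_output_tokens = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_rate + billed_output_tokens * output_rate) / 1_000_000


without_reasoning = request_cost_usd(10_000, 500, 0)
with_reasoning = request_cost_usd(10_000, 500, 4_000)
print(f"no reasoning:   ${without_reasoning:.4f}")
print(f"4K reasoning:   ${with_reasoning:.4f}")
print(f"300K input run: ${request_cost_usd(300_000, 500, 4_000):.4f}")
```

In this example a 500-token answer with 4,000 hidden reasoning tokens costs about $0.185 rather than $0.065, nearly three times as much for the same visible output.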

Deployment & access

API-only via the Responses API (https://api.openai.com/v1/responses); the model is proprietary, with no open weights and no self-hosting. Cloud-managed availability is through Azure OpenAI and Azure AI Foundry. OpenRouter proxies it. Data residency options cover US and EU. The full tool surface (web search, file search, code interpreter, hosted shell, apply_patch, skills, computer use, MCP, and tool search) is native to the Responses API, which is the lock-in surface enterprise buyers should weigh.
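For orientation, a request to the Responses endpoint might be assembled as below, using only the Python standard library. This is a sketch under stated assumptions: the model identifier `gpt-5.5` is assumed rather than confirmed by this review, the helper name is hypothetical, and the request is built but not sent.

```python
import json
import os
import urllib.request

RESPONSES_URL = "https://api.openai.com/v1/responses"


def build_request(prompt: str, effort: str = "medium", model: str = "gpt-5.5") -> urllib.request.Request:
    """Build (but do not send) a POST to the Responses API with a reasoning effort."""
    payload = {
        "model": model,                   # assumed model identifier
        "input": prompt,
        "reasoning": {"effort": effort},  # e.g. "high" for deeper reasoning
    }
    api_key = os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        RESPONSES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("Summarize the attached changelog.", effort="high")
print(request.get_method(), request.full_url)
print(json.loads(request.data))
# To send it: urllib.request.urlopen(request), then JSON-decode the response body.
```

In practice most teams would use a first-party SDK instead. Either way, the usage object in the response is where billed reasoning tokens should appear, so it is worth logging on every call.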

Safety & privacy

Governed by OpenAI's Preparedness Framework. By API default OpenAI does not train on inputs; data opt-out and zero-retention options exist for enterprise. Compliance covers SOC 2, GDPR, CCPA, and HIPAA (BAA available); FedRAMP coverage is via Azure Government for the Azure-hosted path, not first-party. Content moderation is built in via the moderation endpoint and policy layer. Refusal calibration improved over GPT-5.4, with fewer false refusals on legitimate professional queries.

Ecosystem & tooling

First-party SDKs in Python, TypeScript, Java, Go, and .NET, plus the OpenAI Agents SDK. Framework integrations span LangChain, LlamaIndex, Vercel AI SDK, and Pydantic AI. It is the default flagship in ChatGPT (the dominant consumer AI surface) and powers GitHub Copilot and Codex. Popularity tier: dominant.
