Is Gemini 3.1 Pro generally available?

Not formally. It launched 2026-02-19 in preview to validate before GA, and as of 2026-05-28 the API/Vertex surface is still under Pre-GA Offerings Terms. It is, however, the production-default model in the consumer Gemini app.

What does the long-context tier cost?

Prompts over 200K tokens bill at $4.00 input / $18.00 output per 1M (vs $2/$12 under 200K). Cached reads rise to $0.40 and storage is $4.50/1M tokens/hour.

How do I get the 2M-token window?

The extended 2M context is rolling out on Vertex AI for enterprise; the standard Gemini API exposes 1M.

Does Google train on my data?

No for paid API and Vertex inputs. The free AI Studio tier may use inputs to improve products; opt-out is available.

How does it compare to 3.5 Flash for agents?

3.5 Flash beats 3.1 Pro on Terminal-Bench, MCP Atlas, and CharXiv at lower cost and 1.5x speed. Use Pro when pure reasoning, HLE-class problems, or long-context recall dominate.

What about live/current data?

Google Search grounding gives 5,000 free prompts/month (shared across Gemini 3), then $14 per 1,000 queries — the deepest real-time integration of any frontier model.

Can I self-host or get the weights?

No. Gemini is closed-weights, API/Vertex only.

Gemini 3.1 Pro Review — Benchmarks, Pricing & AI Panel Verdict

Benchmark	Score	Source
Humanity's Last Exam	44.4%	deepmind.google 2026-02-19T00:00:00.000Z
MMLU	92.6%	deepmind.google 2026-02-19T00:00:00.000Z
MMMU	80.5%	deepmind.google 2026-02-19T00:00:00.000Z
TAU-bench	90.8%	deepmind.google 2026-02-19T00:00:00.000Z
LMArena Elo	1501	facebook.com 2026
GPQA Diamond	94.3%	deepmind.google 2026-02-19T00:00:00.000Z
LiveCodeBench	2887%	deepmind.google 2026-02-19T00:00:00.000Z
Terminal-Bench	68.5%	deepmind.google 2026-02-19T00:00:00.000Z
MRCR Long Context	84.9%	deepmind.google 2026-02-19T00:00:00.000Z
SWE-bench Verified	80.6%	deepmind.google 2026-02-19T00:00:00.000Z
Artificial Analysis Index	57	artificialanalysis.ai 2026-05-28T00:00:00.000Z

Architecture

Gemini is a sparse mixture-of-experts family, but Google does not publish parameter counts, expert counts, layer counts, or attention details for 3.1 Pro — all are honestly null. What is disclosed and verifiable: a native 1M-token context window (2M on Vertex), 65,536 max output tokens, a January 2025 knowledge cutoff, and native multimodal input across text, image, audio, and video in a single context. The model uses a thinking/Deep Think mechanism whose reasoning depth scales with difficulty and can be budgeted. Long-context retrieval is strong through the documented MRCR v2 128K average of 84.9%, though recall degrades as prompts approach the 1M ceiling.

Capabilities

Gemini 3.1 Pro is Google's strongest model on reasoning (cap_reasoning 9.5) and scientific knowledge — GPQA Diamond 94.3% is the highest proprietary score on record, and HLE 44.4% leads most of the field. Coding is now frontier-class (cap_coding 9.0): SWE-bench Verified 80.6% and LiveCodeBench Pro 2887 Elo. Math is excellent (cap_math 9.0) via Deep Think. Agentic/tool use (cap_agentic 8.5) is strong on tau2-bench (Retail 90.8%, Telecom 99.3%) and BrowseComp (85.9%) but trails Gemini 3.5 Flash on Terminal-Bench and MCP Atlas. Long context (cap_long_context 9.5) is a defining strength — true 1M/2M with usable retrieval. Vision and document/OCR are class-leading (cap_vision 9.5, cap_document_ocr 9.0) on MMMU-Pro 80.5% and mixed text-and-figure PDFs. Multilingual is broad (cap_multilingual 9.0, MMMLU 92.6%). The standout differentiator is real-time data (cap_realtime_data 9.5): Google Search grounding is the deepest live-retrieval integration of any frontier model. Creative writing (8.0) is good but slightly more clinical than Claude; instruction following (8.5) and function calling (9.0) are reliable; safety calibration (8.5) is strong but stricter on some prompts than OpenAI/Anthropic.

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
GPQA Diamond	94.3%	More than 2x ARC-AGI-2 reasoning vs Gemini 3 Pro	#1 proprietary, no tools	card
Humanity's Last Exam	44.4% (51.4% w/ search+code)	New high for Google	Within ~6 pts of Opus 4.7	card
ARC-AGI-2	77.1%	More than double Gemini 3 Pro	Class-leading (verified)	card
SWE-bench Verified	80.6%	Up from ~63.8% (2.5 Pro)	Roughly tied Opus 4.6, behind GPT-5.3-Codex	card
Terminal-Bench 2.0	68.5%	New	Behind Gemini 3.5 Flash (76.2% on 2.1)	card
LiveCodeBench Pro	2887 Elo	New high for Gemini	Top of public board	card
MMMU-Pro	80.5%	+pts vs 2.5 Pro	Class-leading multimodal	card
MMMLU (multilingual)	92.6%	Strong	Frontier	card
tau2-bench (Telecom / Retail)	99.3% / 90.8%	New	Class-leading tool use	card
BrowseComp	85.9%	New	Class-leading agentic search	card
MRCR v2 (128K)	84.9%	Strong	Long-context leader	card
Artificial Analysis Index	57	New high for Gemini	Top tier	AA

(MMLU-Pro, AIME 2025, MATH-500, HumanEval, IFEval, SimpleQA are not on Google's official card and are left null rather than sourced from inconsistent third parties.)

Speed & latency

At ~142.7 output tokens/sec (Artificial Analysis median) and a high ~30.8s time-to-first-token, Gemini 3.1 Pro feels like what it is: a deliberate reasoning model. TTFT is dominated by the thinking phase, especially in Deep Think mode, where end-to-end latency can run 10-30s. For interactive chat this is noticeably slower than the Flash tier; for batch reasoning, long-document analysis, and agent loops the latency is well amortized. Streaming smooths perceived latency once the first token lands. Teams needing snappy UX should route easy turns to Flash and reserve 3.1 Pro for hard ones.

Pricing analysis

Surface	Cost	Notes
API input (<=200K)	$2.00 / 1M tok	Standard tier
API output (<=200K)	$12.00 / 1M tok	Thinking tokens billed as output
API input (>200K)	$4.00 / 1M tok	Long-context tier
API output (>200K)	$18.00 / 1M tok	Long-context tier
Cached input	$0.20 (<=200K) / $0.40 (>200K)	+ $4.50 / 1M tok / hour storage
Batch (in/out, <=200K)	$1.00 / $6.00	~50% off; async
Search grounding	5,000 prompts/mo free (shared across Gemini 3), then $14 / 1,000 queries	Live Google Search
Direct UI	$19.99/mo (AI Pro); $100 & $200/mo (Ultra)	Formerly Gemini Advanced
Free tier	None	Pricing page: "Not available" for 3.1 Pro
Rate limits	Tiered; preview Tier 1 capped	Raises after cumulative spend gates

Deployment & access

Proprietary, closed-weights. Available via the Gemini API (Google AI Studio), Vertex AI, and Gemini Enterprise, plus Gemini CLI, Google Antigravity, and Android Studio. Vertex AI is the enterprise path: VPC Service Controls, CMEK, audit logging, regional pinning (US, EU, Asia), and data-residency commitments, plus the 2M-token context mode. No open weights, no self-hosting. OpenRouter resells API access. The trade-off is Google Cloud lock-in: IAM, Vertex tooling, and the two-tier pricing model are sticky once adopted, but the Workspace and BigQuery grounding is a moat no other provider matches.

Safety & privacy

Governed by Google's Frontier Safety Framework with configurable safety filters and adjustable thresholds on Vertex. Paid API and Vertex inputs are not used to train models; the AI Studio free tier may be. Opt-out is available. Compliance coverage is broad: SOC 2, HIPAA, GDPR, ISO 27001, FedRAMP, CCPA. Content moderation is built in. Refusal behavior is stricter than OpenAI and Anthropic on some safety-adjacent categories, which enterprises often see as a feature and creative users as friction.

Ecosystem & tooling

SDKs in Python, TypeScript, Go, Java, and Dart; integrations with LangChain, LlamaIndex, Vercel AI SDK, Genkit, and Google ADK. It powers the Gemini app, Google AI Pro/Ultra, NotebookLM, Gemini CLI, Google Antigravity, Android Studio, and Workspace surfaces (Docs, Sheets, Gmail). Distribution through Android, Chrome, Search, and Workspace makes Gemini's effective reach dominant regardless of standalone API share.

Gemini 3.1 Pro

What's new

Benchmarks

AI Panel Review

Strengths

Limitations

Best use cases

Deep dive

Architecture

Capabilities

Benchmark analysis

Speed & latency

Pricing analysis

Deployment & access

Safety & privacy

Ecosystem & tooling

Buyer questions

Comparable models

Sources

Model specs

Other Gemini 3 versions