Is it generally available?

No — released 2026-03-03 in preview, still under Pre-GA Offerings Terms as of 2026-05-28. Use GA-stable 2.5 Flash-Lite for mission-critical paths.

How is GPQA 86.9% possible at this price?

A configurable thinking budget lets it reason on hard problems; you pay for thinking tokens as output, so cap the budget where quality allows.

What's the catch vs 2.5 Flash-Lite?

2.5 Flash-Lite is GA and cheaper ($0.10/$0.40) but much weaker reasoning (GPQA 47.4%). 3.1 Flash-Lite is preview but far stronger.

Is the 1M context reliable here?

Google hasn't published MRCR recall for this model. Treat the reliable working range conservatively and chunk for precise retrieval.

Does audio cost more?

Yes — audio input is $0.50/1M (2x the $0.25 text rate). Factor this into transcription pipelines.

No. Gemini is closed-weights, API/Vertex only.

~332 tok/s with a 5.6s TTFT — the fastest reasoning-capable Gemini, ideal for streaming and bulk throughput.

Gemini 3.1 Flash-Lite Review — Benchmarks, Pricing & AI Panel Verdict

Benchmark	Score	Source
MMMU	76.8%	blog.google 2026-03-03T00:00:00.000Z
LMArena Elo	1432	artificialanalysis.ai 2026
GPQA Diamond	86.9%	blog.google 2026-03-03T00:00:00.000Z
Artificial Analysis Index	34	artificialanalysis.ai 2026-05-28T00:00:00.000Z

Architecture

Sparse mixture-of-experts (Gemini family); parameter counts, experts, layers, and attention are undisclosed and null. Verifiable: a 1M-token context window, 65,536 max output tokens, January 2025 knowledge cutoff, and native multimodal input (text, image, audio, video). It exposes a configurable thinking budget, which is what lets a Flash-Lite-tier model post a GPQA Diamond of 86.9%. Google has not published MRCR long-context recall for this model, so the reliable working range within the 1M window is undocumented — treat it conservatively.

Capabilities

Gemini 3.1 Flash-Lite is the high-volume workhorse of the Gemini 3 generation. Reasoning is genuinely strong for the tier (cap_reasoning 7.5): GPQA Diamond 86.9% is a standout, beating Gemini 2.5 Flash on graduate-level science despite costing far less. Vision and document/OCR are solid (cap_vision 8.5, cap_document_ocr 8.0) on MMMU-Pro 76.8%. Long context (cap_long_context 8.0) preserves the full 1M window — rare at this price — though recall figures are unpublished. Coding (7.0) and math (7.0) are adequate for the tier but not frontier. Agentic/tool use (cap_agentic 6.5) is reliable for short loops but clearly trails Gemini 3.5 Flash on long-running terminal/browser agents. Function calling and structured output (cap_function_calling 8.0) work cleanly. The standout differentiators are speed (~332 tok/s) and real-time data (cap_realtime_data 9.0 via Search grounding). Creative writing (6.5) is workmanlike; multilingual (8.0) and instruction following (7.5) are solid; safety (8.0) follows Google's standard policy stack.

Benchmark analysis

Benchmark	Score	vs Predecessor	vs Top Competitor	Source
GPQA Diamond	86.9%	+18.6 pts vs 2.5 Flash (68.3%)	Class-leading at this price	blog
MMMU-Pro	76.8%	Significant gain	Strong multimodal at low cost	blog
LMArena Elo	1432	New high for Flash-Lite	Top of cheap-model tier	AA
Artificial Analysis Index	34	New	Above peers at $0.25/$1.50	AA

(MMLU-Pro varies by source and is not on Google's official card; AIME 2025, SWE-bench, LiveCodeBench, HLE, MRCR are unpublished for this model and left null.)

Speed & latency

At ~332 output tokens/sec (Artificial Analysis median) with a low ~5.6s TTFT, Gemini 3.1 Flash-Lite is the fastest reasoning-capable model in Google's lineup and among the fastest anywhere with a 1M context. For streaming UX, interactive chat, and high-throughput batch pipelines, this speed materially improves both perceived responsiveness and per-node throughput. The combination of low latency and reasoning capability is the model's signature — most models this fast are non-reasoning.

Pricing analysis

Surface	Cost	Notes
API input	$0.25 / 1M tok	Text/image/video; audio $0.50
API output	$1.50 / 1M tok	Thinking tokens billed as output
Cached input	$0.025 / 1M tok	90% discount; audio $0.05; + $1.00/1M tok/hour storage
Batch (in/out)	$0.125 / $0.75	~50% off; async; audio in $0.25
Search grounding	5,000 prompts/mo free (shared across Gemini 3), then $14 / 1,000 queries	Live Google Search
Free tier	"Free of charge" on AI Studio	Preview Tier 1 ~250 RPD until spend gate
Direct UI	$19.99/mo (AI Pro)	Not the consumer default model
Rate limits	Tiered; preview Tier 1 capped	Raises after cumulative spend gates

Deployment & access

Proprietary, closed-weights. Available via the Gemini API (Google AI Studio) and Vertex AI in preview, resold through OpenRouter. Vertex AI provides VPC-SC, CMEK, audit logging, regional pinning (US, EU, Asia), and data residency. No open weights or self-hosting. SDK surface is identical to the rest of the Gemini line, so swapping Flash-Lite in or out is a model-name change. Preview status means SLAs are weaker than GA — fine for non-critical high-volume paths, but mission-critical workloads should weigh GA-stable 2.5 Flash-Lite against this model's stronger reasoning.

Safety & privacy

Google Frontier Safety Framework with configurable filters. Paid API and Vertex inputs are not used to train models; the free AI Studio tier may be. Opt-out available. Compliance: SOC 2, HIPAA, GDPR, ISO 27001, FedRAMP, CCPA. Built-in content moderation. Refusals follow Google's standard policy stack, similar to 3.1 Pro.

Ecosystem & tooling

SDKs in Python, TypeScript, Go, Java, Dart; integrations with LangChain, LlamaIndex, Vercel AI SDK, Genkit, and Google ADK. Used in Gemini API apps, embedded assistants, and Vertex AI batch pipelines rather than as a consumer-facing default. Adoption is growing fast among cost-sensitive high-volume builders.

Gemini 3.1 Flash-Lite

What's new

Benchmarks

AI Panel Review

Strengths

Limitations

Best use cases

Deep dive

Architecture

Capabilities

Benchmark analysis

Speed & latency

Pricing analysis

Deployment & access

Safety & privacy

Ecosystem & tooling

Buyer questions

Comparable models

Sources

Model specs

Other Gemini 3 versions