by Google · Gemini 3.5 family · best for production agent + coding backbone
Gemini 3.5 Flash is the first model in Google DeepMind's Gemini 3.5 family, launched GA at Google I/O on 2026-05-19 and positioned as the default backbone for agentic and coding workloads. Its headline trick: a Flash-tier model that beats last year's Pro tier and, on several agentic benchmarks Google publishes itself, beats the current Gemini 3.1 Pro — leading the field on MCP Atlas (83.6%) and posting Terminal-Bench 2.1 76.2% — while running roughly 4x faster than other frontier-class models at ~25% lower cost than 3.1 Pro. It keeps the full 1M context and native multimodal input. For a buyer: this is the production agent and coding engine on Google's stack, with 3.1 Pro reserved for the hardest pure-reasoning work. - Provider: Google (DeepMind) - Released: 2026-05-19 (GA, Google I/O 2026) - Status: GA - Context window: 1,048,576 tokens (1M) - Max output: 65,536 tokens - Modalities: text, image, audio, video in; text out - Knowledge cutoff: January 2025 - Headline price: $1.50 in / $9.00 out per 1M tokens (flat, no 200K tier)
| Benchmark | Score | Source |
|---|---|---|
| Humanity's Last Exam | 40.2% | deepmind.google 2026-05-19T00:00:00.000Z |
| MMMU | 83.6% | deepmind.google 2026-05-19T00:00:00.000Z |
| GPQA Diamond | 92.2% | buildfastwithai.com 2026-05-20T00:00:00.000Z |
| Terminal-Bench | 76.2% | deepmind.google 2026-05-19T00:00:00.000Z |
| MRCR Long Context | 77.3% | deepmind.google 2026-05-19T00:00:00.000Z |
| SWE-bench Verified | 80.8% | benchlm.ai 2026-05-24T00:00:00.000Z |
| Artificial Analysis Index | 55 | artificialanalysis.ai 2026-05-28T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The model Google wants us to standardize on — fastest path to frontier-adjacent quality with full Vertex governance.”
For a CTO running an agent fleet, 3.5 Flash hits the sweet spot: 4x speed, ~25% under 3.1 Pro, best-in-class agentic benchmarks, and identical Vertex governance and Workspace integration to the Pro tier — no security trade-off. It's GA, so unlike 3.1 Pro there's no Pre-GA caveat. The strategic risk is the price drift from earlier Flash generations (3x Gemini 3 Flash Preview), which reshapes unit economics for teams scaling up from cheap Flash predecessors. Lock-in is limited to Google Cloud itself. Roadmap confidence is high with 3.5 Pro arriving.
“Google's positioning is sharp: Flash for agents, Pro for reasoning — and 3.5 Flash genuinely owns the agent slot.”
3.5 Flash is positioned as the production-agent default, and the benchmarks back the claim — leading MCP Atlas field-wide and beating its own Pro tier on agentic and coding tasks. Its competitive moat is the combination of speed, Search grounding, and Vertex distribution rather than any single eval lead. Against GPT-5 mini and Claude Sonnet it differentiates on agentic tooling and multimodal/chart reasoning. Market timing at I/O 2026 was strong, capturing the agent-stack conversation. The main strategic muddle is internal: 3.5 Flash overlapping 3.1 Pro on coding can confuse buyers on which to pick.
“Flat $1.50/$9 with a 90% cache discount beats Pro's tiered model — just don't pretend it's still cheap Flash.”
At $1.50/$9 with $0.15 cached input and batch at $0.75/$4.50, 3.5 Flash is a clear value step down from 3.1 Pro ($2/$12) — and crucially has no over-200K cliff, so cost is predictable on long prompts. For agent fleets, TCO genuinely beats running them on Pro. The honest caveat: it's ~3x the price of Gemini 3 Flash Preview and 6x Gemini 3.1 Flash-Lite, so teams migrating up from cheap Flash predecessors must re-model unit economics. Thinking tokens bill as output and can inflate cost on hard turns; cap the thinking budget where quality allows.
“Tool calls stay coherent across long loops and MCP just works — this is the most fun Gemini to build agents on.”
For a builder, 3.5 Flash is the best agentic experience in the family: stable tool state over long loops, real Model Context Protocol support, reliable terminal and browser automation, clean structured output via response schemas, and built-in code execution. Function-calling latency is noticeably faster than Pro, which materially improves agent UX. The 1M context skips a lot of RAG — but weak 1M recall means you still chunk for precision retrieval. SDK surface is identical to the rest of Gemini, so swapping it in is a one-line change. January 2025 cutoff forces fresh-data plumbing for current events.
“Snappy and capable for everyday work; it occasionally feels less thorough than 3.1 Pro on the hardest problems.”
In the Gemini app's default mode and on AI Pro, 3.5 Flash is the model most users actually touch. With thinking dialed down it's noticeably faster than 3.1 Pro Deep Think, and conversation quality is strong for everyday research, drafting, and analysis. On the hardest reasoning it can feel thinner than Pro. Multimodal input (charts, screenshots, video) is excellent. Refusal rate is similar to 3.1 Pro — stricter than OpenAI/Anthropic on some prompts. The 2026 UX overhaul (native macOS app, cleaner mobile) helped; Trustpilot remains mixed, mostly about caps and policy rather than this model.
“A 1M-token model that recalls 26.6% at 1M, with a ~61% hallucination rate — the context number is marketing, not memory.”
The agentic wins over 3.1 Pro are real and impressive, but the long-context story is oversold: MRCR collapses from 77.3% at 128K to 26.6% at 1M, so the headline window vastly exceeds reliable working memory. AA's ~61% Omniscience hallucination rate means Search grounding is doing heavy lifting on factual tasks. "Beats last year's Pro" is true but cherry-picked to agentic benchmarks; on pure reasoning (HLE, ARC-AGI-2) it clearly loses to 3.1 Pro. And the SWE-bench Verified ~80.8% figure is from aggregators, not Google's own card, which lists the harder SWE-Bench Pro at 55.1%. Good agent model — read the asterisks on context and factuality.
- Production agent backbones: customer support, browser automation, MCP tool routing. - High-volume coding agents where 3.1 Pro pricing is unjustified. - Real-time multimodal apps needing low latency and video/chart input. - Large-context summarization and transformation at scale. - Migration target for teams leaving Gemini 2.0 Flash before the 2026-06-01 shutdown.
On agentic and coding/multimodal benchmarks Google publishes (MCP Atlas, Terminal-Bench, CharXiv, MMMU-Pro), yes. On pure reasoning (HLE, ARC-AGI-2) and long-context recall, no — 3.1 Pro wins. Pick by workload.
$1.50/$9 flat — about 3x Gemini 3 Flash Preview and 6x Gemini 3.1 Flash-Lite. The lift buys frontier-adjacent quality and 4x speed; re-model unit economics if migrating up from cheap Flash.
The window is 1M, but recall drops to 26.6% at 1M (77.3% at 128K). Treat ~128K-256K as the reliable working range and chunk beyond that.
AA measures ~61% on the Omniscience knowledge eval. Use Google Search grounding or retrieval for factual workloads.
~203 tok/s sustained (4x frontier peers). The high AA TTFT reflects thinking-on; lower the thinking budget for snappy first tokens.
No. Gemini is closed-weights, API/Vertex only.
No — it's a model-name swap; the SDK surface is identical across the Gemini line.
Does not train on API inputs by default
Last verified 2026-05-27