Gemini 3.5 Flash

GALatest Flash

by Google · Gemini 3.5 family · best for production agent + coding backbone

CodingMultimodalLong-ContextReasoning
8.5
AI Panel Score
Value 8.0/10

Gemini 3.5 Flash is the first model in Google DeepMind's Gemini 3.5 family, launched GA at Google I/O on 2026-05-19 and positioned as the default backbone for agentic and coding workloads. Its headline trick: a Flash-tier model that beats last year's Pro tier and, on several agentic benchmarks Google publishes itself, beats the current Gemini 3.1 Pro — leading the field on MCP Atlas (83.6%) and posting Terminal-Bench 2.1 76.2% — while running roughly 4x faster than other frontier-class models at ~25% lower cost than 3.1 Pro. It keeps the full 1M context and native multimodal input. For a buyer: this is the production agent and coding engine on Google's stack, with 3.1 Pro reserved for the hardest pure-reasoning work. - Provider: Google (DeepMind) - Released: 2026-05-19 (GA, Google I/O 2026) - Status: GA - Context window: 1,048,576 tokens (1M) - Max output: 65,536 tokens - Modalities: text, image, audio, video in; text out - Knowledge cutoff: January 2025 - Headline price: $1.50 in / $9.00 out per 1M tokens (flat, no 200K tier)

What's new

  • First Gemini 3.5 model; positioned as the agentic/coding default ahead of Gemini 3.5 Pro (limited preview, GA expected June 2026).
  • Beats Gemini 3.1 Pro on agentic/coding tasks: Terminal-Bench 2.1 76.2%, MCP Atlas 83.6%, CharXiv 84.2%, MMMU-Pro 83.6%.
  • ~4x faster output than other frontier-class models; flat pricing with no over-200K cliff.
  • Priced ~3x above Gemini 3 Flash Preview but ~25% below 3.1 Pro.
  • AA Intelligence Index 55, ranking #8 of 148 models — frontier-adjacent at Flash cost.

Benchmarks

BenchmarkScoreSource
Humanity's Last Exam40.2%deepmind.google 2026-05-19T00:00:00.000Z
MMMU83.6%deepmind.google 2026-05-19T00:00:00.000Z
GPQA Diamond92.2%buildfastwithai.com 2026-05-20T00:00:00.000Z
Terminal-Bench76.2%deepmind.google 2026-05-19T00:00:00.000Z
MRCR Long Context77.3%deepmind.google 2026-05-19T00:00:00.000Z
SWE-bench Verified80.8%benchlm.ai 2026-05-24T00:00:00.000Z
Artificial Analysis Index55artificialanalysis.ai 2026-05-28T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker9/10
The model Google wants us to standardize on — fastest path to frontier-adjacent quality with full Vertex governance.

For a CTO running an agent fleet, 3.5 Flash hits the sweet spot: 4x speed, ~25% under 3.1 Pro, best-in-class agentic benchmarks, and identical Vertex governance and Workspace integration to the Pro tier — no security trade-off. It's GA, so unlike 3.1 Pro there's no Pre-GA caveat. The strategic risk is the price drift from earlier Flash generations (3x Gemini 3 Flash Preview), which reshapes unit economics for teams scaling up from cheap Flash predecessors. Lock-in is limited to Google Cloud itself. Roadmap confidence is high with 3.5 Pro arriving.

Strategic Fit 9Vendor Risk 8Roadmap Confidence 9
Pros
  • GA, fast, cheap vs Pro, best agentic benchmarks, full governance
Cons
  • Price jump vs prior Flash, weak 1M recall
Right for: Agent fleets on Google Cloud
Avoid if: You need 3.1 Pro-grade reasoning or sub-2.0-Flash pricing
Domain Strategist8.5/10
Google's positioning is sharp: Flash for agents, Pro for reasoning — and 3.5 Flash genuinely owns the agent slot.

3.5 Flash is positioned as the production-agent default, and the benchmarks back the claim — leading MCP Atlas field-wide and beating its own Pro tier on agentic and coding tasks. Its competitive moat is the combination of speed, Search grounding, and Vertex distribution rather than any single eval lead. Against GPT-5 mini and Claude Sonnet it differentiates on agentic tooling and multimodal/chart reasoning. Market timing at I/O 2026 was strong, capturing the agent-stack conversation. The main strategic muddle is internal: 3.5 Flash overlapping 3.1 Pro on coding can confuse buyers on which to pick.

Competitive Positioning 9Differentiation 8Market Timing 9
Pros
  • Clear agent positioning, field-leading tool use, strong distribution
Cons
  • Overlaps 3.1 Pro
  • differentiation is ecosystem-driven
Right for: Agent-first buyers on Google
Avoid if: You want a standalone benchmark king independent of ecosystem
Finance Lead8/10
Flat $1.50/$9 with a 90% cache discount beats Pro's tiered model — just don't pretend it's still cheap Flash.

At $1.50/$9 with $0.15 cached input and batch at $0.75/$4.50, 3.5 Flash is a clear value step down from 3.1 Pro ($2/$12) — and crucially has no over-200K cliff, so cost is predictable on long prompts. For agent fleets, TCO genuinely beats running them on Pro. The honest caveat: it's ~3x the price of Gemini 3 Flash Preview and 6x Gemini 3.1 Flash-Lite, so teams migrating up from cheap Flash predecessors must re-model unit economics. Thinking tokens bill as output and can inflate cost on hard turns; cap the thinking budget where quality allows.

Cost Efficiency 8Pricing Transparency 9Value per Dollar 8
Pros
  • Flat pricing, deep cache discount, cheaper than Pro
Cons
  • 3x prior Flash, thinking-token output costs
Right for: Agent fleets sizing down from Pro
Avoid if: Budgets were built on sub-$0.50 Flash pricing
Domain Practitioner9/10
Tool calls stay coherent across long loops and MCP just works — this is the most fun Gemini to build agents on.

For a builder, 3.5 Flash is the best agentic experience in the family: stable tool state over long loops, real Model Context Protocol support, reliable terminal and browser automation, clean structured output via response schemas, and built-in code execution. Function-calling latency is noticeably faster than Pro, which materially improves agent UX. The 1M context skips a lot of RAG — but weak 1M recall means you still chunk for precision retrieval. SDK surface is identical to the rest of Gemini, so swapping it in is a one-line change. January 2025 cutoff forces fresh-data plumbing for current events.

API Ergonomics 9Tool/Agent Support 10Reliability 8
Pros
  • Best-in-family tool use, MCP support, fast function calls, code execution
Cons
  • Weak 1M recall, dated cutoff
Right for: Agent and tool-heavy builders
Avoid if: You need reliable retrieval across the full 1M window
Power User8.5/10
Snappy and capable for everyday work; it occasionally feels less thorough than 3.1 Pro on the hardest problems.

In the Gemini app's default mode and on AI Pro, 3.5 Flash is the model most users actually touch. With thinking dialed down it's noticeably faster than 3.1 Pro Deep Think, and conversation quality is strong for everyday research, drafting, and analysis. On the hardest reasoning it can feel thinner than Pro. Multimodal input (charts, screenshots, video) is excellent. Refusal rate is similar to 3.1 Pro — stricter than OpenAI/Anthropic on some prompts. The 2026 UX overhaul (native macOS app, cleaner mobile) helped; Trustpilot remains mixed, mostly about caps and policy rather than this model.

Output Quality 8.5Speed 9Everyday Usefulness 8.5
Pros
  • Fast, strong multimodal, good everyday quality
Cons
  • Thinner than Pro on hard reasoning, stricter refusals
Right for: Daily drivers wanting speed + capability
Avoid if: You always need maximum reasoning depth
Skeptic7.5/10
A 1M-token model that recalls 26.6% at 1M, with a ~61% hallucination rate — the context number is marketing, not memory.

The agentic wins over 3.1 Pro are real and impressive, but the long-context story is oversold: MRCR collapses from 77.3% at 128K to 26.6% at 1M, so the headline window vastly exceeds reliable working memory. AA's ~61% Omniscience hallucination rate means Search grounding is doing heavy lifting on factual tasks. "Beats last year's Pro" is true but cherry-picked to agentic benchmarks; on pure reasoning (HLE, ARC-AGI-2) it clearly loses to 3.1 Pro. And the SWE-bench Verified ~80.8% figure is from aggregators, not Google's own card, which lists the harder SWE-Bench Pro at 55.1%. Good agent model — read the asterisks on context and factuality.

Claim Accuracy 7Weakness Severity 6Hype vs Reality 7
Pros
  • Genuine agentic leadership, fast, GA
Cons
  • Weak 1M recall, high hallucination, cherry-picked comparisons
Right for: Buyers who ground factual tasks and chunk long context
Avoid if: You trust the 1M window as reliable memory

Strengths

  • Best agentic/tool-use benchmarks in the Flash tier; beats 3.1 Pro on MCP Atlas, Terminal-Bench, CharXiv, MMMU-Pro.
  • ~4x faster than frontier-class peers at ~25% lower cost than 3.1 Pro.
  • Flat pricing with no over-200K cliff; 90% cache discount.
  • Full 1M context and native multimodal input including video.
  • Class-leading multimodal/chart understanding (CharXiv 84.2%, MMMU-Pro 83.6%).

Limitations

  • 1M context but weak 1M recall (MRCR 26.6%) — the window outruns reliable working memory.
  • Loses to 3.1 Pro on pure reasoning (HLE, ARC-AGI-2) and long-context recall.
  • ~3x more expensive than Gemini 3 Flash Preview — a real jump for high-volume pipelines migrating up.
  • ~61% AA hallucination rate; needs grounding for factual reliability.
  • January 2025 knowledge cutoff despite the later release date.
  • Output is text-only; image/video generation needs Imagen / Veo handoff.

Best use cases

- Production agent backbones: customer support, browser automation, MCP tool routing. - High-volume coding agents where 3.1 Pro pricing is unjustified. - Real-time multimodal apps needing low latency and video/chart input. - Large-context summarization and transformation at scale. - Migration target for teams leaving Gemini 2.0 Flash before the 2026-06-01 shutdown.

Buyer questions

Is 3.5 Flash actually better than 3.1 Pro?

On agentic and coding/multimodal benchmarks Google publishes (MCP Atlas, Terminal-Bench, CharXiv, MMMU-Pro), yes. On pure reasoning (HLE, ARC-AGI-2) and long-context recall, no — 3.1 Pro wins. Pick by workload.

What does it cost vs prior Flash?

$1.50/$9 flat — about 3x Gemini 3 Flash Preview and 6x Gemini 3.1 Flash-Lite. The lift buys frontier-adjacent quality and 4x speed; re-model unit economics if migrating up from cheap Flash.

Is the 1M context reliable?

The window is 1M, but recall drops to 26.6% at 1M (77.3% at 128K). Treat ~128K-256K as the reliable working range and chunk beyond that.

Does it hallucinate?

AA measures ~61% on the Omniscience knowledge eval. Use Google Search grounding or retrieval for factual workloads.

How fast is it really?

~203 tok/s sustained (4x frontier peers). The high AA TTFT reflects thinking-on; lower the thinking budget for snappy first tokens.

Can I self-host?

No. Gemini is closed-weights, API/Vertex only.

Is migration from 2.x Flash hard?

No — it's a model-name swap; the SDK surface is identical across the Gemini line.

Comparable models

**Gemini 3.1 Pro** — Wins on pure reasoning (HLE 44.4% vs 40.2%, ARC-AGI-2 77.1% vs 72.1%) and long-context recall; loses on speed, agentic tasks, and price. The intended internal split: Flash for agents, Pro for reasoning.
**Claude Sonnet 4.6** — Stronger creative tone and edges SWE-Bench Pro; weaker on agentic tool benchmarks and lacks Search grounding. Sonnet for writing, 3.5 Flash for agents.
**GPT-5.4 mini** — Comparable price and speed band; weaker native video and chart reasoning, no Workspace/Search-grounding tie-in. GPT-5.4 mini for OpenAI-stack agents; 3.5 Flash for Google-stack multimodal agents.

Model specs

Input price
$1.50 / Mtok
Output price
$9 / Mtok
Cached input
$0.15 / Mtok
Batch (in/out)
$0.75 / $4.50
Context window
1.0M tokens
Max output
66K tokens
Knowledge cutoff
2025-01
Released
2026-05-18
Modalities
text, image, audio, video → text
Output speed
~203.5 tok/s
License
Proprietary
Clouds
Vertex AI, GCP

Does not train on API inputs by default

Last verified 2026-05-27