Gemini 3.1 Flash-Lite

previewLatest Flash-Lite

by Google · Gemini 3 family · best for best $/intelligence in Google's lineup

Cost-OptimizedLong-ContextMultimodal
7.8
AI Panel Score
Value 9.5/10

Gemini 3.1 Flash-Lite is Google DeepMind's most cost-effective frontier-adjacent model, released 2026-03-03 in preview on the Gemini API, AI Studio, and Vertex AI. It pairs a remarkable GPQA Diamond of 86.9% — beating the older Gemini 2.5 Flash on hard science — with the fastest output in any reasoning-capable Gemini (~332 tok/s) and the full 1M-token context, all at $0.25 in / $1.50 out per 1M tokens. For a buyer: this is the high-volume workhorse — classification, extraction, summarization, and bulk content at the best price-to-intelligence ratio in Google's lineup, with the caveat that it is still pre-GA. - Provider: Google (DeepMind) - Released: 2026-03-03 (preview, Pre-GA terms) - Status: preview - Context window: 1,048,576 tokens (1M) - Max output: 65,536 tokens - Modalities: text, image, audio, video in; text out - Knowledge cutoff: January 2025 - Headline price: $0.25 in / $1.50 out per 1M tokens

What's new

  • Google's best price-to-intelligence model: GPQA Diamond 86.9% (vs 68.3% on Gemini 2.5 Flash) at a fraction of the cost.
  • MMMU-Pro 76.8% and LMArena Elo 1432 — strong multimodal and human-preference scores for the price tier.
  • ~332 tokens/sec output, the fastest reasoning-capable Gemini, with a low 5.61s TTFT.
  • Full 1M-token context retained from the higher tiers — unusual at this price.
  • Reasoning-capable with a configurable thinking budget, unlike most ultra-cheap models.

Benchmarks

BenchmarkScoreSource
MMMU76.8%blog.google 2026-03-03T00:00:00.000Z
LMArena Elo1432artificialanalysis.ai 2026
GPQA Diamond86.9%blog.google 2026-03-03T00:00:00.000Z
Artificial Analysis Index34artificialanalysis.ai 2026-05-28T00:00:00.000Z

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker8/10
Best dollar-per-intelligence on Google's stack — the only reservation is that it's still pre-GA.

For workloads that don't need frontier reasoning, 3.1 Flash-Lite delivers the best $/intelligence Google offers, with GPQA 86.9% proving it's not a toy. Vertex governance, VPC-SC, and Workspace integration are identical to the Pro tier, so there's no security trade-off. The strategic question is preview status: for non-critical, high-volume paths the savings justify it; for mission-critical paths, weigh GA-stable 2.5 Flash-Lite. Lock-in is the usual Google Cloud consideration — real but acceptable if you're already there. The 1M context lets it skip RAG in places Flash-Lite-class models historically couldn't.

Strategic Fit 8Vendor Risk 7Roadmap Confidence 8
Pros
  • Best $/intelligence, real reasoning, full governance
Cons
  • Pre-GA SLAs, GA pricing uncertainty
Right for: High-volume non-critical workloads on Google
Avoid if: You need GA stability for mission-critical paths
Domain Strategist7.5/10
It quietly redraws the cheap-model frontier: reasoning-grade quality at extraction-grade price.

In market terms, 3.1 Flash-Lite competes on the value frontier — the upper-left of the price/intelligence scatter — where GPQA 86.9% at $0.25 undercuts most rivals' quality-per-dollar. Its differentiation is offering reasoning capability and a 1M context at a price tier where competitors ship non-reasoning models with smaller windows. The moat, again, is Google distribution and Search grounding. Market timing is good as teams hunt for cost relief amid rising Flash prices. The weakness is positioning clarity: it overlaps GA-stable 2.5 Flash-Lite on price and the preview label muddies the buy decision.

Competitive Positioning 8Differentiation 8Market Timing 7
Pros
  • Redraws value frontier, reasoning at low cost, 1M context
Cons
  • Overlaps 2.5 Flash-Lite
  • preview muddies positioning
Right for: Value-driven buyers wanting reasoning cheap
Avoid if: You need GA clarity for procurement
Finance Lead9/10
The cost winner in the lineup — $0.25/$1.50 with cache reads at 2.5 cents makes high-volume math trivial.

This is the clear cost leader among reasoning-capable Gemini models — $0.25 input, $1.50 output, cache reads at $0.025. For high-volume pipelines it's roughly 6x cheaper than 3.5 Flash and ~12x cheaper than 3.1 Pro per token, and batch shaves another ~50%. The AI Studio free tier is generous for prototyping. Pricing is flat with no over-200K cliff, so it's predictable. The one finance caveat is preview status: budget a re-evaluation when GA pricing lands, which may shift slightly. Note audio input bills at 2x text ($0.50), which matters for transcription-heavy pipelines.

Cost Efficiency 10Pricing Transparency 9Value per Dollar 10
Pros
  • Cheapest reasoning-capable Gemini, deep cache discount, flat pricing
Cons
  • Preview pricing may shift
  • audio 2x premium
Right for: High-volume cost-sensitive pipelines
Avoid if: You need locked GA pricing for long-term contracts
Domain Practitioner8/10
Same SDK as the rest of Gemini, plus 332 tok/s — swapping it into a streaming pipeline is a one-liner.

For builders, the appeal is zero migration cost (identical Vertex/AI Studio surface) plus genuinely fast output that changes streaming UX. Function calling and structured output via response schemas work reliably; the 1M context is preserved, which is unusual at this tier. The limits show up in longer agent loops, which feel less stable than on 3.5 Flash — keep tool chains short. Preview Tier 1 RPD caps will bite prototyping at scale until the spend gate clears. For classification, extraction, and bulk transformation, ergonomics and speed are excellent.

API Ergonomics 8Tool/Agent Support 7Reliability 8
Pros
  • Zero-migration SDK, fast streaming, 1M context, reliable structured output
Cons
  • Weaker on long agent loops, preview RPD caps
Right for: High-throughput extraction/summarization builders
Avoid if: You need robust multi-hop agents
Power User7/10
Fast and good enough for most things — I just don't reach for it on the hardest reasoning.

End users rarely meet Flash-Lite directly; it lives inside apps and embedded assistants rather than as the Gemini app default. Where it surfaces, the perceptible quality is "good enough for most tasks" — fast, helpful, occasionally less thorough than the Pro tier on hard problems. The user-visible benefit is speed: responses stream noticeably faster than 3.5 Flash. Refusals follow the same Google policy stack as 3.1 Pro. For casual factual questions, Search grounding keeps it current. It's infrastructure more than a headline consumer model.

Output Quality 7Speed 9Everyday Usefulness 7
Pros
  • Very fast, current via grounding, good enough for routine tasks
Cons
  • Thinner on hard reasoning, not a consumer default
Right for: Embedded assistants and fast in-app help
Avoid if: You want maximum depth per answer
Skeptic7/10
An 86.9% GPQA at $0.25 is impressive — but it's a preview with unpublished long-context recall and no SWE-bench on the card.

The value story is real and the GPQA number is the genuine surprise. But the disclosure is thin: Google's launch blog gives GPQA and MMMU-Pro and little else — no SWE-bench, no AIME, no MRCR, no HLE — so coding and long-context claims rest on the family reputation, not this model's published evals. It's a preview, so pricing and behavior can change. And the 1M context is advertised without any recall figure, which for a Flash-Lite model likely degrades earlier than the Pro tier. The honest read: an excellent cheap reasoning model whose data sheet is half-blank.

Claim Accuracy 7Weakness Severity 6Hype vs Reality 7
Pros
  • Genuine reasoning bargain, strong GPQA
Cons
  • Sparse benchmarks, preview, unpublished long-context recall
Right for: Buyers comfortable testing their own evals
Avoid if: You need a fully documented data sheet before committing

Strengths

  • Best price-to-intelligence ratio in Google's lineup ($0.25/$1.50).
  • GPQA Diamond 86.9% — beats Gemini 2.5 Flash at a fraction of the cost.
  • Fastest reasoning-capable Gemini (~332 tok/s, 5.6s TTFT).
  • Full 1M context preserved at this price.
  • Native multimodal input including video; Search grounding for live data.

Limitations

  • Preview status: weaker SLAs than GA; GA pricing may shift.
  • Trails Gemini 3.5 Flash on agentic/tool-use benchmarks (Terminal-Bench, MCP Atlas).
  • Pure reasoning trails 3.1 Pro by a wide margin (HLE, ARC-AGI-2 unpublished but clearly lower).
  • Long-context recall unpublished — reliable working range within 1M is undocumented.
  • Preview Tier 1 RPD caps (~250) until cumulative-spend gates clear.
  • January 2025 knowledge cutoff.

Best use cases

- High-volume classification, extraction, and summarization where cost-per-call dominates. - Programmatic content generation (product descriptions, SEO pages, ad variations). - Cost-sensitive in-app chat assistants needing reasoning at low unit cost. - Multimodal triage: screenshot analysis, image moderation, audio routing. - Migration target for teams downsizing off Gemini 2.5 Flash to cut cost while gaining reasoning.

Buyer questions

Is it generally available?

No — released 2026-03-03 in preview, still under Pre-GA Offerings Terms as of 2026-05-28. Use GA-stable 2.5 Flash-Lite for mission-critical paths.

How is GPQA 86.9% possible at this price?

A configurable thinking budget lets it reason on hard problems; you pay for thinking tokens as output, so cap the budget where quality allows.

What's the catch vs 2.5 Flash-Lite?

2.5 Flash-Lite is GA and cheaper ($0.10/$0.40) but much weaker reasoning (GPQA 47.4%). 3.1 Flash-Lite is preview but far stronger.

Is the 1M context reliable here?

Google hasn't published MRCR recall for this model. Treat the reliable working range conservatively and chunk for precise retrieval.

Does audio cost more?

Yes — audio input is $0.50/1M (2x the $0.25 text rate). Factor this into transcription pipelines.

Can I self-host?

No. Gemini is closed-weights, API/Vertex only.

How fast is it?

~332 tok/s with a 5.6s TTFT — the fastest reasoning-capable Gemini, ideal for streaming and bulk throughput.

Comparable models

**Gemini 2.5 Flash-Lite** — GA-stable and cheaper ($0.10/$0.40) but far weaker reasoning (GPQA 47.4% vs 86.9%). The live trade-off: GA stability and rock-bottom price vs preview status and much stronger reasoning.
**Gemini 2.5 Flash** — Older, more expensive ($0.30/$2.50), and weaker on GPQA; 3.1 Flash-Lite is the natural cost-and-quality successor for teams sizing down.
**GPT-5.4 nano / Claude Haiku 4.5** — Comparable cheap-tier price; weaker multimodal and smaller context, and no Search grounding. 3.1 Flash-Lite wins on context size and live data; the others may edge it on ecosystem fit.

Model specs

Input price
$0.25 / Mtok
Output price
$1.50 / Mtok
Cached input
$0.03 / Mtok
Batch (in/out)
$0.13 / $0.75
Context window
1.0M tokens
Max output
66K tokens
Knowledge cutoff
2025-01
Released
2026-03-02
Modalities
text, image, audio, video → text
Output speed
~332.1 tok/s
License
Proprietary
Clouds
Vertex AI, GCP

Does not train on API inputs by default

Last verified 2026-05-27