by Google · Gemini 3 family · best for best $/intelligence in Google's lineup
Gemini 3.1 Flash-Lite is Google DeepMind's most cost-effective frontier-adjacent model, released 2026-03-03 in preview on the Gemini API, AI Studio, and Vertex AI. It pairs a remarkable GPQA Diamond of 86.9% — beating the older Gemini 2.5 Flash on hard science — with the fastest output in any reasoning-capable Gemini (~332 tok/s) and the full 1M-token context, all at $0.25 in / $1.50 out per 1M tokens. For a buyer: this is the high-volume workhorse — classification, extraction, summarization, and bulk content at the best price-to-intelligence ratio in Google's lineup, with the caveat that it is still pre-GA. - Provider: Google (DeepMind) - Released: 2026-03-03 (preview, Pre-GA terms) - Status: preview - Context window: 1,048,576 tokens (1M) - Max output: 65,536 tokens - Modalities: text, image, audio, video in; text out - Knowledge cutoff: January 2025 - Headline price: $0.25 in / $1.50 out per 1M tokens
| Benchmark | Score | Source |
|---|---|---|
| MMMU | 76.8% | blog.google 2026-03-03T00:00:00.000Z |
| LMArena Elo | 1432 | artificialanalysis.ai 2026 |
| GPQA Diamond | 86.9% | blog.google 2026-03-03T00:00:00.000Z |
| Artificial Analysis Index | 34 | artificialanalysis.ai 2026-05-28T00:00:00.000Z |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“Best dollar-per-intelligence on Google's stack — the only reservation is that it's still pre-GA.”
For workloads that don't need frontier reasoning, 3.1 Flash-Lite delivers the best $/intelligence Google offers, with GPQA 86.9% proving it's not a toy. Vertex governance, VPC-SC, and Workspace integration are identical to the Pro tier, so there's no security trade-off. The strategic question is preview status: for non-critical, high-volume paths the savings justify it; for mission-critical paths, weigh GA-stable 2.5 Flash-Lite. Lock-in is the usual Google Cloud consideration — real but acceptable if you're already there. The 1M context lets it skip RAG in places Flash-Lite-class models historically couldn't.
“It quietly redraws the cheap-model frontier: reasoning-grade quality at extraction-grade price.”
In market terms, 3.1 Flash-Lite competes on the value frontier — the upper-left of the price/intelligence scatter — where GPQA 86.9% at $0.25 undercuts most rivals' quality-per-dollar. Its differentiation is offering reasoning capability and a 1M context at a price tier where competitors ship non-reasoning models with smaller windows. The moat, again, is Google distribution and Search grounding. Market timing is good as teams hunt for cost relief amid rising Flash prices. The weakness is positioning clarity: it overlaps GA-stable 2.5 Flash-Lite on price and the preview label muddies the buy decision.
“The cost winner in the lineup — $0.25/$1.50 with cache reads at 2.5 cents makes high-volume math trivial.”
This is the clear cost leader among reasoning-capable Gemini models — $0.25 input, $1.50 output, cache reads at $0.025. For high-volume pipelines it's roughly 6x cheaper than 3.5 Flash and ~12x cheaper than 3.1 Pro per token, and batch shaves another ~50%. The AI Studio free tier is generous for prototyping. Pricing is flat with no over-200K cliff, so it's predictable. The one finance caveat is preview status: budget a re-evaluation when GA pricing lands, which may shift slightly. Note audio input bills at 2x text ($0.50), which matters for transcription-heavy pipelines.
“Same SDK as the rest of Gemini, plus 332 tok/s — swapping it into a streaming pipeline is a one-liner.”
For builders, the appeal is zero migration cost (identical Vertex/AI Studio surface) plus genuinely fast output that changes streaming UX. Function calling and structured output via response schemas work reliably; the 1M context is preserved, which is unusual at this tier. The limits show up in longer agent loops, which feel less stable than on 3.5 Flash — keep tool chains short. Preview Tier 1 RPD caps will bite prototyping at scale until the spend gate clears. For classification, extraction, and bulk transformation, ergonomics and speed are excellent.
“Fast and good enough for most things — I just don't reach for it on the hardest reasoning.”
End users rarely meet Flash-Lite directly; it lives inside apps and embedded assistants rather than as the Gemini app default. Where it surfaces, the perceptible quality is "good enough for most tasks" — fast, helpful, occasionally less thorough than the Pro tier on hard problems. The user-visible benefit is speed: responses stream noticeably faster than 3.5 Flash. Refusals follow the same Google policy stack as 3.1 Pro. For casual factual questions, Search grounding keeps it current. It's infrastructure more than a headline consumer model.
“An 86.9% GPQA at $0.25 is impressive — but it's a preview with unpublished long-context recall and no SWE-bench on the card.”
The value story is real and the GPQA number is the genuine surprise. But the disclosure is thin: Google's launch blog gives GPQA and MMMU-Pro and little else — no SWE-bench, no AIME, no MRCR, no HLE — so coding and long-context claims rest on the family reputation, not this model's published evals. It's a preview, so pricing and behavior can change. And the 1M context is advertised without any recall figure, which for a Flash-Lite model likely degrades earlier than the Pro tier. The honest read: an excellent cheap reasoning model whose data sheet is half-blank.
- High-volume classification, extraction, and summarization where cost-per-call dominates. - Programmatic content generation (product descriptions, SEO pages, ad variations). - Cost-sensitive in-app chat assistants needing reasoning at low unit cost. - Multimodal triage: screenshot analysis, image moderation, audio routing. - Migration target for teams downsizing off Gemini 2.5 Flash to cut cost while gaining reasoning.
No — released 2026-03-03 in preview, still under Pre-GA Offerings Terms as of 2026-05-28. Use GA-stable 2.5 Flash-Lite for mission-critical paths.
A configurable thinking budget lets it reason on hard problems; you pay for thinking tokens as output, so cap the budget where quality allows.
2.5 Flash-Lite is GA and cheaper ($0.10/$0.40) but much weaker reasoning (GPQA 47.4%). 3.1 Flash-Lite is preview but far stronger.
Google hasn't published MRCR recall for this model. Treat the reliable working range conservatively and chunk for precise retrieval.
Yes — audio input is $0.50/1M (2x the $0.25 text rate). Factor this into transcription pipelines.
No. Gemini is closed-weights, API/Vertex only.
~332 tok/s with a 5.6s TTFT — the fastest reasoning-capable Gemini, ideal for streaming and bulk throughput.
Does not train on API inputs by default
Last verified 2026-05-27