by Google · Gemini 2.5 family · best for cheapest GA model with 1M context
Gemini 2.5 Flash-Lite is the lowest-cost GA model in Google's lineup ($0.10 in / $0.40 out per 1M tokens), built for ultra-high-volume, ultra-low-latency workloads while still carrying a full 1M-token context. GA on 2025-07-22, it is Google's officially designated migration target for teams leaving Gemini 2.0 Flash and 2.0 Flash-Lite before the 2026-06-01 shutdown. As of 2026-05-28 it remains the safe, cheap, GA-stable workhorse for classification, extraction, and bulk transformation. For a buyer: when cost-per-call and GA stability dominate and you don't need deep reasoning, this is the pick; if you can tolerate preview, 3.1 Flash-Lite reasons far better.

- Provider: Google (DeepMind)
- Released: 2025-07-22 (GA)
- Status: GA
- Context window: 1,048,576 tokens (1M)
- Max output: 65,535 tokens
- Modalities: text, image, audio in; text out (no video)
- Knowledge cutoff: January 2025
- Headline price: $0.10 in / $0.40 out per 1M tokens
| Benchmark | Score | Source |
|---|---|---|
| MMLU-Pro | 72.4% | artificialanalysis.ai 2025 |
| GPQA Diamond | 47.4% | artificialanalysis.ai 2025 |
| Artificial Analysis Index | 13 | artificialanalysis.ai 2026-05-28 |
Six personas, six verdicts — the same panel that reviews every product on TopReviewed.
“The safe, cheap, GA-stable workhorse — and the migration target Google itself names for 2.0 Flash.”
For high-volume, low-complexity workloads where GA stability and rock-bottom price are the priority, 2.5 Flash-Lite is the right Google choice today. Vertex governance is identical to higher tiers, and it's the official migration target Google recommends for the 2026-06-01 Gemini 2.0 shutdown. The strategic decision is the trade-off against 3.1 Flash-Lite: if you can tolerate preview, that model reasons far better at a higher price ($0.25/$1.50 vs $0.10/$0.40); for mission-critical paths where preview isn't acceptable, 2.5 Flash-Lite wins on stability. Lock-in is the usual Google Cloud consideration.
“It owns the floor — cheapest GA model with a 1M context — a defensible niche even as 3.1 Flash-Lite looms.”
Strategically, 2.5 Flash-Lite owns the value floor: the cheapest GA model in the lineup that still carries a 1M context. Its competitive position is stability plus price for bulk infrastructure, not capability. Differentiation is the GA label and multi-cloud reach (including OCI) at a moment when the stronger 3.1 Flash-Lite is still preview. Market timing is favorable thanks to the 2.0 deprecation wave pushing migrations its way. The risk is that once 3.1 Flash-Lite reaches GA, this model's niche narrows to pure cost-minimization.
“The cheapest GA model in the lineup — at $0.10/$0.40, high-volume unit economics basically disappear as a concern.”
This is the cost leader among GA models — $0.10 input, $0.40 output. For pipelines processing millions of items per day, the gap vs 3.1 Flash-Lite ($0.25/$1.50) compounds materially, and caching ($0.01) plus batch (~50%) push it lower still. The free AI Studio tier is generous for prototyping. For procurement it's the safest choice: GA, stable pricing, official migration target. The one finance caveat is the capability shortfall vs 3.1 Flash-Lite — if quality forces upgrades, you face a rebuy decision within 6-12 months. Audio at 3x text is a minor watch-item.
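The unit-economics gap described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the per-1M-token prices are the ones quoted in this review, while the workload (1M items per day at 500 input and 50 output tokens each) and the helper name `daily_cost` are assumptions, and the batch discount is modeled as a flat 50% off the whole bill.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # prices are quoted per 1M tokens


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Prices as quoted in this review.
FLASH_LITE_25 = Pricing(input_per_m=0.10, output_per_m=0.40)
FLASH_LITE_31 = Pricing(input_per_m=0.25, output_per_m=1.50)


def daily_cost(pricing, items, in_tokens, out_tokens, batch_discount=0.0):
    """Estimate USD per day for `items` calls that share one token shape."""
    input_cost = items * in_tokens / TOKENS_PER_UNIT * pricing.input_per_m
    output_cost = items * out_tokens / TOKENS_PER_UNIT * pricing.output_per_m
    return (input_cost + output_cost) * (1.0 - batch_discount)


if __name__ == "__main__":
    # Hypothetical workload: 1M items/day, 500 tokens in, 50 tokens out.
    items, in_tok, out_tok = 1_000_000, 500, 50
    for name, pricing in [("2.5 Flash-Lite", FLASH_LITE_25),
                          ("3.1 Flash-Lite", FLASH_LITE_31)]:
        standard = daily_cost(pricing, items, in_tok, out_tok)
        batch = daily_cost(pricing, items, in_tok, out_tok, batch_discount=0.5)
        print(f"{name}: ${standard:,.2f}/day standard, ${batch:,.2f}/day batch")
```

With these assumed numbers the estimate is about $70 per day on 2.5 Flash-Lite versus $200 per day on 3.1 Flash-Lite at standard rates, and half of each under the assumed batch discount. Real bills depend on the actual token mix, caching hit rate, and any audio input.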
“Same SDK as the whole family, 1M context at the bottom of the price list, and it's blisteringly fast — just respect the ceiling.”
For builders, ergonomics are identical to the rest of Gemini — same SDK, same Vertex surface, same function-calling patterns — so migration off 2.0 Flash is a one-line change. The 1M context at this tier is genuinely unusual and useful. Output speed is excellent for streaming. The hard constraint is the capability ceiling: don't build serious coding or multi-hop agent workflows on Flash-Lite. For high-throughput classification, extraction, and transformation, the speed and price are class-leading and the reliability is GA-grade.
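Because the SDK surface is shared, the migration described above can be sketched as a model-ID remap applied before each call. This is a minimal illustration, not Google's documented procedure: the model ID strings (`gemini-2.0-flash`, `gemini-2.0-flash-lite`, `gemini-2.5-flash-lite`) and the helper names `resolve_model` and `generate` are assumptions that do not appear in this review, and the call relies on the `google-genai` Python SDK, which has to be installed and given an API key separately.

```python
# Assumed model IDs; confirm against Google's current model list before use.
MIGRATION_TARGETS = {
    "gemini-2.0-flash": "gemini-2.5-flash-lite",
    "gemini-2.0-flash-lite": "gemini-2.5-flash-lite",
}


def resolve_model(model_id: str) -> str:
    """Return the migration target for a retiring model ID, else the ID unchanged."""
    return MIGRATION_TARGETS.get(model_id, model_id)


def generate(prompt: str, model_id: str = "gemini-2.0-flash") -> str:
    """Send one prompt to the resolved model (needs google-genai and an API key)."""
    # Imported lazily so the remap itself works without the SDK installed.
    from google import genai

    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=resolve_model(model_id),
        contents=prompt,
    )
    return response.text or ""
```

Keeping the remap in one place means existing call sites can keep passing their old model ID during the transition. Prompts tuned for 2.0 Flash may still need re-evaluation, since the swap changes the model's behavior and not just its name.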
“I almost never see it directly — it's infrastructure — but where it surfaces it's fast and fine for simple questions.”
End users almost never meet 2.5 Flash-Lite directly; it lives inside apps and embedded assistants rather than as a Gemini app default. Where it surfaces, responses are fast and adequate for simple questions, but noticeably thinner than Pro or 3.5 Flash on anything complex. Refusals follow Google's standard policy stack. Its user-facing impact is essentially latency, where it shines, rather than depth. Because it is bulk infrastructure rather than a headline consumer model, most users never consciously interact with it.
“A 47.4% GPQA tells you what this is — a cheap classifier, not a reasoner; the 1M context is its only headline that survives scrutiny.”
No overclaiming here, which is refreshing — Google positions it honestly as cheap and fast. The skeptical notes are about fit, not hype: GPQA 47.4% and AA Index 13 confirm it's a classifier-grade model, so treating it as a general assistant will disappoint. The 1M context is real but, as with the rest of the family, recall figures are unpublished — likely degrading earlier at this tier. The audio premium (3x) and lack of video quietly narrow its multimodal story. It's exactly what it claims to be; the only mistake a buyer can make is asking it to reason.
- High-volume classification and extraction (intent, sentiment, entity, routing).
- Lightweight in-app chat assistants where cost-per-call dominates.
- Bulk transformations: summarization, translation, reformatting.
- Production migration off Gemini 2.0 Flash for cost-sensitive teams needing GA stability.
- Multimodal triage (image moderation, audio routing) at scale.
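For the classification and routing use cases listed above, one common pattern is to constrain the model to a closed label set and validate whatever comes back. The sketch below covers only the prompt and parsing side; the label names, the prompt wording, and the function names are all hypothetical, and the model call itself is left out.

```python
LABELS = ("billing", "technical", "sales", "other")  # hypothetical intent labels
FALLBACK = "other"


def build_prompt(text: str, labels=LABELS) -> str:
    """Build a single-label classification prompt over a closed label set."""
    options = ", ".join(labels)
    return (
        "Classify the message into exactly one of these labels: "
        f"{options}.\nReply with the label only.\n\nMessage: {text}"
    )


def parse_label(raw: str, labels=LABELS, fallback=FALLBACK) -> str:
    """Normalize a model reply and map anything unexpected to the fallback."""
    # Trim whitespace, then stray quotes or a trailing period, then lowercase.
    cleaned = raw.strip().strip(".\"'").lower()
    return cleaned if cleaned in labels else fallback
```

Routing unexpected replies to a fallback label keeps a bulk pipeline moving when a small model occasionally answers in a full sentence instead of a bare label.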
Yes — GA since 2025-07-22, with no announced deprecation date as of 2026-05-28. It's the GA-stable choice in the Flash-Lite tier.
GA stability and lower price ($0.10/$0.40 vs $0.25/$1.50). 3.1 Flash-Lite reasons far better (GPQA 86.9% vs 47.4%) but is still preview.
Yes — it's Google's officially designated migration target ahead of the 2026-06-01 shutdown, and migration is a model-name swap.
Complex reasoning, serious coding, and multi-hop agents. It's a classifier/extraction/bulk-transformation engine, plus it lacks video input.
Yes — audio input is $0.30/1M (3x the $0.10 text rate).
No. Gemini is closed-weights, API/Vertex/OCI only.
Does not train on API inputs by default
Last verified 2026-05-27