Gemini 2.5 Flash-Lite

GA

by Google · Gemini 2.5 family · best for: cheapest GA model with 1M context

Cost-Optimized · Edge / On-Device · Long-Context
AI Panel Score 7.7/10
Value 9.5/10

Gemini 2.5 Flash-Lite is the lowest-cost GA model in Google's lineup ($0.10 in / $0.40 out per 1M tokens), built for ultra-high-volume, ultra-low-latency workloads while still carrying a full 1M-token context. GA on 2025-07-22, it is Google's officially designated migration target for teams leaving Gemini 2.0 Flash and 2.0 Flash-Lite before the 2026-06-01 shutdown. As of 2026-05-28 it remains the safe, cheap, GA-stable workhorse for classification, extraction, and bulk transformation. For a buyer: when cost-per-call and GA stability dominate and you don't need deep reasoning, this is the pick; if you can tolerate preview, 3.1 Flash-Lite reasons far better.

  • Provider: Google (DeepMind)
  • Released: 2025-07-22 (GA)
  • Status: GA
  • Context window: 1,048,576 tokens (1M)
  • Max output: 65,535 tokens
  • Modalities: text, image, audio in; text out (no video)
  • Knowledge cutoff: January 2025
  • Headline price: $0.10 in / $0.40 out per 1M tokens

What's new

  • Lowest-cost GA model in the Gemini 2.5 family, optimized for ultra-low latency.
  • 1M-token context preserved despite the Flash-Lite tier.
  • Output token limit raised to ~65K (vs 8K on earlier Flash-Lite previews).
  • Multimodal input (text, image, audio).
  • Designated by Google as the official migration target for Gemini 2.0 Flash/Flash-Lite before the 2026-06-01 shutdown.

Benchmarks

| Benchmark | Score | Source |
| --- | --- | --- |
| MMLU-Pro | 72.4% | artificialanalysis.ai, 2025 |
| GPQA Diamond | 47.4% | artificialanalysis.ai, 2025 |
| Artificial Analysis Index | 13 | artificialanalysis.ai, 2026-05-28 |

AI Panel Review

Six personas, six verdicts — the same panel that reviews every product on TopReviewed.

Decision Maker · 8/10
The safe, cheap, GA-stable workhorse — and the migration target Google itself names for 2.0 Flash.

For high-volume, low-complexity workloads where GA stability and rock-bottom price are the priority, 2.5 Flash-Lite is the right Google choice today. Vertex governance is identical to higher tiers, and it's the official migration target Google recommends for the 2026-06-01 Gemini 2.0 shutdown. The strategic decision is the trade-off against 3.1 Flash-Lite: if you can tolerate preview, that model reasons far better at a higher price ($0.25/$1.50 vs $0.10/$0.40); for mission-critical paths where preview isn't acceptable, 2.5 Flash-Lite wins on stability. Lock-in is the usual Google Cloud consideration.

Strategic Fit 8 · Vendor Risk 8 · Roadmap Confidence 8
Pros
  • Cheapest GA, stable, official 2.0 migration target, broad cloud availability
Cons
  • Weak reasoning
  • The preview 3.1 Flash-Lite is a tempting alternative
Right for: Mission-critical high-volume simple tasks
Avoid if: You need reasoning and can accept preview status
Domain Strategist · 7.5/10
It owns the floor — cheapest GA model with a 1M context — a defensible niche even as 3.1 Flash-Lite looms.

Strategically, 2.5 Flash-Lite owns the value floor: the cheapest GA model in the lineup that still carries a 1M context. Its competitive position is stability plus price for bulk infrastructure, not capability. Differentiation is the GA label and multi-cloud reach (including OCI) at a moment when the stronger 3.1 Flash-Lite is still preview. Market timing is favorable thanks to the 2.0 deprecation wave pushing migrations its way. The risk is that once 3.1 Flash-Lite reaches GA, this model's niche narrows to pure cost-minimization.

Competitive Positioning 8 · Differentiation 7 · Market Timing 8
Pros
  • Owns the GA value floor, multi-cloud, migration-wave tailwind
Cons
  • Niche narrows once 3.1 Flash-Lite is GA
Right for: Bulk-infrastructure buyers wanting GA + lowest price
Avoid if: You want capability headroom
Finance Lead · 9/10
The cheapest GA model in the lineup — at $0.10/$0.40, high-volume unit economics basically disappear as a concern.

This is the cost leader among GA models — $0.10 input, $0.40 output. For pipelines processing millions of items per day, the gap vs 3.1 Flash-Lite ($0.25/$1.50) compounds materially, and caching ($0.01) plus batch (~50%) push it lower still. The free AI Studio tier is generous for prototyping. For procurement it's the safest choice: GA, stable pricing, official migration target. The one finance caveat is the capability shortfall vs 3.1 Flash-Lite — if quality forces upgrades, you face a rebuy decision within 6-12 months. Audio at 3x text is a minor watch-item.
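To make the compounding concrete, here is a minimal sketch of the daily-cost arithmetic using the list prices quoted above. The workload shape (5 million items per day, 500 input and 50 output tokens each) is a hypothetical example, not a measured figure, and the names `Pricing` and `daily_cost` are illustrative. The 50% batch discount is applied only to 2.5 Flash-Lite because no batch rate for 3.1 Flash-Lite is given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in USD per 1M tokens."""
    input_per_mtok: float
    output_per_mtok: float


# Prices as quoted in this review.
FLASH_LITE_25 = Pricing(input_per_mtok=0.10, output_per_mtok=0.40)
FLASH_LITE_31 = Pricing(input_per_mtok=0.25, output_per_mtok=1.50)


def daily_cost(pricing: Pricing, items_per_day: int, in_tokens: int,
               out_tokens: int, batch_discount: float = 0.0) -> float:
    """Estimated USD per day for a uniform workload (hypothetical shape)."""
    in_cost = items_per_day * in_tokens / 1_000_000 * pricing.input_per_mtok
    out_cost = items_per_day * out_tokens / 1_000_000 * pricing.output_per_mtok
    return (in_cost + out_cost) * (1.0 - batch_discount)


if __name__ == "__main__":
    items, tok_in, tok_out = 5_000_000, 500, 50  # assumed workload
    print(f"2.5 Flash-Lite:         ${daily_cost(FLASH_LITE_25, items, tok_in, tok_out):.2f}/day")
    print(f"2.5 Flash-Lite (batch): ${daily_cost(FLASH_LITE_25, items, tok_in, tok_out, 0.5):.2f}/day")
    print(f"3.1 Flash-Lite:         ${daily_cost(FLASH_LITE_31, items, tok_in, tok_out):.2f}/day")
```

Under these assumptions the workload comes to about $350 a day on 2.5 Flash-Lite (about $175 with batch) versus about $1,000 a day on 3.1 Flash-Lite.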

Cost Efficiency 10 · Pricing Transparency 9 · Value per Dollar 10
Pros
  • Cheapest GA, deep cache discount, stable pricing
Cons
  • Capability shortfall may force a rebuy
  • Audio premium
Right for: Million-item-per-day pipelines
Avoid if: Quality needs will push you up the stack anyway
Domain Practitioner · 8/10
Same SDK as the whole family, 1M context at the bottom of the price list, and it's blisteringly fast — just respect the ceiling.

For builders, ergonomics are identical to the rest of Gemini — same SDK, same Vertex surface, same function-calling patterns — so migration off 2.0 Flash is a one-line change. The 1M context at this tier is genuinely unusual and useful. Output speed is excellent for streaming. The hard constraint is the capability ceiling: don't build serious coding or multi-hop agent workflows on Flash-Lite. For high-throughput classification, extraction, and transformation, the speed and price are class-leading and the reliability is GA-grade.

API Ergonomics 8 · Tool/Agent Support 7 · Reliability 9
Pros
  • Zero-migration SDK, 1M context, very fast, GA-reliable
Cons
  • Low capability ceiling
Right for: High-throughput simple pipelines
Avoid if: You need coding or complex agents
Power User · 7/10
I almost never see it directly — it's infrastructure — but where it surfaces it's fast and fine for simple questions.

End users almost never meet 2.5 Flash-Lite directly; it lives inside apps and embedded assistants rather than as a Gemini app default. Where it surfaces, responses are fast and adequate for simple questions, but noticeably thinner than Pro or 3.5 Flash on anything complex. Refusals follow Google's standard policy stack. Its user-facing impact is essentially latency — where it shines — rather than depth. As bulk infrastructure rather than a headline consumer model, most users never consciously interact with it.

Output Quality 6 · Speed 9 · Everyday Usefulness 7
Pros
  • Very fast, fine for simple tasks, current via grounding
Cons
  • Thin on complex tasks
  • Not a consumer default
Right for: Embedded fast assistants
Avoid if: You want depth per answer
Skeptic · 7/10
A 47.4% GPQA tells you what this is — a cheap classifier, not a reasoner; the 1M context is its only headline that survives scrutiny.

No overclaiming here, which is refreshing — Google positions it honestly as cheap and fast. The skeptical notes are about fit, not hype: GPQA 47.4% and AA Index 13 confirm it's a classifier-grade model, so treating it as a general assistant will disappoint. The 1M context is real but, as with the rest of the family, recall figures are unpublished — likely degrading earlier at this tier. The audio premium (3x) and lack of video quietly narrow its multimodal story. It's exactly what it claims to be; the only mistake a buyer can make is asking it to reason.

Claim Accuracy 8 · Weakness Severity 6 · Hype vs Reality 8
Pros
  • Honestly positioned, genuinely cheap and fast, real 1M context
Cons
  • Classifier-grade reasoning
  • Unpublished recall
  • Audio premium
  • No video
Right for: Skeptics who scope it to bulk simple tasks
Avoid if: You expect general-assistant quality

Strengths

  • Lowest GA price in the Gemini lineup ($0.10/$0.40).
  • 1M context preserved at the Flash-Lite tier.
  • 65K output tokens — usable for long-form drafts.
  • Production-stable; Google's official Gemini 2.0 migration target.
  • Among the fastest GA models (up to ~887 tok/s burst); multimodal input including audio.
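For bulk pipelines that lean on the 1M context, one practical question is how many documents fit in a single request. The sketch below greedily packs documents into batches that stay under the 1,048,576-token window. The 4-characters-per-token estimate and the 2,000-token prompt overhead are rough assumptions for illustration, and `estimate_tokens` and `pack_documents` are hypothetical helper names; a real pipeline would count tokens with the provider's own tokenizer.

```python
from typing import List

CONTEXT_LIMIT = 1_048_576  # context window quoted in this review


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate (assumption: about 4 characters per token)."""
    return int(len(text) / chars_per_token) + 1


def pack_documents(docs: List[str], limit: int = CONTEXT_LIMIT,
                   prompt_overhead: int = 2_000) -> List[List[str]]:
    """Greedily group documents so each batch fits within the token limit."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = prompt_overhead  # tokens reserved for instructions (assumed)
    for doc in docs:
        cost = estimate_tokens(doc)
        if cost + prompt_overhead > limit:
            raise ValueError("a single document exceeds the context window")
        if current and used + cost > limit:
            # Close the current batch and start a new one.
            batches.append(current)
            current, used = [], prompt_overhead
        current.append(doc)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Because the estimate is approximate, leaving some headroom below the hard limit is the safer choice.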

Limitations

  • Reasoning is meaningfully below 3.1 Flash-Lite — GPQA 47.4% vs 86.9%.
  • Not suitable for complex agent loops or serious coding.
  • No video input (unlike the rest of the family).
  • January 2025 knowledge cutoff.
  • Output tone is workmanlike, not creative.
  • Audio input bills at 3x text ($0.30 vs $0.10).

Best use cases

  • High-volume classification and extraction (intent, sentiment, entity, routing).
  • Lightweight in-app chat assistants where cost-per-call dominates.
  • Bulk transformations: summarization, translation, reformatting.
  • Production migration off Gemini 2.0 Flash for cost-sensitive teams needing GA stability.
  • Multimodal triage (image moderation, audio routing) at scale.

Buyer questions

Is it generally available?

Yes — GA since 2025-07-22, with no announced deprecation date as of 2026-05-28. It's the GA-stable choice in the Flash-Lite tier.

Why pick it over 3.1 Flash-Lite?

GA stability and lower price ($0.10/$0.40 vs $0.25/$1.50). 3.1 Flash-Lite reasons far better (GPQA 86.9% vs 47.4%) but is still preview.

Should I use it to migrate off Gemini 2.0 Flash?

Yes — it's Google's officially designated migration target ahead of the 2026-06-01 shutdown, and migration is a model-name swap.
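As a rough illustration of what a model-name swap looks like in practice, the sketch below rewrites a request configuration. The config shape (a dict with a `model` key) and the helper `migrate_model` are hypothetical, and the model ID strings are assumed to match the public API identifiers; check them against Google's current model list before relying on them. No SDK call is shown.

```python
from typing import Any, Dict

# Assumed API identifiers for the retiring models and the migration target.
RETIRING_MODELS = {"gemini-2.0-flash", "gemini-2.0-flash-lite"}
TARGET_MODEL = "gemini-2.5-flash-lite"


def migrate_model(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with a retiring model ID replaced by the target."""
    updated = dict(config)  # shallow copy so the caller's config is untouched
    if updated.get("model") in RETIRING_MODELS:
        updated["model"] = TARGET_MODEL
    return updated


if __name__ == "__main__":
    old = {"model": "gemini-2.0-flash", "temperature": 0.2}
    print(migrate_model(old))
```

Other settings pass through unchanged, so prompts and output quality should still be re-validated after the swap.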

What can't it do well?

Complex reasoning, serious coding, and multi-hop agents. It's a classifier/extraction/bulk-transformation engine, plus it lacks video input.

Does audio cost extra?

Yes — audio input is $0.30/1M (3x the $0.10 text rate).
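A small sketch of how the audio premium affects a mixed request, using the rates above. The conversion of 32 tokens per second of audio is an assumption taken from Google's published guidance for Gemini audio input and should be verified for this model; the function names are illustrative.

```python
TEXT_INPUT_PER_MTOK = 0.10   # USD per 1M text input tokens (from this review)
AUDIO_INPUT_PER_MTOK = 0.30  # USD per 1M audio input tokens (from this review)
AUDIO_TOKENS_PER_SECOND = 32  # assumption: verify against current Gemini docs


def audio_tokens(seconds: float) -> int:
    """Approximate token count for an audio clip of the given length."""
    return int(seconds * AUDIO_TOKENS_PER_SECOND)


def input_cost(text_tokens: int, audio_token_count: int) -> float:
    """Input cost in USD for one request mixing text and audio tokens."""
    return (text_tokens * TEXT_INPUT_PER_MTOK
            + audio_token_count * AUDIO_INPUT_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    clip = audio_tokens(60)        # one-minute clip
    per_call = input_cost(200, clip)  # plus a 200-token text prompt
    print(f"{clip} audio tokens, ${per_call:.6f} input cost per call")
```

Under these assumptions a one-minute clip with a short prompt costs well under a tenth of a cent in input, so the 3x rate only matters at very high audio volume.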

Can I self-host?

No. Gemini is closed-weights, API/Vertex/OCI only.

Comparable models

**Gemini 3.1 Flash-Lite (preview)** — Far stronger reasoning (GPQA 86.9% vs 47.4%) at $0.25/$1.50; preview status is the main hesitation vs this GA-stable model.
**Gemini 2.0 Flash-Lite** — The predecessor, reaching EOL 2026-06-01; 2.5 Flash-Lite is the official replacement Google designates.
**GPT-5.4 nano / Claude Haiku 4.5** — Comparable cheap tier; weaker multimodal and smaller context, and no Search grounding. 2.5 Flash-Lite wins on context size and price; the others may edge it on ecosystem fit.

Model specs

  • Input price: $0.10 / Mtok
  • Output price: $0.40 / Mtok
  • Cached input: $0.01 / Mtok
  • Batch (in/out): $0.05 / $0.20 per Mtok
  • Context window: 1,048,576 tokens (1M)
  • Max output: 65,535 tokens (~65K)
  • Knowledge cutoff: 2025-01
  • Released: 2025-07-22 (GA)
  • Modalities: text, image, audio → text
  • Output speed: ~215 tok/s
  • License: Proprietary
  • Clouds: Vertex AI, GCP, OCI

Does not train on API inputs by default

Last verified 2026-05-27