The headline: Google is underpricing the competition
Read that again: the model with the highest score of the three costs 40% as much on input. Whatever you think of any single leaderboard — and I've written about their limits — the pricing asymmetry is real and large. Google is buying market share, and API consumers are the beneficiaries.
The per-message column above assumes 1,500 input tokens and 400 output tokens, my standard chat-turn yardstick across these guides.
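As a minimal sketch of how that yardstick turns into a per-message figure: multiply each token count by its per-million rate and add the two. The rates used here ($1.50 input, $9.00 output per million tokens) are an assumption, back-derived from the Flash workload totals quoted later in this guide rather than copied from a price list, and `per_message_cost` is a hypothetical helper name.

```python
def per_message_cost(input_rate, output_rate, input_tokens=1_500, output_tokens=400):
    """Dollar cost of one chat turn; rates are dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Assumed Flash rates (see lead-in); swap in the current list prices.
flash_per_message = per_message_cost(input_rate=1.50, output_rate=9.00)
print(f"${flash_per_message:.5f} per message")
```

Under these assumptions the 400 output tokens cost more than the 1,500 input tokens, which is why the output rate deserves as much attention as the input rate.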
Gemini 3.5 Flash: the anomaly worth knowing about
That makes Flash the model I check first whenever someone asks what to use for serious-but-high-volume work: drafting, code review comments, RAG answer synthesis, long-document Q&A. The catch I've found in practice is variance — Flash is more sensitive to prompt quality than the Pro tiers, so invest in your system prompt before judging it. The Flash vs Flash-Lite comparison covers when to step down a tier.
Flash-Lite: the volume tier
One thing Flash-Lite shares with its bigger siblings: the full multimodal stack. If your volume workload involves images — receipts, screenshots, forms — Gemini's vision pricing at this tier has been the cheapest path I've found. More on that in the multimodal token costs piece.
Context windows and long-document economics
A conversation that keeps a 200K-token document in context for 20 turns pays for roughly 4M input tokens of resends — $8 on Pro, $6 on Flash — unless you use context caching, which Google offers precisely for this pattern. The general trap and the fixes are covered in the 1M-token context piece; the short version is that retrieval (sending only relevant chunks) usually beats brute-force stuffing by 10-50x on cost.
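The resend arithmetic is easy to check with a few lines. This is a rough sketch, assuming input rates of $2.00 (Pro) and $1.50 (Flash) per million tokens, which are the rates implied by the $8 and $6 figures above. The 8,000-token retrieval budget is a made-up illustration, not a recommendation, and caching discounts are ignored.

```python
def resend_cost(tokens_per_turn, turns, input_rate):
    """Return (total input tokens, dollars) when the same context is sent every turn."""
    total_tokens = tokens_per_turn * turns
    return total_tokens, total_tokens * input_rate / 1_000_000

PRO_INPUT = 2.00    # assumed $/M input tokens, implied by $8 for 4M
FLASH_INPUT = 1.50  # assumed $/M input tokens, implied by $6 for 4M

# Brute-force stuffing: the whole 200K-token document on every one of 20 turns.
stuffed_tokens, stuffed_pro = resend_cost(200_000, 20, PRO_INPUT)
_, stuffed_flash = resend_cost(200_000, 20, FLASH_INPUT)

# Retrieval: only the relevant chunks each turn (hypothetical 8,000-token budget).
retrieved_tokens, retrieved_flash = resend_cost(8_000, 20, FLASH_INPUT)

print(f"stuffing:  {stuffed_tokens:,} tokens, ${stuffed_pro:.2f} Pro, ${stuffed_flash:.2f} Flash")
print(f"retrieval: {retrieved_tokens:,} tokens, ${retrieved_flash:.2f} Flash")
print(f"savings factor: {stuffed_flash / retrieved_flash:.0f}x")
```

With that hypothetical chunk budget the savings factor comes out at 25x, inside the 10-50x range; the real factor depends entirely on how many tokens retrieval needs per turn.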
What real workloads cost on Gemini
A support chatbot, 10,000 conversations a month (9,000 cumulative input tokens, 1,200 output per conversation): about $243/month on Flash, $40 on Flash-Lite, $324 on 3.1 Pro.
Summarizing 1,000 long documents (50,000 tokens in, 1,000 out): about $84 on Flash ($75 input + $9 output), $112 on Pro ($100 + $12).
A RAG product answering 100,000 questions a month (3,000 tokens of retrieved context in, 300 out each): $450 input + $270 output = $720/month on Flash — versus $1,500+ on GPT-5.5 input alone. For value-tier math across providers, see the quality-per-dollar ranking.
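To rerun these scenarios with your own volumes, a small calculator is enough. The sketch below assumes Flash at $1.50/$9.00 and Pro at $2.00/$12.00 per million input/output tokens. Those rates are back-derived from the totals in this section, not taken from a price list, and Flash-Lite is left out because its rates cannot be pinned down from the figures here. Input and output are reported separately so each number can be checked against the text.

```python
# Dollars per million tokens as (input, output). Assumed: back-derived from
# the totals quoted in this guide, not copied from Google's price list.
RATES = {
    "flash": (1.50, 9.00),
    "pro": (2.00, 12.00),
}

# (name, units per month, input tokens per unit, output tokens per unit)
WORKLOADS = [
    ("support chatbot", 10_000, 9_000, 1_200),
    ("document summaries", 1_000, 50_000, 1_000),
    ("RAG answers", 100_000, 3_000, 300),
]

def workload_cost(model, units, input_tokens, output_tokens):
    """Return (input_dollars, output_dollars) for one workload on one model."""
    in_rate, out_rate = RATES[model]
    input_cost = units * input_tokens * in_rate / 1_000_000
    output_cost = units * output_tokens * out_rate / 1_000_000
    return input_cost, output_cost

for name, units, tokens_in, tokens_out in WORKLOADS:
    for model in RATES:
        cost_in, cost_out = workload_cost(model, units, tokens_in, tokens_out)
        print(f"{name:20s} {model:6s} input ${cost_in:8.2f}  "
              f"output ${cost_out:7.2f}  total ${cost_in + cost_out:8.2f}")
```

Keeping the two components apart also shows where each workload's money goes: the summarization job is almost all input, while the chatbot spends nearly as much on output as on input.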
The honest caveats
Second, quotas: the attractive list prices come with per-minute token limits that need a quota request to lift at scale; budget lead time for that conversation.
Third, ecosystem lock-in points the other way: the discounts get deeper inside Vertex AI, which is great if you're already on Google Cloud and friction if you're not. None of these flip the conclusion — Gemini is the value play of mid-2026 — but they're the fine print I'd want stated. Cross-check current numbers on the live pricing table before committing.