Gemini 3.5 Flash vs 3.1 Flash Lite: When 'Flash' Stopped Meaning Cheap
Gemini 3.5 Flash costs $1.50/$9.00 per 1M tokens — 3× pricier than its predecessor. Compare it to 3.1 Flash Lite at $0.25/$1.50 and Claude Opus 4.8 Fast.
TL;DR
Gemini 3.5 Flash at $1.50 input / $9.00 output per 1M tokens is no longer a budget model: at a 1:4 input-to-output agent mix, its 6× output multiplier pushes the blended price to $7.50 per 1M tokens. Gemini 3.1 Flash Lite at $0.25 / $1.50 is now Google's only genuinely cheap option, blending to just $1.25 at the same mix. Check the output multiplier and your actual workload mix before trusting any 'Flash' or 'Fast' label.
Live token rates via OpenRouter, sorted by output multiplier (output ÷ input).
| Model | Input / 1M | Output / 1M | Out ÷ In | Context | Quality |
|---|---|---|---|---|---|
| Claude Opus 4.8 (Fast) | $10.00 | $50.00 | 5.0× | 1M | — |
| Claude Opus 4.7 (Fast) | $30.00 | $150.00 | 5.0× | 1M | 76 |
| Gemini 3.5 Flash | $1.50 | $9.00 | 6.0× | 1M | 73 |
| Gemini 3.1 Flash Lite | $0.25 | $1.50 | 6.0× | 1M | — |
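The Out ÷ In column can be reproduced from the two price columns. The following is a minimal sketch, assuming the rates are hard-coded from the table rather than fetched live; the names `RATES` and `output_multiplier` are illustrative and not part of any API.

```python
# Per-1M-token rates in USD, copied by hand from the table above.
# (input price, output price)
RATES = {
    "Claude Opus 4.8 (Fast)": (10.00, 50.00),
    "Claude Opus 4.7 (Fast)": (30.00, 150.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Flash Lite": (0.25, 1.50),
}


def output_multiplier(input_price: float, output_price: float) -> float:
    """How many times more an output token costs than an input token."""
    return output_price / input_price


multipliers = {
    name: output_multiplier(inp, out) for name, (inp, out) in RATES.items()
}

# Sort ascending by multiplier, matching the table order.
for name, mult in sorted(multipliers.items(), key=lambda kv: kv[1]):
    print(f"{name}: {mult:.1f}x")
```

Both Opus Fast models come out at 5.0 and both Gemini models at 6.0, so the multiplier alone does not separate a cheap model from an expensive one; the base rate still matters.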
Primary sources
- OpenRouter — live model pricing — Input/output price per token and context length for every model in the table
- LMArena leaderboard — Crowd-sourced Elo, normalised to a 0–100 quality score
- Google — Gemini API pricing — Official rate card
- Anthropic — pricing — Official rate card
- TokenRate — compare prices — Live input and output rates side by side
Frequently Asked Questions
What is the blended cost of Gemini 3.5 Flash for an agent workload?
At a 1:4 input-to-output ratio typical of agent and reasoning workloads, Gemini 3.5 Flash blends to $7.50 per million tokens. This is driven by the $9.00 per million output price and the 6.0× output multiplier, which makes long generations significantly more expensive than the $1.50 input price alone suggests.
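The blended figure is a weighted average of the two rates. As a rough sketch, assuming "blended" means the average price per token across a fixed input:output mix (the function name `blended_price` is made up for illustration):

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_parts: float,
    output_parts: float,
) -> float:
    """Weighted-average USD price per 1M tokens for an input:output mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


# Gemini 3.5 Flash at a 1:4 input-to-output mix.
flash_agent = blended_price(1.50, 9.00, input_parts=1, output_parts=4)
print(f"${flash_agent:.2f} per 1M blended tokens")  # $7.50 per 1M blended tokens
```

With four of every five tokens billed at the output rate, the blend sits much closer to $9.00 than to $1.50.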
How much cheaper is Gemini 3.1 Flash Lite than Gemini 3.5 Flash in production?
At a 1:4 agent ratio, Gemini 3.1 Flash Lite blends to $1.25 per million tokens versus $7.50 for Gemini 3.5 Flash — a 6× cost difference. At a 9:1 retrieval ratio the gap narrows in absolute terms to $0.375 versus $2.25, but the relative difference remains 6×.
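The relative gap stays at 6× because both the input and the output rate of 3.5 Flash are six times those of 3.1 Flash Lite, so the factor survives any weighting. A small sketch under that reading, using the two mixes quoted above (all names here are illustrative):

```python
def blended_price(input_price, output_price, input_parts, output_parts):
    """Weighted-average USD price per 1M tokens for an input:output mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


FLASH = (1.50, 9.00)  # Gemini 3.5 Flash: input, output per 1M tokens
LITE = (0.25, 1.50)   # Gemini 3.1 Flash Lite: input, output per 1M tokens
MIXES = {"agent 1:4": (1, 4), "retrieval 9:1": (9, 1)}

gaps = {}
for label, (inp, out) in MIXES.items():
    flash = blended_price(*FLASH, inp, out)
    lite = blended_price(*LITE, inp, out)
    gaps[label] = (flash, lite, flash / lite)
    print(f"{label}: Flash ${flash:.3f} vs Lite ${lite:.3f} ({flash / lite:.1f}x)")
```

The absolute saving per million tokens shrinks from $6.25 to $1.875 as the mix moves toward input, while the ratio does not move.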
How much did Claude Opus 4.8 Fast reduce costs compared to Opus 4.7 Fast?
Claude Opus 4.7 Fast had an input price of $30.00 and output of $150.00 per million tokens, blending to $126.00 at agent ratios. Claude Opus 4.8 Fast brought that to $10.00 input, $50.00 output, and $42.00 blended at agent ratios. Because input and output rates both fell by the same factor, the reduction is 3× at any workload mix.
Which Gemini model should I use for high-volume retrieval-augmented generation?
Gemini 3.1 Flash Lite is the cost-efficient choice for high-volume retrieval workloads. At a 9:1 input-to-output ratio it blends to $0.375 per million tokens, compared to $2.25 for Gemini 3.5 Flash. Both share a 1M token context window, so Lite gives up no context capacity in retrieval-heavy pipelines.
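To turn the per-million rates into a bill, multiply each rate by the matching token count. The sketch below assumes a hypothetical monthly volume of 1.8B input and 0.2B output tokens (a 9:1 mix chosen for illustration, not a figure from this article); `workload_cost` is an illustrative helper.

```python
def workload_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
) -> float:
    """USD cost of a workload, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical monthly volume at a 9:1 input-to-output mix (assumption).
MONTHLY_INPUT_TOKENS = 1_800_000_000
MONTHLY_OUTPUT_TOKENS = 200_000_000

lite_cost = workload_cost(MONTHLY_INPUT_TOKENS, MONTHLY_OUTPUT_TOKENS, 0.25, 1.50)
flash_cost = workload_cost(MONTHLY_INPUT_TOKENS, MONTHLY_OUTPUT_TOKENS, 1.50, 9.00)

print(f"Gemini 3.1 Flash Lite: ${lite_cost:,.2f}")  # Gemini 3.1 Flash Lite: $750.00
print(f"Gemini 3.5 Flash: ${flash_cost:,.2f}")      # Gemini 3.5 Flash: $4,500.00
```

Swapping in your own token counts shows how quickly the per-million gap compounds at volume.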
Do Gemini 3.5 Flash and Gemini 3.1 Flash Lite have the same output multiplier?
Yes, both carry a 6.0× output multiplier, meaning output tokens cost six times more than input tokens on each model. The multiplier is identical, but because the base rates differ substantially, the absolute cost impact is far larger for 3.5 Flash than for 3.1 Flash Lite on any generation-heavy workload.
Try the TokenRate Calculator
Run your own token mix through the numbers at tokenrate.dev/tools/compare-prices to see exactly which model wins at your specific input-to-output ratio before you commit to a production integration.