How much does the Gemini API cost?

As of June 2026: Gemini 3.1 Pro costs $2 per million input tokens and $12 per million output tokens; Gemini 3.5 Flash costs $1.50 / $9; Gemini 3.1 Flash-Lite costs $0.25 / $1.50. A typical chat message costs between a tenth of a cent (Flash-Lite) and eight-tenths of a cent (Pro).

Is Gemini cheaper than GPT and Claude?

At the frontier tier, substantially: Gemini 3.1 Pro costs $2/$12 per million tokens, versus $5/$30 for GPT-5.5 and $5/$25 for Claude Opus 4.8, while scoring slightly higher on the Arena leaderboard (75 vs 74 for both rivals).

Is Gemini 3.5 Flash good enough for production?

For most workloads, yes — it scores 73 on Arena, level with GPT-5.4 and within two points of the frontier models, at $1.50/$9 per million tokens. It's more prompt-sensitive than the Pro tier, so test with your real prompts before deciding.

Does a 1M-token context window cost extra on Gemini?

There's no surcharge for the window itself — you pay normal per-token rates for whatever you actually send. The trap is resending: a full 1M-token context costs about $2 per request on 3.1 Pro, every request. Context caching and retrieval are the standard fixes.

Gemini API Pricing in 2026: Why Google Is the Value Play

Model	Input / 1M	Output / 1M	Arena score	Cost per chat message*
Gemini 3.5 Flash	$1.50	$9.00	73	$0.0059
Gemini 3.1 Flash-Lite	$0.25	$1.50	—	$0.001
Gemini 3.1 Pro	$2.00	$12.00	75	$0.0078

The headline: Google is underpricing the competition

When I re-verified prices on June 10, the cross-provider picture at the frontier looked like this: Gemini 3.1 Pro at $2 in / $12 out, Claude Opus 4.8 at $5 / $25, GPT-5.5 at $5 / $30 per million tokens. On the Arena leaderboard those three score 75, 74, and 74 respectively.

Read that again: the model with the highest score of the three costs 40% as much on input. Whatever you think of any single leaderboard — and I've written about their limits — the pricing asymmetry is real and large. Google is buying market share, and API consumers are the beneficiaries.

The per-message column above assumes 1,500 input tokens and 400 output tokens, my standard chat-turn yardstick across these guides.
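The per-message figures can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the list prices in the table above; the dictionary keys are display names chosen for illustration, not API model identifiers, and `message_cost` is a hypothetical helper.

```python
# USD per million tokens (input, output), copied from the table in this article.
# Keys are display names for illustration, not official API model IDs.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def message_cost(model, input_tokens=1_500, output_tokens=400):
    """Return the USD cost of one request at list prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


for model in PRICES:
    print(f"{model}: ${message_cost(model):.6f}")
```

Swapping in your own average input and output token counts gives a quick estimate for your traffic rather than the yardstick turn.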

Gemini 3.5 Flash: the anomaly worth knowing about

Flash is normally a provider's 'fast and cheap' tier — useful, unglamorous. Gemini 3.5 Flash breaks that pattern: it scores 73 on Arena, level with GPT-5.4 ($2.50 / $15) and within two points of the frontier trio, while costing $1.50 / $9.

That makes Flash the model I check first whenever someone asks what to use for serious-but-high-volume work: drafting, code review comments, RAG answer synthesis, long-document Q&A. The catch I've found in practice is variance — Flash is more sensitive to prompt quality than the Pro tiers, so invest in your system prompt before judging it. The Flash vs Flash-Lite comparison covers when to step down a tier.

Flash-Lite: the volume tier

Gemini 3.1 Flash-Lite at $0.25 / $1.50 competes with GPT-5.4-nano ($0.20 / $1.25) for classification, routing, tagging, and extraction. At a tenth of a cent per typical message, the pricing question stops mattering and the engineering question takes over: which one holds your output schema more reliably at your volume. Run a few hundred real examples through both before committing — at this tier the eval costs pennies.
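One way to score that comparison is to collect each model's raw responses and measure how often they parse into the structure you asked for. The sketch below assumes the models were asked for a JSON object; the sample strings and the required keys (`label`, `confidence`) are made up for illustration, and collecting real responses from each provider is left out.

```python
import json


def schema_pass_rate(raw_outputs, required_keys):
    """Fraction of outputs that parse as a JSON object with every required key."""
    if not raw_outputs:
        return 0.0
    passed = 0
    for raw in raw_outputs:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON at all
        if isinstance(parsed, dict) and set(required_keys).issubset(parsed):
            passed += 1
    return passed / len(raw_outputs)


# Hypothetical responses standing in for real model output.
sample_outputs = [
    '{"label": "billing", "confidence": 0.9}',
    '{"label": "refund"}',
    "Sure! Here is the JSON you asked for",
    '["label", "confidence"]',
]
print(schema_pass_rate(sample_outputs, {"label", "confidence"}))
```

Running the same set of real inputs through both models and comparing the two pass rates puts a number on "holds your schema more reliably". A stricter check would also validate value types.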

One thing Flash-Lite shares with its bigger siblings: the full multimodal stack. If your volume workload involves images — receipts, screenshots, forms — Gemini's vision pricing at this tier has been the cheapest path I've found. More on that in the multimodal token costs piece.

Context windows and long-document economics

Every current Gemini model ships with a context window of roughly 1M tokens. The marketing writes itself (drop a whole codebase in), but the economics deserve sober math: filling a 1M context on 3.1 Pro costs about $2 per request, every request, because you pay for input tokens every time you send them.

A conversation that keeps a 200K-token document in context for 20 turns pays for roughly 4M input tokens of resends — $8 on Pro, $6 on Flash — unless you use context caching, which Google offers precisely for this pattern. The general trap and the fixes are covered in the 1M-token context piece; the short version is that retrieval (sending only relevant chunks) usually beats brute-force stuffing by 10-50x on cost.
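The resend arithmetic is simple enough to sketch. The example below assumes the input prices quoted in this article and ignores output tokens and the growing chat history. The 8,000-token retrieval budget is a hypothetical figure chosen only to show how the ratio is computed, and `resend_cost` is an illustrative helper.

```python
def resend_cost(context_tokens, turns, input_price_per_million):
    """Total input tokens and USD cost when the same context is sent every turn."""
    total_tokens = context_tokens * turns
    return total_tokens, total_tokens * input_price_per_million / 1_000_000


# 200K-token document kept in context for 20 turns.
tokens, pro_cost = resend_cost(200_000, 20, 2.00)    # Gemini 3.1 Pro input price
_, flash_cost = resend_cost(200_000, 20, 1.50)       # Gemini 3.5 Flash input price

# Hypothetical retrieval setup: send 8,000 tokens of relevant chunks per turn instead.
_, pro_retrieval_cost = resend_cost(8_000, 20, 2.00)

print(f"resent tokens: {tokens:,}")
print(f"Pro: ${pro_cost:.2f}  Flash: ${flash_cost:.2f}")
print(f"Pro with retrieval: ${pro_retrieval_cost:.2f}")
```

The saving from retrieval scales directly with how much smaller the retrieved chunks are than the full document, so the multiplier depends entirely on your chunk budget.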

What real workloads cost on Gemini

My standard scenarios, priced at June 2026 rates with the calculator.

A support chatbot, 10,000 conversations a month (9,000 cumulative input tokens, 1,200 output per conversation): about $243/month on Flash, $40 on Flash-Lite, $324 on 3.1 Pro.

Summarizing 1,000 long documents (50,000 tokens in, 1,000 out each): $84 on Flash ($75 input + $9 output), $112 on Pro ($100 input + $12 output).

A RAG product answering 100,000 questions a month (3,000 tokens of retrieved context in, 300 out each): $450 input + $270 output = $720/month on Flash, versus $1,500 on GPT-5.5 for input alone. For value-tier math across providers, see the quality-per-dollar ranking.
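For readers who want to rerun these scenarios with their own volumes, here is a minimal sketch of the same calculation, counting both input and output tokens. It assumes the Gemini list prices quoted in this article; the model keys are display names rather than API identifiers, and `workload_cost` is a hypothetical helper.

```python
# USD per million tokens (input, output), from this article's June 2026 figures.
PRICES = {
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Gemini 3.1 Pro": (2.00, 12.00),
}

# (number of requests, input tokens per request, output tokens per request)
WORKLOADS = {
    "support chatbot": (10_000, 9_000, 1_200),
    "document summarization": (1_000, 50_000, 1_000),
    "RAG question answering": (100_000, 3_000, 300),
}


def workload_cost(model, workload):
    """Return (input_cost, output_cost) in USD for the whole workload."""
    price_in, price_out = PRICES[model]
    requests, tokens_in, tokens_out = WORKLOADS[workload]
    input_cost = requests * tokens_in * price_in / 1_000_000
    output_cost = requests * tokens_out * price_out / 1_000_000
    return input_cost, output_cost


for workload in WORKLOADS:
    for model in PRICES:
        input_cost, output_cost = workload_cost(model, workload)
        print(f"{workload:24s} {model:22s} ${input_cost + output_cost:,.2f}")
```

Keeping input and output costs separate makes it easy to see which side dominates: input for summarization and RAG, a more even split for chat.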

The honest caveats

Three things temper the enthusiasm. First, model churn: Google iterates fast and deprecates fast, and several strong Gemini variants ship as previews — fine for experiments, riskier as a hard production dependency without a migration plan.

Second, quotas: the attractive list prices come with per-minute token limits that need a quota request to lift at scale; budget lead time for that conversation.

Third, ecosystem lock-in: the discounts get deeper inside Vertex AI, which is great if you're already on Google Cloud and a source of friction if you're not. None of these flip the conclusion that Gemini is the value play of mid-2026, but they're the fine print I'd want stated. Cross-check current numbers on the live pricing table before committing.