The Number on the Pricing Page Is the Wrong Number
Here is why. An input token is something the model reads; an output token is something the model writes. Every provider charges more for output, because writing is more expensive to compute than reading. The gap is neither small nor constant: across the newest models to land since late April — Claude Opus 4.8 on May 28, plus GPT-5.5, Gemini 3.5 Flash, Grok 4.3, and DeepSeek V4 — output costs anywhere from 2× to 6× as much as input.
That spread means two models with nearly identical input prices can hand you bills that differ threefold, depending entirely on how much your application writes versus reads. If you are ranking models by input price alone, you are comparing them on the axis that matters least.
Why Output Costs More Than Input
What physics does not fix is the size of the markup. The underlying cost gap is real, but whether a provider charges 2× or 6× for output is a strategic choice, shaped by their hardware, their batching efficiency, and how badly they want output-heavy traffic. That is why the multiplier varies so widely across the 2026 lineup, and why it has quietly become a competitive battleground. A provider pricing output at 2× is making a play for exactly the agent and reasoning workloads that everyone else charges a premium for.
The June 2026 Lineup, Sorted by What Actually Costs You
Two camps jump out, with one model sitting between them. DeepSeek V4 Pro and Grok 4.3 hold output to 2× input. Claude Opus 4.8 sits in the middle at 5×, keeping its standard rate at $5 / $25 per million while cutting its priority Fast tier threefold versus Opus 4.7. And the 6× camp (Gemini 3.5 Flash, GPT-5.4, and GPT-5.5) charges six dollars of output for every dollar of input.
Watch what that does to two models that look like twins on a pricing page. Grok 4.3 lists at $1.25 input; Gemini 3.5 Flash at $1.50 — twenty percent apart, close enough that most teams would call it a wash. Run an output-heavy agent on each and Grok costs $2.25 per million blended while Gemini costs $7.50 — more than three times as much. Same near-identical headline, triple the bill.
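The Grok and Gemini figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not a billing tool: the function name `blended_rate` is made up for illustration, and the 1-to-4 input-to-output mix is an assumption (it is the mix that yields the $2.25 and $7.50 figures quoted above). Output prices are derived from the input prices and multipliers given in the text.

```python
def blended_rate(input_price, output_price, input_parts, output_parts):
    """Dollars per million tokens, weighted by the input:output token mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts

# Prices are dollars per million tokens.
# Output price = input price x the multiplier quoted in the text.
# Assumption: an output-heavy agent writes 4 tokens for every 1 it reads.
grok = blended_rate(1.25, 1.25 * 2, input_parts=1, output_parts=4)
gemini = blended_rate(1.50, 1.50 * 6, input_parts=1, output_parts=4)

print(f"Grok 4.3:         ${grok:.2f} per million blended")
print(f"Gemini 3.5 Flash: ${gemini:.2f} per million blended")
print(f"Gemini / Grok:    {gemini / grok:.2f}x")
```

Change `input_parts` and `output_parts` to your own mix and the gap between the two models moves with it.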
Your Blended Rate Is the Only Rate That Matters
The weights come from your workload's input-to-output ratio, and they change everything. Three rough archetypes cover most applications. Retrieval and extraction — stuffing a large context in to get a short answer out — runs heavily input-side, often 10-to-1 or more; here the output multiplier barely registers and input price genuinely is king. Conversational chat lands near 1-to-1. Agents, reasoning, and code generation invert the ratio entirely, writing far more than they read — 1-to-4 or steeper — and at that mix the output multiplier dominates the bill.
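One way to see how much the archetype matters is to express the blended rate as a multiple of the input price, so the comparison holds for any model. The sketch below does that for the three archetypes at the 2× and 6× ends of the multiplier range. The helper name `blended_multiple` and the exact ratios (10:1, 1:1, 1:4) are illustrative assumptions taken from the rough figures above, not measured workloads.

```python
# Rough input:output mixes for the three archetypes (assumed, per the text).
ARCHETYPES = {
    "retrieval (10:1)": (10, 1),
    "chat (1:1)": (1, 1),
    "agent (1:4)": (1, 4),
}

def blended_multiple(multiplier, input_parts, output_parts):
    """Blended rate expressed as a multiple of the input price.

    `multiplier` is the output price divided by the input price.
    """
    return (input_parts + multiplier * output_parts) / (input_parts + output_parts)

for name, (input_parts, output_parts) in ARCHETYPES.items():
    low = blended_multiple(2, input_parts, output_parts)
    high = blended_multiple(6, input_parts, output_parts)
    print(f"{name:18s} 2x model -> {low:.2f}x input   6x model -> {high:.2f}x input")
```

The spread between the two columns widens as the mix tilts toward output, which is the effect the archetypes describe.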
You do not have to guess your ratio. Pull it from a week of production logs, or estimate it with the API cost estimator, which runs the blend across your real token mix.
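If your logs record token counts per request, the ratio is a two-line aggregation. The sketch below assumes each log record is a dictionary with `input_tokens` and `output_tokens` fields. Those field names are hypothetical, so map them to whatever your logging actually stores, and note that the sample records are placeholders, not real measurements.

```python
def token_mix(records):
    """Return (input_share, output_share) of all tokens across the records."""
    total_in = sum(r["input_tokens"] for r in records)
    total_out = sum(r["output_tokens"] for r in records)
    total = total_in + total_out
    if total == 0:
        raise ValueError("no tokens found in the sample")
    return total_in / total, total_out / total

# Placeholder records standing in for a week of production logs.
sample_logs = [
    {"input_tokens": 1_200, "output_tokens": 3_900},
    {"input_tokens": 800, "output_tokens": 4_300},
    {"input_tokens": 1_000, "output_tokens": 3_800},
]

input_share, output_share = token_mix(sample_logs)
print(f"input share: {input_share:.0%}, output share: {output_share:.0%}")
```

Summing tokens before dividing weights each request by its size, which is what the bill does. Averaging per-request ratios would overweight small requests.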
The Same Models, Two Workloads, Two Different Winners
Now keep the same models and flip the workload to retrieval: each task reads 50,000 tokens and writes 500. The bills converge. Opus 4.8 lands at $26,250 and GPT-5.5 at $26,500 — the multiplier that split them by sixteen percent on agents now barely moves the total, because almost every token is input. Grok's lead over Opus shrinks from roughly ninefold on the agent workload to fourfold on retrieval.
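The retrieval arithmetic can be checked directly. The sketch below uses the per-task token counts from the paragraph above, the Opus 4.8 rate of $5 / $25, and a Grok 4.3 rate of $1.25 / $2.50 (its listed input price at a 2× multiplier). The task count of 100,000 is an assumption: it is the volume that reproduces the $26,250 figure. The function name `workload_cost` is illustrative.

```python
def workload_cost(input_price, output_price, tokens_in, tokens_out, tasks):
    """Total dollars for `tasks` runs; prices are dollars per million tokens."""
    dollars_times_million = tokens_in * input_price + tokens_out * output_price
    return tasks * dollars_times_million / 1_000_000

TASKS = 100_000  # assumption: chosen because it matches the $26,250 figure

# Retrieval: each task reads 50,000 tokens and writes 500.
opus = workload_cost(5.00, 25.00, tokens_in=50_000, tokens_out=500, tasks=TASKS)
grok = workload_cost(1.25, 2.50, tokens_in=50_000, tokens_out=500, tasks=TASKS)

print(f"Opus 4.8: ${opus:,.0f}")
print(f"Grok 4.3: ${grok:,.0f}")
print(f"Opus / Grok: {opus / grok:.1f}x")
```

Swap in an agent-style `tokens_in` and `tokens_out` to watch the gap between the two widen again.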
The models never changed. Your token mix did. That is the whole point: there is no context-free 'cheapest model,' only the cheapest model for your ratio.
Reasoning Tokens Are Output Tokens (and They're Invisible)
This quietly drags nearly every serious workload toward the output-heavy end of the spectrum, which is exactly where the multiplier bites hardest. Turn on extended thinking and a model that looked balanced on paper starts spending like an agent. It is why a model's output multiplier and its reasoning behavior have to be read together: a steep multiplier on a heavy reasoner compounds fast. For when that tradeoff pays off anyway, see 'Are Reasoning Models Worth the Extra Cost'; for the mechanics of the ratio itself, see 'Optimizing Your Input-to-Output Token Ratio.'
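To make the compounding concrete, here is a sketch of one request priced with and without hidden reasoning tokens. It assumes reasoning tokens are billed at the output rate, as this section's heading states. The token counts are hypothetical, the function name `request_cost` is illustrative, and the $5 / $25 rate is the Opus 4.8 standard rate quoted earlier.

```python
def request_cost(input_price, output_price, tokens_in, visible_out, reasoning_out=0):
    """Dollars for one request; prices are dollars per million tokens.

    Assumption: reasoning tokens are billed as output even though the
    caller never sees them in the response text.
    """
    billed_out = visible_out + reasoning_out
    return (tokens_in * input_price + billed_out * output_price) / 1_000_000

# Hypothetical request: 2,000 tokens read, 500 tokens of visible answer.
plain = request_cost(5.00, 25.00, tokens_in=2_000, visible_out=500)
thinking = request_cost(5.00, 25.00, tokens_in=2_000, visible_out=500,
                        reasoning_out=4_000)

print(f"without reasoning: ${plain:.4f}")
print(f"with reasoning:    ${thinking:.4f}")
print(f"increase:          {thinking / plain:.1f}x")
```

The visible answer is the same length in both cases. Only the billed output changes, which is why the ratio in your logs should be taken from billed tokens rather than response length.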
How to Shop for Rates in the Second Half of 2026
Two more habits. First, re-check after every major release: the multiplier is now a competitive lever, so a new model can reset your blended rate overnight without ever touching its headline input price. Second, remember that prompt caching pushes this math in the same direction: caching discounts input, which shrinks the input share of your bill and makes the output multiplier matter more, not less. Put the current lineup side by side, input and output both visible, in the compare prices grid, and start the comparison from the column everyone else ignores. For the underlying mechanics, see 'Output Token Pricing Explained.'
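The caching effect is easy to verify with a small calculation. The sketch below computes the share of a bill that comes from output, first with no caching and then with part of the input served from cache at a discount. The 80% cache hit fraction and the cached-token price factor of 0.1 are purely hypothetical values chosen for illustration, so check your provider's actual caching terms. The function name `output_share` is also illustrative.

```python
def output_share(input_price, output_price, tokens_in, tokens_out,
                 cached_fraction=0.0, cached_price_factor=1.0):
    """Fraction of the bill that comes from output tokens.

    cached_fraction: share of input tokens served from cache (0 to 1).
    cached_price_factor: price of a cached input token relative to a
    fresh one (hypothetical; 1.0 means no discount).
    """
    cached = tokens_in * cached_fraction
    fresh = tokens_in - cached
    input_cost = (fresh + cached * cached_price_factor) * input_price
    output_cost = tokens_out * output_price
    return output_cost / (input_cost + output_cost)

# A 1-to-1 chat mix at $5 / $25 per million tokens.
no_cache = output_share(5.00, 25.00, tokens_in=1_000, tokens_out=1_000)
with_cache = output_share(5.00, 25.00, tokens_in=1_000, tokens_out=1_000,
                          cached_fraction=0.8, cached_price_factor=0.1)

print(f"output share without caching: {no_cache:.1%}")
print(f"output share with caching:    {with_cache:.1%}")
```

The total bill falls when caching is on, but the part that remains is even more concentrated in output.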