Output Speed Across LLMs: Compare Prices Grid Notes

Why Output Speed Matters in a Cost Comparison

Once you've narrowed a model shortlist on the main TokenRate calculator, the Compare Prices side-by-side view is where you stack the candidates for a decision. Each row shows the provider, the model ID (the one you'd paste into your SDK), per-1M input and output costs, the context window, and the blended quality score. Most teams compare LLMs on per-token input price and stop there. Output speed is one of the under-attended dimensions in that comparison, and ignoring it is where the surprise bills come from. Output speed (tokens per second) isn't one of the Compare Prices grid columns, but it correlates with model size and tier in ways that matter for routing decisions.

Related reading: quality per dollar LLM ranking 2026, LLM color-coded quality badges explained, and why the cheapest LLM isn't always the best value.

How Output Speed Appears in the Grid

In the grid, the Fast tier is the rough proxy for speed: smaller models generate more tokens per second on the same hardware. Haiku 4.5, Gemini Flash, and GPT-4o mini all output 100+ tokens per second in practice.
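To see why tokens per second matters for a user waiting on a reply, here is a rough sketch that converts throughput into streaming time. It assumes a steady decode rate and ignores time-to-first-token and network overhead; the function name and the throughput figures are hypothetical placeholders, not measurements.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to stream a reply at a steady decode rate.

    Simplifying assumption: ignores time-to-first-token and network overhead.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Placeholder throughput figures for illustration only; use a live leaderboard for real numbers.
for label, tps in [("fast-tier model", 120.0), ("slower model", 40.0)]:
    print(f"{label}: {generation_seconds(600, tps):.1f} s for a 600-token reply")
```

At these placeholder rates, the same 600-token reply takes 5 seconds on the faster model and 15 seconds on the slower one, a gap users notice in a live chat.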

Reading the Spread

Reasoning models (o3, R1, Sonnet thinking) are intentionally slower: they generate chain-of-thought tokens before the visible answer, which adds compute and delays the final response. The Compare Prices grid groups them in the Reasoning tier so your speed expectations are calibrated.
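The sketch below illustrates how hidden reasoning tokens stretch both latency and output spend. It assumes reasoning tokens are billed at the model's per-1M output rate (OpenAI documents this for its o-series; verify for other providers) and, for simplicity, uses the same throughput for both responses. The token counts, the $8 price, and the 80 tps figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    reasoning_tokens: int  # hidden chain-of-thought tokens (0 for non-reasoning models)
    answer_tokens: int     # tokens the user actually sees


def output_cost_usd(resp: Response, output_price_per_1m: float) -> float:
    # Assumption: reasoning tokens are billed at the same per-1M output rate.
    billed = resp.reasoning_tokens + resp.answer_tokens
    return billed / 1_000_000 * output_price_per_1m


def seconds_until_answer_done(resp: Response, tokens_per_second: float) -> float:
    # The visible answer only finishes after every reasoning token is generated.
    return (resp.reasoning_tokens + resp.answer_tokens) / tokens_per_second


plain = Response(reasoning_tokens=0, answer_tokens=400)
reasoning = Response(reasoning_tokens=3_600, answer_tokens=400)

# Hypothetical price ($8 per 1M output tokens) and throughput (80 tps), for illustration only.
for name, resp in [("plain", plain), ("reasoning", reasoning)]:
    cost = output_cost_usd(resp, 8.0)
    secs = seconds_until_answer_done(resp, 80.0)
    print(f"{name}: ${cost:.4f} output cost, {secs:.0f} s until the answer completes")
```

Both answers show the user 400 tokens, but under these assumptions the reasoning response costs ten times as much in output and takes ten times as long to finish.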

Practical Implications

For applications where output speed matters (live chat, real-time agents), the right Compare Prices workflow is: filter to Fast tier first, then compare pricing within the tier. Don't pick a Reasoning model for a streaming chatbot.
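The tier-first workflow can be sketched as a filter followed by a sort. The `GridRow` structure, provider names, model IDs, and prices here are hypothetical placeholders that mirror the grid's columns; copy real values from the grid before relying on the ranking. Ranking by output price first is a design choice for output-heavy chat traffic, not something the grid prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridRow:
    provider: str
    model_id: str
    tier: str             # e.g. "Fast" or "Reasoning", as grouped in the grid
    input_per_1m: float   # USD per 1M input tokens
    output_per_1m: float  # USD per 1M output tokens


def shortlist_for_streaming(rows: list[GridRow]) -> list[GridRow]:
    """Filter to the Fast tier first, then rank by price within that tier."""
    fast = [r for r in rows if r.tier == "Fast"]
    # Chat replies are usually output-heavy, so rank by output price, then input price.
    return sorted(fast, key=lambda r: (r.output_per_1m, r.input_per_1m))


# Hypothetical rows with placeholder IDs and prices.
rows = [
    GridRow("provider-a", "fast-model-a", "Fast", 0.30, 1.20),
    GridRow("provider-b", "reasoning-model-b", "Reasoning", 2.00, 8.00),
    GridRow("provider-c", "fast-model-c", "Fast", 0.15, 0.60),
]
for r in shortlist_for_streaming(rows):
    print(r.model_id, r.output_per_1m)
```

The Reasoning-tier row never reaches the price comparison, which is the point: for a streaming chatbot, tier is the first gate and price decides within it.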

Workflow: From Grid to Decision

For up-to-date throughput numbers, check Artificial Analysis's speed leaderboard alongside the Compare Prices grid.

Frequently Asked Questions

How do I open the Compare Prices grid?

Two ways: click the 'Compare Prices' tab at the top of the calculator card on the home page, or navigate directly to /tools/compare-prices. The standalone page is also linked from the main navigation under 'Tools'.

Can I share my comparison with teammates?

Yes — the page URL captures the current state. Send the link in Slack and your teammate sees the same grid. Useful for procurement and architecture-review meetings.

Is the data live or cached?

Both, in a sense: prices come live from OpenRouter and quality scores from a blended Arena AI + Artificial Analysis pipeline, served through a 60-minute incremental cache. The grid is therefore at most an hour stale.

Where do I go after the grid to project monthly cost?

Once you've picked a winner, go to /tools/api-cost-estimator and plug in the model and your expected monthly token volume. The estimator does the per-1M math against your real workload mix.
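For a quick sanity check before opening the estimator, the per-1M arithmetic looks roughly like this. This is a minimal sketch, not the estimator's own code: it assumes flat per-token pricing with no caching or batch discounts, and the workload volumes and prices are hypothetical.

```python
def monthly_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1m: float,
    output_price_per_1m: float,
) -> float:
    """Per-1M pricing: each side costs (tokens / 1,000,000) * price per 1M."""
    return (
        input_tokens / 1_000_000 * input_price_per_1m
        + output_tokens / 1_000_000 * output_price_per_1m
    )


# Hypothetical workload: 50M input + 10M output tokens per month,
# at placeholder prices of $0.15 / $0.60 per 1M tokens.
print(f"${monthly_cost_usd(50_000_000, 10_000_000, 0.15, 0.60):.2f}")
```

With these placeholder numbers the total is 50 × $0.15 + 10 × $0.60 = $13.50 per month. Note that the output side can dominate quickly if you route traffic to a model that emits many reasoning tokens.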