Guide · Fundamentals · 5 min read

How AI API Pricing Works

A practical guide to understanding per-token pricing, input vs output costs, and how to estimate your AI API bill before it arrives.

By Elliott Crosby · Published January 20, 2026 · Updated May 22, 2026

TL;DR

AI APIs price by the token, quoted in dollars per 1 million tokens ($/1M). Input tokens (the text you send) and output tokens (the text the model generates) are billed separately, with output typically priced at 3–5× the input rate. As of May 2026, Claude Sonnet 4 charges $3/1M input and $15/1M output, so a request with 1,000 input tokens and 500 output tokens costs about $0.0105. Batch APIs (Anthropic, OpenAI) cut that by 50%, and prompt caching cuts the price of cached input tokens by ~90%.

The Per-Token Model

Almost every major AI provider prices their API by the token. The standard unit is "per 1 million tokens" — usually written as $/1M tokens or $/MTok.

For example, Claude Sonnet 4 costs:
• $3.00 per 1M input tokens
• $15.00 per 1M output tokens

If you send a 1,000-token prompt and receive a 500-token response, you pay:

(1,000 × $3 / 1,000,000) + (500 × $15 / 1,000,000) = $0.003 + $0.0075 = $0.0105

That is just over one cent.
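The arithmetic above can be sketched as a small helper. The function name `request_cost` and its parameter names are illustrative, not part of any provider SDK; the prices are the Claude Sonnet 4 rates quoted in this section.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Return the USD cost of one request, given prices in $/1M tokens."""
    input_cost = input_tokens * input_price_per_mtok / 1_000_000
    output_cost = output_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


# 1,000 input tokens and 500 output tokens at $3 / $15 per 1M tokens.
cost = request_cost(1_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0105
```

Keeping input and output as separate arguments mirrors how providers bill them, which makes it easy to see which side dominates a given workload.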

Input vs Output Pricing

Providers set separate rates for input (text you send) and output (text the model generates). Output tokens are almost always more expensive, typically 3–5× the input price. The main reason is compute: a prompt can be read in one parallel pass, while output is generated one token at a time, with a full forward pass for each token.

This asymmetry matters for how you design prompts:
• Long system prompts with detailed instructions → more input tokens
• Asking for verbose, detailed answers → more output tokens
• Caching repeated prompt sections → can dramatically cut input costs

Context Window and Costs

The context window limits how much text a model can process at once, and everything you place in it is billed as input. If you send a 100,000-token document plus a 500-token question, you pay for 100,500 input tokens. With Gemini 2.5 Pro at $1.25/1M input, that's about $0.13 per query before any output tokens.

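For long-context tasks, model choice matters enormously, because the input rate is multiplied by every token in the window. A cheaper model with a large context window can be dramatically more affordable than a premium model: the same 100,500-token query costs about $0.30 in input at Claude Sonnet 4's $3/1M rate, more than twice the Gemini 2.5 Pro figure.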
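To see how the input rate scales with a long document, here is a minimal sketch that prices the 100,500-token query at two input rates. The `input_cost` helper and the `input_prices` table are illustrative names; the rates are the ones quoted in this guide and may not match current provider pricing.

```python
def input_cost(tokens: int, price_per_mtok: float) -> float:
    """Return the USD input cost for a token count at a $/1M-token rate."""
    return tokens * price_per_mtok / 1_000_000


document_tokens = 100_000
question_tokens = 500
total_tokens = document_tokens + question_tokens

# Input rates in $/1M tokens, as quoted in this guide.
input_prices = {"Gemini 2.5 Pro": 1.25, "Claude Sonnet 4": 3.00}

for model, price in input_prices.items():
    print(f"{model}: ${input_cost(total_tokens, price):.4f} per query (input only)")
```

The results come out to roughly $0.13 and $0.30 per query. Output tokens are billed on top of this at each model's output rate.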

Estimating Your Monthly Bill

To estimate costs:
1. Estimate average tokens per request (input and output separately)
2. Multiply by your expected request volume per month
3. Apply the per-token rate

Example: A chatbot that processes 500 input tokens and generates 300 output tokens per conversation, running 10,000 conversations/month with Claude Haiku 4:
• Input: 500 × 10,000 × ($0.25/1M) = $1.25
• Output: 300 × 10,000 × ($1.25/1M) = $3.75
• Monthly total: ~$5.00
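The three estimation steps can be sketched in a few lines. `monthly_cost` is a hypothetical helper name, and the rates are the ones used in the chatbot example above; substitute your own averages and current prices.

```python
def monthly_cost(input_tokens_per_request: int,
                 output_tokens_per_request: int,
                 requests_per_month: int,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> tuple[float, float, float]:
    """Return (input_cost, output_cost, total) in USD for one month."""
    input_total = (input_tokens_per_request * requests_per_month
                   * input_price_per_mtok / 1_000_000)
    output_total = (output_tokens_per_request * requests_per_month
                    * output_price_per_mtok / 1_000_000)
    return input_total, output_total, input_total + output_total


# 500 input / 300 output tokens, 10,000 conversations, $0.25 / $1.25 per 1M.
inp, out, total = monthly_cost(500, 300, 10_000, 0.25, 1.25)
print(f"Input ${inp:.2f}, output ${out:.2f}, total ${total:.2f}")
```

Returning the input and output components separately shows which side to optimize first: here output is three times the input cost despite using fewer tokens.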

Use the TokenRate calculator to model this instantly for any model.

Hidden Costs to Watch For

• System prompts count toward every request: a 2,000-token system prompt on 1M requests is 2B extra tokens
• Conversation history grows with each turn, so consider summarizing or truncating
• Retries and errors still consume tokens
• Some providers charge for cached tokens at a discounted rate (Anthropic: 10% of base price)
• Batch APIs (Anthropic, OpenAI) offer 50% discounts for non-real-time processing
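Conversation history is the least obvious item in this list, so here is a rough sketch of how it compounds. It assumes a simple chat loop that resends the full history on every turn and uses fixed message sizes; the function name and the token counts are illustrative, not measurements.

```python
def cumulative_input_tokens(turns: int, system_tokens: int,
                            user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens billed over a conversation that resends full history."""
    total = 0
    for turn in range(1, turns + 1):
        # Turn N sends the system prompt, N user messages, and N-1 earlier replies.
        total += system_tokens + turn * user_tokens + (turn - 1) * reply_tokens
    return total


# 10 turns, 2,000-token system prompt, 100-token messages, 300-token replies.
with_history = cumulative_input_tokens(10, 2_000, 100, 300)
without_history = 10 * (2_000 + 100)  # if each turn sent only the newest message
print(with_history, without_history)  # 39000 21000
```

Because each turn resends everything before it, the history portion grows roughly with the square of the number of turns, which is why summarizing or truncating pays off in long conversations.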

Primary sources

Anthropic — API pricing — Source of Claude Sonnet 4 $3/$15 reference price.
OpenAI — API pricing
Google — Gemini API pricing
Anthropic — Batch API (50% discount)
OpenAI — Batch API (50% discount)

Frequently Asked Questions

Why are output tokens more expensive than input tokens?

Generating tokens is computationally heavier than reading them — the model runs a full forward pass for each output token. Output is typically 3–5× the input price across Claude, GPT, and Gemini.

What's the difference between $/1K tokens and $/1M tokens?

They're the same unit at different scales. $3/1M tokens = $0.003/1K tokens. The industry has standardized on $/1M tokens because per-thousand pricing produced unreadably small numbers as models got cheaper.
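When comparing older per-thousand price sheets with current per-million ones, the conversion is a factor of 1,000. A minimal sketch, with hypothetical helper names:

```python
def per_mtok_to_per_ktok(price_per_mtok: float) -> float:
    """Convert a $/1M-token price to $/1K tokens."""
    return price_per_mtok / 1_000


def per_ktok_to_per_mtok(price_per_ktok: float) -> float:
    """Convert a $/1K-token price to $/1M tokens."""
    return price_per_ktok * 1_000


print(per_mtok_to_per_ktok(3.00))  # 0.003
```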

Do system prompts count toward my token bill?

Yes — every token you send, including system prompts, counts as input. A 2,000-token system prompt sent across 1M requests adds 2 billion input tokens to your bill. Caching or trimming these is one of the highest-leverage optimizations.
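To put a dollar figure on that, the sketch below prices the 2-billion-token overhead at the Claude Sonnet 4 input rate of $3/1M quoted in this guide, with and without cached reads at 10% of the base price. This is a simplification: it assumes every request hits the cache and ignores any cache-write surcharge or cache expiry. `system_prompt_cost` and `cached_read_multiplier` are illustrative names.

```python
def system_prompt_cost(prompt_tokens: int, requests: int,
                       input_price_per_mtok: float,
                       cached_read_multiplier: float = 1.0) -> float:
    """Return the USD cost of resending a system prompt on every request.

    cached_read_multiplier is the fraction of the base input price charged
    for cached tokens (1.0 means no caching). Assumes every request is a
    cache hit and ignores cache-write pricing.
    """
    total_tokens = prompt_tokens * requests
    return total_tokens * input_price_per_mtok * cached_read_multiplier / 1_000_000


uncached = system_prompt_cost(2_000, 1_000_000, 3.00)
cached = system_prompt_cost(2_000, 1_000_000, 3.00, cached_read_multiplier=0.10)
print(f"Uncached: ${uncached:,.2f}  Cached: ${cached:,.2f}")
```

Under these assumptions the system prompt alone costs about $6,000 uncached and about $600 with cached reads.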
