A practical guide to understanding per-token pricing, input vs output costs, and how to estimate your AI API bill before it arrives.
TL;DR
AI APIs price by the token, quoted in dollars per 1 million tokens ($/1M). Input tokens (the text you send) and output tokens (the text the model generates) are billed separately, with output typically costing 3–5× the input rate. As of May 2026, Claude Sonnet 4 charges $3/1M input and $15/1M output, so a request with 1,000 input tokens and 500 output tokens costs about $0.0105. Batch APIs (Anthropic, OpenAI) cut that by 50%, and prompt caching cuts the price of cached input tokens by ~90%.
The Per-Token Model
Almost every major AI provider prices its API by the token. The standard unit is "per 1 million tokens", usually written as $/1M tokens or $/MTok.
For example, Claude Sonnet 4 costs:
• $3.00 per 1M input tokens
• $15.00 per 1M output tokens
If you send a 1,000-token prompt and receive a 500-token response, you pay: (1,000 × $3 / 1,000,000) + (500 × $15 / 1,000,000) = $0.003 + $0.0075 = $0.0105 — just over one cent.
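The same arithmetic can be written as a small helper. This is a minimal sketch: the function name `request_cost` and its parameters are illustrative and not part of any provider's SDK, and the prices are the Claude Sonnet 4 figures quoted above.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Return the dollar cost of one request, given $/1M-token prices."""
    tokens_per_mtok = 1_000_000
    input_cost = input_tokens * input_price_per_mtok / tokens_per_mtok
    output_cost = output_tokens * output_price_per_mtok / tokens_per_mtok
    return input_cost + output_cost


# 1,000 input tokens and 500 output tokens at $3 / $15 per 1M tokens
cost = request_cost(1_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0105
```

Keeping input and output as separate arguments mirrors how providers bill them, and makes it easy to see which side dominates a given workload.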
Input vs Output Pricing
Most models charge different rates for input (text you send) and output (text the model generates). Output tokens are almost always more expensive, typically 3–5× the input price, because generating tokens is more computationally intensive than reading them.
This asymmetry matters for how you design prompts:
• Long system prompts with detailed instructions → more input tokens
• Asking for verbose, detailed answers → more output tokens
• Caching repeated prompt sections → can dramatically cut input costs
Context Window and Costs
The context window limits how much text a model can process at once, and how much of it you fill directly affects your bill. If you send a 100,000-token document plus a 500-token question, you pay for 100,500 input tokens. With Gemini 2.5 Pro at $1.25/1M input, that's about $0.13 per query before any output is generated.
For long-context tasks, model choice matters enormously. The same 100,500-token query at Claude Sonnet 4's $3/1M input rate costs about $0.30, more than twice as much, so a cheaper model with a large enough context window can be far more affordable than a premium one.
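To make the long-context comparison concrete, here is a rough sketch that prices the same input across two models. The input prices are the ones quoted in this article and may not match current rates; the helper name `input_cost` is illustrative, and output tokens are deliberately left out.

```python
def input_cost(input_tokens: int, price_per_mtok: float) -> float:
    """Dollar cost of the input side of a request at a $/1M-token price."""
    return input_tokens * price_per_mtok / 1_000_000


document_tokens = 100_000
question_tokens = 500
total_input = document_tokens + question_tokens

# Input prices ($/1M tokens) as quoted in this article; verify before relying on them.
input_prices = {"Gemini 2.5 Pro": 1.25, "Claude Sonnet 4": 3.00}

costs = {model: input_cost(total_input, price)
         for model, price in input_prices.items()}

for model, dollars in costs.items():
    print(f"{model}: ${dollars:.2f} per query (input only)")
```

At one query the gap is cents, but it scales linearly with volume, so it is worth running this kind of comparison with your real document sizes.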
Estimating Your Monthly Bill
To estimate costs:
1. Estimate average tokens per request (input + output separately)
2. Multiply by your expected request volume per month
3. Apply the per-token rate
Example: A chatbot that processes 500 input tokens and generates 300 output tokens per conversation, running 10,000 conversations/month with Claude Haiku 4:
• Input: 500 × 10,000 × ($0.25/1M) = $1.25
• Output: 300 × 10,000 × ($1.25/1M) = $3.75
• Monthly total: ~$5.00
Costs that are easy to overlook:
• System prompts count toward every request: a 2,000-token system prompt on 1M requests is 2 billion extra input tokens
• Conversation history grows with each turn — consider summarizing or truncating
• Retries and errors still consume tokens
• Some providers bill cached input tokens at a discounted rate (Anthropic: 10% of the base input price)
• Batch APIs (Anthropic, OpenAI) offer 50% discounts for non-real-time processing
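The three estimation steps, plus the batch discount, can be sketched as a short function. The names `Pricing` and `monthly_cost` are hypothetical, the prices are the ones used in the chatbot example above, and the `batch_discount` parameter assumes the discount applies equally to input and output tokens.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Prices in dollars per 1M tokens."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost(avg_input_tokens: int, avg_output_tokens: int,
                 requests_per_month: int, pricing: Pricing,
                 batch_discount: float = 0.0) -> dict:
    """Estimate a monthly bill from average tokens per request and volume.

    batch_discount is a fraction (0.5 means 50% off); it is an assumption
    here that it applies to both input and output.
    """
    multiplier = 1.0 - batch_discount
    input_cost = (avg_input_tokens * requests_per_month
                  * pricing.input_per_mtok / 1_000_000) * multiplier
    output_cost = (avg_output_tokens * requests_per_month
                   * pricing.output_per_mtok / 1_000_000) * multiplier
    return {"input": input_cost, "output": output_cost,
            "total": input_cost + output_cost}


# The chatbot example: 500 in / 300 out, 10,000 conversations per month
haiku = Pricing(input_per_mtok=0.25, output_per_mtok=1.25)
estimate = monthly_cost(500, 300, 10_000, haiku)
print(f"Input ${estimate['input']:.2f}, output ${estimate['output']:.2f}, "
      f"total ${estimate['total']:.2f}")

# The same workload if it could run through a batch API at 50% off
batched = monthly_cost(500, 300, 10_000, haiku, batch_discount=0.5)
print(f"Batched total ${batched['total']:.2f}")
```

Real bills will usually come in higher than this estimate, because it ignores retries, growing conversation history, and per-request system prompts unless you fold them into the average input size.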
Why are output tokens more expensive than input tokens?
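Generating tokens is computationally heavier than reading them. Input tokens can be processed together in a single pass, but output tokens are generated one at a time, and each one requires its own forward pass through the model. Output is typically 3–5× the input price across Claude, GPT, and Gemini.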
What's the difference between $/1K tokens and $/1M tokens?
They're the same unit at different scales. $3/1M tokens = $0.003/1K tokens. The industry has standardized on $/1M tokens because per-thousand pricing produced unreadably small numbers as models got cheaper.
Do system prompts count toward my token bill?
Yes — every token you send, including system prompts, counts as input. A 2,000-token system prompt sent across 1M requests adds 2 billion input tokens to your bill. Caching or trimming these is one of the highest-leverage optimizations.
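As a rough illustration of that leverage, the sketch below prices the 2,000-token system prompt across 1M requests with and without caching. It assumes the $3/1M input price quoted earlier and a cached rate of 10% of base, and it simplifies by treating every request as a cache hit, ignoring any cache-write charges or cache expiry. The function name `system_prompt_cost` is illustrative.

```python
def system_prompt_cost(prompt_tokens: int, requests: int,
                       input_price_per_mtok: float,
                       cached_price_fraction: float = 1.0) -> float:
    """Dollar cost of resending a system prompt on every request.

    cached_price_fraction is the share of the base input price paid for
    cached tokens (1.0 = no caching, 0.10 = 10% of base). Assumes every
    request hits the cache, which is a simplification.
    """
    total_tokens = prompt_tokens * requests
    return (total_tokens * input_price_per_mtok
            * cached_price_fraction / 1_000_000)


uncached = system_prompt_cost(2_000, 1_000_000, 3.00)
cached = system_prompt_cost(2_000, 1_000_000, 3.00,
                            cached_price_fraction=0.10)

print(f"Uncached: ${uncached:,.0f}")
print(f"Cached:   ${cached:,.0f}")
print(f"Saved:    ${uncached - cached:,.0f}")
```

Trimming the prompt has the same linear effect: halving the system prompt halves this line of the bill regardless of caching.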