How Anthropic structures its pricing
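The 5x rule is the single most useful thing to memorize: on every model in the lineup, an output token costs five times as much as an input token. That means verbose responses can outweigh much bigger prompts. In the typical chat turn assumed below, 400 output tokens cost more than 1,500 input tokens. I cover why this matters so much in the output multiplier piece.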
* The per-message figure in the table above assumes 1,500 input tokens and 400 output tokens — a realistic chat turn with a system prompt and some history.
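To make the footnote's assumption concrete, here is a minimal Python sketch that prices that 1,500-in / 400-out turn on each tier and shows how much of the bill comes from output. The prices are the list rates quoted in this article. Haiku 4.5's $1 / $5 is an assumption, inferred from the Haiku figure in the support-chatbot example later on. The helper names are illustrative.

```python
# List prices in USD per million tokens: (input, output), as quoted in this article.
# Assumption: Haiku 4.5 at $1 / $5, inferred from the support-chatbot example.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}


def turn_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list price (no caching, no batch discount)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def output_share(model: str, input_tokens: int, output_tokens: int) -> float:
    """Fraction of the request's cost that comes from output tokens."""
    _, out_price = PRICES[model]
    output_cost = output_tokens * out_price / 1_000_000
    return output_cost / turn_cost(model, input_tokens, output_tokens)


# The footnote's chat turn: 1,500 input tokens, 400 output tokens.
for model in PRICES:
    per_message = turn_cost(model, 1_500, 400)
    share = output_share(model, 1_500, 400)
    print(f"{model:<11} ${per_message:.4f} per message, {share:.0%} of it from output")
```

Output is about 57% of every tier's per-message cost here even though it is only about a fifth of the tokens. That is the 5x rule at work.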
The four tiers, and who each one is for
Sonnet 4.6 ($3 / $15, 1M context) is the default I recommend when someone asks where to start. It handles long documents, serious coding, and multi-step reasoning at under a third of Fable 5's price (30%, on both input and output).
Opus 4.8 ($5 / $25, 1M context) sits in a strange and useful spot: frontier-class quality at half the price of Fable 5. On the Arena leaderboard it scores within a point of GPT-5.5 while costing the same on input and less on output.
Fable 5 ($10 / $50, 1M context) is Anthropic's newest flagship. You pay for the ceiling: the hardest reasoning, agentic, and long-horizon tasks. If your task doesn't visibly fail on Opus 4.8, Fable 5 is probably overkill — that's not a knock, it's just where the price-performance curve bends.
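A quick way to see how these three tiers relate, as a sketch using only the list prices quoted above. Because each tier charges exactly 5x its input rate for output, one tier's cost divided by another's is the same whatever the input/output mix, so Sonnet 4.6 always lands at 30% of Fable 5 and Opus 4.8 at 50%. The `cost` and `relative_cost` helpers and the sample token mixes are illustrative.

```python
# List prices in USD per million tokens: (input, output), from the tier list.
PRICES = {
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at list price."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def relative_cost(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of `model` as a fraction of `baseline` for the same workload."""
    return cost(model, input_tokens, output_tokens) / cost(baseline, input_tokens, output_tokens)


# Three very different input/output mixes (illustrative): the ratios do not move.
for mix in [(1_500, 400), (50_000, 1_000), (2_000_000, 150_000)]:
    ratios = {m: round(relative_cost(m, "Fable 5", *mix), 2) for m in PRICES}
    print(mix, ratios)
```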
Opus 4.8 Fast: paying for speed, not smarts
When I priced this out for interactive use cases, the math came down to one question: does a human sit there waiting for the response? For a coding assistant or a live chat UI, latency is the product, and 2x on a few cents is nothing. For a batch summarization pipeline running overnight, paying double for speed nobody perceives is pure waste. There's a longer comparison in the Grok Build vs Opus 4.8 Fast piece.
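The Fast trade-off reduces to simple arithmetic. The sketch below assumes, based on the 2x figure above, that Opus 4.8 Fast bills at exactly double the standard Opus 4.8 rates. The workload sizes reuse this article's per-message and document-summarization examples and are otherwise hypothetical.

```python
# Opus 4.8 list prices in USD per million tokens, from this article.
OPUS_INPUT, OPUS_OUTPUT = 5.00, 25.00
# Assumption: Opus 4.8 Fast bills at exactly 2x the standard Opus 4.8 rates.
FAST_MULTIPLIER = 2.0


def opus_cost(input_tokens: int, output_tokens: int, fast: bool = False) -> float:
    """Dollar cost on Opus 4.8, optionally on the (assumed 2x) Fast tier."""
    base = (input_tokens * OPUS_INPUT + output_tokens * OPUS_OUTPUT) / 1_000_000
    return base * FAST_MULTIPLIER if fast else base


def fast_premium(input_tokens: int, output_tokens: int) -> float:
    """Extra dollars paid for Fast over standard Opus 4.8."""
    return opus_cost(input_tokens, output_tokens, fast=True) - opus_cost(input_tokens, output_tokens)


# Interactive: one chat turn someone is waiting on (1,500 in / 400 out).
turn_premium = fast_premium(1_500, 400)
# Overnight pipeline: 1,000 documents at 50,000 in / 1,000 out each.
nightly_premium = fast_premium(1_000 * 50_000, 1_000 * 1_000)

print(f"Fast premium per chat turn:   ${turn_premium:.4f}")
print(f"Fast premium per nightly run: ${nightly_premium:,.2f}")
```

The multiplier is identical in both cases; only the absolute premium changes, which is why "is someone waiting?" matters more than the ratio itself.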
The discounts: caching and batch
Prompt caching lets you mark a stable prefix (system prompt, tool definitions, a big document) so repeat requests read it at a fraction of the input price instead of paying full rate every time. If you run a chatbot that resends an 800-token system prompt on every one of a million daily requests, that prefix alone adds up to 24 billion input tokens a month (800 × 1,000,000 × 30 days), and caching is the difference between paying full rate on all of them and paying a small fraction of that. Details and worked math in the prompt caching guide.
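The caching arithmetic from that example, as a minimal sketch. It assumes a 30-day month, Sonnet 4.6's $3 input rate, and cache reads billed at 10% of the base input price (the cache-read rate Anthropic documents for its current models; verify it against the prompt caching guide). It ignores cache-write charges, cache expiry, and any minimum prefix length the API enforces, so treat the cached figure as a lower bound.

```python
# Worked numbers from the chatbot example in the text.
SYSTEM_PROMPT_TOKENS = 800
REQUESTS_PER_DAY = 1_000_000
DAYS_PER_MONTH = 30  # assumption: a 30-day month

# Assumptions: Sonnet 4.6's $3 per million input tokens, and cache reads billed
# at 10% of that base rate. Cache writes, expiry, and minimum prefix length are ignored.
INPUT_PRICE_PER_MTOK = 3.00
CACHE_READ_MULTIPLIER = 0.10

monthly_prefix_tokens = SYSTEM_PROMPT_TOKENS * REQUESTS_PER_DAY * DAYS_PER_MONTH
uncached_cost = monthly_prefix_tokens * INPUT_PRICE_PER_MTOK / 1_000_000
cached_cost = uncached_cost * CACHE_READ_MULTIPLIER

print(f"System-prompt tokens per month: {monthly_prefix_tokens:,}")
print(f"Uncached: ${uncached_cost:,.0f}/month")
print(f"Cached reads (lower bound): ${cached_cost:,.0f}/month")
```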
The Batch API takes 50% off both input and output for any work that can tolerate up to 24 hours of turnaround — evals, backfills, nightly reports. Half off for changing a deadline is the easiest discount in the industry; I walk through it in the batch API guide.
What real workloads cost on Claude
A support chatbot handling 10,000 conversations a month (roughly 9,000 cumulative input tokens and 1,200 output tokens per conversation, since history gets resent every turn): about $150/month on Haiku 4.5 ($1 / $5), $450 on Sonnet 4.6, $750 on Opus 4.8.
Summarizing 1,000 long documents (50,000 tokens in, 1,000 out each): about $165 on Sonnet 4.6 ($150 of input plus $15 of output), or $82.50 with batch pricing.
A heavy agentic coding session that burns 2M input and 150K output tokens: $8.25 on Sonnet 4.6, $13.75 on Opus 4.8, $27.50 on Fable 5. Run twenty of those a month and the gap between Sonnet 4.6 ($165) and Fable 5 ($550) is $385, so model choice is suddenly a nearly $400/month decision.
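The three workloads above can be reproduced with one small cost function. This is a sketch, not a billing calculator: it uses the list prices quoted in this article (Haiku 4.5's $1 / $5 is inferred from the chatbot figure), models the Batch API as a flat 50% off both input and output, and ignores caching and extended thinking.

```python
# List prices in USD per million tokens: (input, output), from this article.
# Assumption: Haiku 4.5 at $1 / $5, inferred from the chatbot figure.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}
BATCH_MULTIPLIER = 0.5  # Batch API: 50% off both input and output


def cost(model: str, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Dollar cost at list price, optionally with the Batch API discount."""
    in_price, out_price = PRICES[model]
    total = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return total * BATCH_MULTIPLIER if batch else total


# Support chatbot: 10,000 conversations/month, 9,000 in and 1,200 out each.
chatbot = {
    m: cost(m, 10_000 * 9_000, 10_000 * 1_200)
    for m in ("Haiku 4.5", "Sonnet 4.6", "Opus 4.8")
}

# Summarization: 1,000 documents, 50,000 in and 1,000 out each, on Sonnet 4.6.
summaries = cost("Sonnet 4.6", 1_000 * 50_000, 1_000 * 1_000)
summaries_batch = cost("Sonnet 4.6", 1_000 * 50_000, 1_000 * 1_000, batch=True)

# Agentic coding: 2M in and 150K out per session, twenty sessions a month.
session = {m: cost(m, 2_000_000, 150_000) for m in ("Sonnet 4.6", "Opus 4.8", "Fable 5")}
monthly_gap = 20 * (session["Fable 5"] - session["Sonnet 4.6"])

print("chatbot:", {m: f"${c:,.2f}" for m, c in chatbot.items()})
print(f"summaries: ${summaries:,.2f} (batch: ${summaries_batch:,.2f})")
print("per session:", {m: f"${c:,.2f}" for m, c in session.items()})
print(f"Sonnet vs Fable, 20 sessions/month: ${monthly_gap:,.2f}")
```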
Extended thinking deserves its own line item: thinking tokens bill as output, and on hard problems they can multiply the response cost several times over. I've broken that down in the extended thinking cost analysis.
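A rough illustration of how thinking tokens inflate a bill, assuming Sonnet 4.6's rates and a hypothetical turn with 1,500 input tokens, a 500-token visible answer, and 4,000 thinking tokens. The only billing rule the sketch encodes is the one stated above: thinking tokens are charged at the output rate.

```python
# Sonnet 4.6 list prices in USD per million tokens, from this article.
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00


def request_cost(input_tokens: int, answer_tokens: int, thinking_tokens: int = 0) -> float:
    """Dollar cost of one request; thinking tokens are billed at the output rate."""
    billed_output = answer_tokens + thinking_tokens
    return (input_tokens * INPUT_PRICE + billed_output * OUTPUT_PRICE) / 1_000_000


# Hypothetical turn: 1,500 input tokens and a 500-token visible answer.
plain = request_cost(1_500, 500)
# Same turn with 4,000 thinking tokens spent before the answer.
with_thinking = request_cost(1_500, 500, thinking_tokens=4_000)

print(f"without thinking: ${plain:.4f}")
print(f"with thinking:    ${with_thinking:.4f} ({with_thinking / plain:.1f}x)")
```

The visible answer is identical in both cases; the 6x increase comes entirely from tokens the user never sees.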
How Claude prices compare to the field
The honest summary: Anthropic is no longer the premium-priced option it was in 2024. Opus 4.8 at $5 / $25 is one of the better frontier-class deals on the market right now, and the lineup's uniform 5x output multiplier makes bills easier to predict than most. For the full cross-provider picture, see the provider comparison and the live pricing table.