Why I price budgets in blended cost
The formula is simple: blended = 0.75 x input price + 0.25 x output price. Claude Haiku 4.5 at $1 in / $5 out blends to $2 per million. Your $50 buys 25 million tokens.
If your workload is output-heavy (content generation) or input-heavy (document analysis), shift the ratio and the rankings move — Grok's cheap output makes it climb for generation work, while cheap-input models climb for RAG. The asterisk on the table is doing real work; the calculator lets you set your own ratio. For why the input-output distinction exists at all, see how AI API pricing works.
What those token numbers mean physically
This is the most underrated fact in AI budgeting: for personal projects, prototypes, and small products, a $50/month budget on a sensibly chosen model is effectively unlimited. The conversion math is in tokens to dollars and how many tokens in 1,000 words.
Three realistic $50 builds
Build two: a newsletter summarizer that ingests 200 articles a week (3,000 tokens each) and writes 200-token summaries. Weekly: 600K in, 40K out. Monthly: about 2.6M tokens — $5.20 on Haiku, $1.45 on Flash-Lite. You're using a tenth of the budget.
Build three: a niche RAG chatbot doing 5,000 questions a month (3,000 tokens of context in, 300 out each): 15M in + 1.5M out monthly. On Gemini 3.5 Flash that's $22.50 + $13.50 = $36 — fits with room to spare; on Sonnet it's $45 + $22.50 = $67.50 and you're over budget. At fixed budgets, the model choice IS the product decision.
Stretching the budget: the multipliers stack
Prompt caching: chat and RAG workloads resend the same prefixes constantly; cached input bills at roughly a tenth of list rate. On input-heavy workloads (which is most of them — see the 3:1 mix) caching alone can nearly double your effective budget. Details in the caching guide.
Batch APIs: anything that can wait a few hours gets a flat 50% off on Anthropic and OpenAI. The summarizer in build two drops to $2.60/month.
Routing: send the easy 80% of requests to a budget model and reserve the mid tier for the hard 20%. A Flash-Lite-plus-Sonnet cascade typically lands at a quarter of all-Sonnet cost at near-identical perceived quality — the approach from the routing piece. Stack all three and the RAG bot from build three runs comfortably under $15.
When $50 stops being enough
The math is linear: the RAG bot at $36/month for 5,000 questions becomes $360 at 50,000 and $3,600 at 500,000. The thresholds where teams typically re-architect: around $500/month is where caching and routing stop being optional; around $2,000/month is where serious eval-driven model selection pays for itself (a 20% saving covers real engineering time); around $10,000/month is where open-weight models on dedicated hosting enter the conversation, covered in the open-weight cost guide.
Set a billing alert at half your budget on day one. Token spend is silent — there's no progress bar — and every provider's console buries usage a few clicks deep. The diagnostic habit is in token usage auditing.