Why I'm writing down the plumbing
This post documents how the data actually flows, in the same spirit as the About page methodology but with the engineering specifics. If you're building anything on top of LLM pricing data — a router, a budget dashboard, an internal cost report — the failure modes below will save you some pain. And if you ever catch a wrong number here, the contact page reaches me directly; corrections ship within a day.
Layer one: the daily automated sync
The job normalizes model identifiers (more on why that's hard below), maps prices to per-million-token rates, captures context windows, and writes a generated data file that the site builds from. Nothing in this layer is hand-typed, which is the point: the failure mode of manual tables isn't the price you got wrong, it's the price that silently went stale.
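To make the per-million-token mapping concrete, here is a minimal sketch, not the site's actual code. The function name `per_million` and the idea that a feed quotes a price "per N tokens" are assumptions for illustration. It uses `Decimal` so that tiny per-token prices don't pick up floating-point noise.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def per_million(price: str, unit_tokens: int) -> Decimal:
    """Convert a price quoted per `unit_tokens` tokens into a per-million-token rate.

    `price` is passed as a string so Decimal parses it exactly.
    (Hypothetical helper; real feeds may quote prices in other shapes.)
    """
    if unit_tokens <= 0:
        raise ValueError("unit_tokens must be positive")
    return Decimal(price) * TOKENS_PER_MILLION / Decimal(unit_tokens)


# A per-token quote and a per-1K-token quote both land on the same scale.
print(per_million("0.000003", 1))
print(per_million("0.015", 1000))
```

Normalizing once, at ingest, means every downstream comparison works on a single unit.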
On top of the daily rebuild, the calculator's live prices revalidate every hour via incremental static regeneration — each page is served as static HTML, with prices refreshed against the feed on a 60-minute cycle.
Layer two: the curated overlay
The merge rule is strict: live feed prices always win over curated copy for anything numeric. Curated text wins for anything editorial. This split keeps opinions and numbers from contaminating each other — I can be wrong in an opinion without corrupting a price, and the feed can't overwrite a judgment call.
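The merge rule can be sketched in a few lines. This is an illustration under assumptions, not the real pipeline: the field names in `NUMERIC_FIELDS` and the dict-shaped records are hypothetical.

```python
# Hypothetical set of fields treated as "numeric"; the real list may differ.
NUMERIC_FIELDS = ("input_per_million", "output_per_million", "context_window")


def merge_model(feed: dict, curated: dict) -> dict:
    """Merge one model's feed record with its curated overlay.

    Curated values win by default (editorial copy), but any numeric
    field present in the feed overrides whatever the overlay says.
    """
    merged = {**feed, **curated}          # curated wins for editorial fields
    for field in NUMERIC_FIELDS:
        if field in feed:
            merged[field] = feed[field]   # feed always wins for numbers
    return merged


feed = {"input_per_million": 3.0, "summary": "feed blurb"}
curated = {"input_per_million": 2.5, "summary": "my take"}
print(merge_model(feed, curated))
```

A stale number typed into the overlay can never leak through, because the numeric fields are overwritten last.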
Quality scores follow the same pattern: a daily job ingests public leaderboard standings (Arena-style Elo scores), normalizes them to the 0-100 scale explained in the quality score methodology, and attaches them to matching models. Only a minority of models have scores (currently 10 of 203 from the primary source), and the site shows nothing rather than guessing for the rest.
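For readers who want a feel for the shape of that step, here is a sketch that assumes simple min-max scaling. That choice is an assumption made purely for illustration; the actual formula is the one in the quality score methodology, and the function names here are hypothetical. The part taken directly from the text above is the last function: a model with no leaderboard entry gets `None`, not an estimate.

```python
from typing import Optional


def normalize_scores(elo_by_model: dict[str, float]) -> dict[str, float]:
    """Min-max scale raw leaderboard ratings onto 0-100 (assumed method)."""
    if not elo_by_model:
        return {}
    lo, hi = min(elo_by_model.values()), max(elo_by_model.values())
    if hi == lo:
        # Degenerate case: every model has the same rating.
        return {model: 100.0 for model in elo_by_model}
    return {
        model: round(100 * (elo - lo) / (hi - lo), 1)
        for model, elo in elo_by_model.items()
    }


def score_for(model_id: str, scores: dict[str, float]) -> Optional[float]:
    """Return the score if one exists; otherwise None, never a guess."""
    return scores.get(model_id)


scores = normalize_scores({"a": 1200, "b": 1300, "c": 1250})
print(scores)
print(score_for("unscored-model", scores))
```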
Two bugs I actually shipped, and the rules they bought
Bug one: dots versus dashes. Providers are inconsistent about model identifiers: claude-opus-4.8 in one system is claude-opus-4-8 in another. For a stretch, my ID normalization missed that, so a handful of models silently failed to match their live prices and fell back to older reference numbers. The fix was boring (canonicalize both forms before matching); the rule it bought was not: every unmatched model ID now gets surfaced at build time instead of failing silently.
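A minimal sketch of that fix and the rule that came with it. The function names are hypothetical and the price is a made-up placeholder; only the dot-versus-dash example comes from the bug described above.

```python
def canonical_id(model_id: str) -> str:
    """Lowercase and treat dots as dashes so '4.8' and '4-8' compare equal."""
    return model_id.strip().lower().replace(".", "-")


def match_prices(site_ids: list[str], feed_prices: dict[str, float]):
    """Match site model IDs to feed prices on canonical form.

    Returns (matched, unmatched) so the caller can surface every miss
    instead of quietly falling back to an older number.
    """
    by_canonical = {canonical_id(k): v for k, v in feed_prices.items()}
    matched, unmatched = {}, []
    for model_id in site_ids:
        key = canonical_id(model_id)
        if key in by_canonical:
            matched[model_id] = by_canonical[key]
        else:
            unmatched.append(model_id)
    return matched, unmatched


# Placeholder price, for illustration only.
matched, unmatched = match_prices(
    ["claude-opus-4.8", "mystery-model"],
    {"claude-opus-4-8": 15.0},
)
print(matched)
print(unmatched)
```

What the build does with `unmatched` (print a warning, or refuse to deploy) is a policy choice; the essential part is that the list exists and somebody sees it.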
Bug two: the popularity sort. An early version of the 'most popular' ordering let one provider's model family flood the top of the list, which made the default view quietly misleading even though every individual number was correct. The current sort round-robins across providers — documented in the popular-sort explainer — and the rule it bought: ranking logic is content, and gets reviewed like content.
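Round-robin interleaving is easy to sketch. The version below assumes the input is already sorted by popularity and that providers take turns in order of first appearance; both details are assumptions, and the authoritative description is the popular-sort explainer.

```python
from collections import deque


def round_robin_by_provider(models: list[tuple[str, str]]) -> list[str]:
    """Interleave models so no single provider floods the top of the list.

    `models` is a list of (provider, model_id) pairs, most popular first.
    Each round takes the next-best model from every provider that still
    has one left.
    """
    queues: dict[str, deque] = {}
    for provider, model_id in models:
        queues.setdefault(provider, deque()).append(model_id)

    ordered = []
    active = list(queues.values())
    while active:
        still_active = []
        for queue in active:
            ordered.append(queue.popleft())
            if queue:
                still_active.append(queue)
        active = still_active
    return ordered


ranked = [("a", "a1"), ("a", "a2"), ("a", "a3"),
          ("b", "b1"), ("c", "c1"), ("b", "b2")]
print(round_robin_by_provider(ranked))
# ['a1', 'b1', 'c1', 'a2', 'b2', 'a3']
```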
Neither bug produced a wrong price on a model page, but both taught the same lesson: in a data product, the pipeline is the editorial policy.
When sources disagree, and what I don't do
What I don't do, as a matter of policy: no affiliate links, no paid placement, no commissions from any provider. Rankings move on public data or they don't move; nobody can buy a position in the comparison table. That independence is cheap to promise and expensive to fake, which is exactly why it's worth stating in writing.