Reading LLM Quality at a Glance: TokenRate's Color-Coded Badges Explained
How to read TokenRate's color-coded LLM quality badges — purple flagship (80+), sky balanced (65–79), emerald mid-tier (50–64), zinc budget (under 50) — and use them for fast model triage.
Frequently Asked Questions
Where do the color thresholds (80, 65, 50) come from?
They're empirically calibrated to match how production teams segment models in practice. A score of 80+ aligns with an Arena AI Elo of 1500+ (frontier); 65 aligns with Elo ~1450 (balanced production); 50 aligns with Elo ~1380 (reliable mid-tier). Below 50, instruction-following and structured-output reliability start to deteriorate.
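As a rough illustration of how these cutoffs translate into badges, here is a minimal Python sketch. The function name `quality_badge`, the `BANDS` table, and the handling of fractional scores (for example, 79.5 falling into the sky band) are assumptions made for this example and are not TokenRate's actual implementation. Only the thresholds, tier names, and colors come from the answer above.

```python
from typing import Optional, Tuple

# Band floors from the article: 80+ flagship, 65-79 balanced, 50-64 mid-tier.
# Anything lower falls through to the budget band.
BANDS = [
    (80, "flagship", "purple"),
    (65, "balanced", "sky"),
    (50, "mid-tier", "emerald"),
]


def quality_badge(score: Optional[float]) -> Optional[Tuple[str, str]]:
    """Return (tier, color) for a quality score, or None when unrated.

    Hypothetical helper: a score is matched against each floor from the
    highest band down, so fractional scores land in the band below the
    next floor (an assumption, the article only lists whole numbers).
    """
    if score is None:
        return None  # no score available, so no badge is shown
    for floor, tier, color in BANDS:
        if score >= floor:
            return (tier, color)
    return ("budget", "zinc")  # under 50


# Example usage with made-up scores.
for s in (92, 71, 50, 34, None):
    print(s, quality_badge(s))
```

Keeping the cutoffs in one ordered table means a recalibration would only touch `BANDS`, not the lookup logic.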
Why purple, sky, emerald, zinc — what about red/yellow/green?
We avoided red/yellow/green because traffic-light colors would label some models as 'bad', and every tier has legitimate uses. The palette signals tier without implying judgment: purple is regal (flagship), sky is calm (production default), emerald is workmanlike (volume), zinc is neutral (specialized use).
Does the color show up on the Compare Prices view too?
Yes — the Quality column carries the same color coding into the side-by-side comparison grid at /tools/compare-prices, so you can scan tier badges across vendors at a glance.
What if a model has no badge?
It means TokenRate's quality pipeline (Arena AI, Artificial Analysis, static fallback) doesn't have a score for that model. Pricing and context-window data still display normally. Apply the 'Rated only' Quality filter to hide unrated models from the list.
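The effect of a 'Rated only' style filter can be sketched in a few lines of Python. Everything here is hypothetical: the `ModelRow` type, its field names, the `rated_only` function, and the sample rows are invented for illustration and do not reflect TokenRate's data model or real prices. The point is simply that an unrated model keeps its pricing and context-window data and is only hidden when the filter is applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRow:
    """Hypothetical row in a pricing table (field names are assumptions)."""
    name: str
    input_price_per_mtok: float  # made-up price, USD per million input tokens
    context_window: int          # tokens
    quality: Optional[float]     # None means the pipeline has no score


def rated_only(rows: List[ModelRow]) -> List[ModelRow]:
    """Keep only rows that have a quality score, preserving their order."""
    return [row for row in rows if row.quality is not None]


# Placeholder models with invented numbers.
rows = [
    ModelRow("model-a", 3.00, 200_000, 84.0),
    ModelRow("model-b", 0.25, 128_000, None),  # unrated: no badge shown
    ModelRow("model-c", 0.60, 32_000, 57.5),
]

visible = rated_only(rows)
print([row.name for row in visible])
```

Note that `model-b` still carries its price and context window in `rows`. The filter changes which rows are listed, not the data behind them.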
Try the TokenRate Calculator
Open the TokenRate calculator and scan the Quality column — within seconds you'll see the full landscape of flagship (purple), balanced (sky), mid-tier (emerald), and budget (zinc) models.