The pricing quirk everyone misses
That single number rearranges the rankings for any output-heavy workload. The intuition for why is in the output multiplier piece, but here's the concrete version: generate a 5,000-token report from a 500-token brief, and on GPT-5.4 you pay $0.00125 input + $0.075 output, so the output is 98% of the bill. The same job on Grok 4.3: $0.000625 + $0.0125, or $0.013125 against GPT-5.4's $0.07625. Grok ends up about 5.8x cheaper on that task (call it 6x) while being only 2x cheaper on input price.
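The arithmetic above can be reproduced in a few lines. This is a minimal sketch, not a billing tool: the per-million rates are back-derived from the dollar figures in this post ($2.50 / $15 for GPT-5.4, $1.25 / $2.50 for Grok 4.3), and `job_cost` and `PRICES` are names made up for illustration.

```python
# Per-million-token rates (input, output), back-derived from the figures in the text.
# These are assumptions for illustration; check the live pricing table before relying on them.
PRICES = {
    "GPT-5.4": (2.50, 15.00),
    "Grok 4.3": (1.25, 2.50),
}


def job_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """Return (input_cost, output_cost) in dollars for one request."""
    input_cost = input_tokens * input_per_m / 1_000_000
    output_cost = output_tokens * output_per_m / 1_000_000
    return input_cost, output_cost


totals = {}
for model, (input_rate, output_rate) in PRICES.items():
    # The example job: a 500-token brief producing a 5,000-token report.
    in_cost, out_cost = job_cost(500, 5_000, input_rate, output_rate)
    totals[model] = in_cost + out_cost
    share = out_cost / totals[model]
    print(f"{model}: ${in_cost:.6f} in + ${out_cost:.6f} out "
          f"= ${totals[model]:.6f} (output share {share:.0%})")

ratio = totals["GPT-5.4"] / totals["Grok 4.3"]
print(f"GPT-5.4 costs {ratio:.1f}x what Grok 4.3 does on this job")
```

Swapping in your own token counts shows how quickly the gap narrows as the job becomes less output-heavy.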
The lineup: 4.3, 4.20, and Build
Grok 4.20 (same $1.25 / $2.50) is the interesting one: a 2M-token context window, the largest I track on a mainstream API, and an Arena score of 72 that puts it within three points of the frontier trio. Two million tokens is roughly 1.5 million words — a small codebase or a year of meeting transcripts in a single request.
Grok Build 0.1 ($1 / $2, 256K context) is the coding specialist. I compared it head-to-head against Claude's fast tier in the Grok Build vs Opus 4.8 Fast piece — the short version is that Build's economics shine for code generation volume, while Opus wins on the hardest debugging.
What the 2M context window really costs
The honest use case for giant windows isn't conversation; it's single-shot analysis. One request, the entire corpus, one comprehensive answer: filling the full 2M-token window at $1.25 per million costs $2.50 of input, which is a bargain next to building a retrieval pipeline. For iterative work, retrieval still wins, as covered in the 1M-token context piece.
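To see why the single-shot case is the comfortable one, here is a rough sketch of input cost as the number of requests grows. It assumes, loudly, that every turn re-sends the full context at the $1.25 per million input rate with no prompt caching or discounting; real billing may differ, and `window_input_cost` is a hypothetical helper.

```python
INPUT_PER_M = 1.25  # Grok 4.20 input rate in $ per million tokens, from the text


def window_input_cost(context_tokens, turns=1):
    """Input cost in dollars when each turn re-sends the whole context.

    Assumption: no prompt caching, so cost scales linearly with turns.
    """
    return context_tokens * turns * INPUT_PER_M / 1_000_000


single_shot = window_input_cost(2_000_000)           # one request, full window
ten_turns = window_input_cost(2_000_000, turns=10)   # same corpus, ten follow-ups

print(f"single shot: ${single_shot:.2f}")
print(f"ten turns:   ${ten_turns:.2f}")
```

Under that assumption the bill scales with the number of turns, which is the cost a retrieval pipeline avoids by sending only the relevant slices.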
Where Grok wins on real workloads
Report and email generation, code scaffolding, data-to-text pipelines — anywhere the output is several times longer than the input, Grok's effective price lands near the budget tier while its quality score sits near the mid tier.
Where it doesn't win: input-dominated workloads. RAG with fat retrieved context, document classification, summarization of long inputs: there the $1.25 input rate is 5x Gemini Flash-Lite's $0.25 and 12.5x DeepSeek V4 Flash's $0.10, and Grok is the expensive option. Match the multiplier to your input:output ratio; the calculator does this arithmetic for you.
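A small sketch of what "match the multiplier to your ratio" means in practice. It compares Grok 4.3 against GPT-5.4 only, because those are the two models whose input and output rates can both be derived from this post; the rates and the function names `blended_price` and `advantage` are illustrative assumptions, not the calculator's actual logic.

```python
# (input, output) rates in $ per million tokens, back-derived from figures in this post.
GROK = (1.25, 2.50)
GPT = (2.50, 15.00)


def blended_price(input_per_m, output_per_m, input_share):
    """Effective $ per million tokens when input_share of all tokens are input."""
    return input_per_m * input_share + output_per_m * (1 - input_share)


def advantage(input_tokens, output_tokens):
    """How many times cheaper Grok is than GPT-5.4 at this input:output mix."""
    share = input_tokens / (input_tokens + output_tokens)
    return blended_price(*GPT, share) / blended_price(*GROK, share)


for label, inp, out in [("1:10 (output-heavy)", 1, 10),
                        ("1:1  (balanced)", 1, 1),
                        ("10:1 (input-heavy)", 10, 1)]:
    print(f"{label}: Grok is {advantage(inp, out):.2f}x cheaper")
```

The advantage shrinks toward the plain 2x input-price gap as the workload becomes input-heavy, which is where the cheaper-input models named above take over.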
Caveats and the bottom line
The bottom line I'd give a friend: if your workload writes more than it reads, Grok is probably the cheapest credible option you're not considering, and Grok 4.20's quality score of 72 makes it more than a curiosity — see where it lands against Sonnet and the live pricing table for current numbers.