DeepSeek API Pricing in 2026: The 100x Cheaper Question

Model	Input / 1M	Output / 1M	Context	1M in + 1M out
DeepSeek V4 Flash	$0.10	$0.20	1M	$0.30
DeepSeek V4 Pro	$0.44	$0.87	1M	$1.31
DeepSeek V3.2	$0.23	$0.34	131K	$0.57
DeepSeek R1	$0.70	$2.50	164K	$3.20

Model

Input / 1M

Output / 1M

Context

1M in + 1M out

DeepSeek V4 Flash

$0.10

$0.20

$0.30

DeepSeek V4 Pro

$0.44

$0.87

$1.31

DeepSeek V3.2

$0.23

$0.34

131K

$0.57

DeepSeek R1

$0.70

$2.50

164K

$3.20

The numbers, because they sound made up

DeepSeek V4 Flash bills about $0.10 per million input tokens and $0.20 per million output. For calibration: a typical chat message (1,500 in, 400 out) costs $0.00023 — a fortieth of a cent. A million such messages: about $230. The same million messages on GPT-5.5: about $19,500.

When the gap is 50-100x, the usual model-selection framework inverts. The question stops being 'is the cheap model good enough to save money' and becomes 'is the expensive model better enough to justify 100x.' For a real slice of workloads — drafting, extraction, summarization, internal tools — I've found the honest answer is no.

The lineup: Flash, Pro, and R1

V4 Flash ($0.10 / $0.20, 1M context) is the volume model and the headline act. V4 Pro ($0.44 / $0.87, 1M context) is the step up for harder reasoning and code — still cheaper than every Western budget tier, including GPT-5.4-nano on output.

R1 ($0.70 / $2.50) is the reasoning specialist, the lineage that made DeepSeek famous in early 2025. Like all reasoning models it emits thinking tokens billed as output, so real costs per task run higher than the sticker suggests — the same dynamic as extended thinking on Claude, at a tiny fraction of the rate. My earlier deep-dive on R1 as a bargain reasoning model still holds; for budget reasoning overall, see the best budget reasoning picks.

Older V3.x models remain listed at sub-dollar prices but V4 Flash has mostly obsoleted them on both price and quality.

Why so cheap — and is it sustainable?

Three structural reasons, none of them tricks. DeepSeek publishes open weights, so it competes with anyone who can host its own models — that caps what it can charge. Its training and inference stack is famously efficient; the V3 technical report documented training costs an order of magnitude below Western peers. And its strategic priority has visibly been adoption, not margin.

Sustainability is the fair question. My take from watching the feed daily: prices have been stable-to-falling for eighteen months, and because the weights are open, even a unilateral price hike wouldn't strand you — third-party hosts serve the same models at similar rates. The floor is set by competition among hosts, not by one company's pricing committee. That's a structural protection closed models can't offer, a theme I expand in the open-weight vs proprietary cost guide.

The caveats that actually matter

Quality first: V4 Pro and Flash benchmark respectably against Western budget and mid tiers, but the frontier trio (Gemini 3.1 Pro, GPT-5.5, Opus 4.8) remains clearly stronger on hard reasoning, long-horizon agentic work, and nuanced instruction following. DeepSeek competes on value, not ceiling.

Operations second: the first-party API enforces tighter rate limits than US hyperscalers, and I've observed latency variance during Asia-Pacific peak hours that would bother an interactive product. Batch and overnight pipelines won't notice.

Jurisdiction third, stated plainly: DeepSeek is a Chinese company and the first-party API processes data under Chinese jurisdiction. For some compliance regimes that's disqualifying. The standard mitigation is using the open weights through a Western host (via OpenRouter, Together, Fireworks, and similar) at modestly higher prices — you keep most of the discount and move the data question to a provider you can contract with. I priced that route in the OpenRouter vs direct API comparison.

Where DeepSeek fits in a real stack

The pattern I see working is tiered routing: DeepSeek V4 Flash as the default lane for high-volume, low-stakes calls; a Western mid-tier (Gemini 3.5 Flash, GPT-5.4-mini, Haiku 4.5) for customer-visible output; a frontier model reserved for the hard 5% of requests. Done with even crude confidence-based escalation, this routinely cuts total spend 60-80% versus running everything on the mid tier — the architecture from the multi-model routing piece.

Concrete example I priced this week: a log-triage pipeline classifying 5M events a month (400 tokens in, 30 out each). V4 Flash: about $230/month. GPT-5.4-nano: $588. Gemini Flash-Lite: $725. Haiku 4.5: $2,750. Same task, same month, an order of magnitude of spread — check your own numbers on the calculator or the live table.

Frequently Asked Questions

How much does the DeepSeek API cost?

As of June 2026: DeepSeek V4 Flash costs about $0.10 per million input tokens and $0.20 per million output; V4 Pro about $0.44 / $0.87; the R1 reasoning model $0.70 / $2.50. A typical chat message on V4 Flash costs a few hundredths of a cent.

Is DeepSeek really 100x cheaper than GPT or Claude?

Against frontier tiers, roughly yes: processing 1M input + 1M output tokens costs about $0.30 on V4 Flash versus $35 on GPT-5.5 and $30 on Claude Opus 4.8 — a 100x-class gap. Against Western budget tiers (GPT-5.4-nano, Gemini Flash-Lite) the gap narrows to 2-7x.

Is DeepSeek safe to use for company data?

The first-party API processes data under Chinese jurisdiction, which some compliance regimes rule out. Because DeepSeek publishes open weights, the common workaround is running the same models through a Western host (OpenRouter, Together, Fireworks) at slightly higher prices, keeping most of the cost advantage.

Is DeepSeek quality comparable to frontier models?

Not at the top end — Gemini 3.1 Pro, GPT-5.5, and Claude Opus 4.8 are clearly stronger on hard reasoning and agentic tasks. But V4 competes credibly with Western budget and mid tiers, which is the relevant comparison at its price point.