NVIDIA API Pricing

NVIDIA publishes the open Nemotron family — models post-trained for reasoning, agentic workflows, and tool use, including Llama-Nemotron variants. They're designed to run efficiently on NVIDIA hardware and are freely available for self-hosting.

Official site: build.nvidia.com

Cheapest

Nemotron 3 Nano 30B A3B

$0.050/1M in

Flagship

Nemotron 3 Ultra

$0.500/1M in

Models

4 tracked

All tiers, latest pricing.

All NVIDIA Models

Model	Tier	Input / 1M	Output / 1M	Context
Nemotron 3 Nano 30B A3B	fast	$0.050	$0.200	262K
Nemotron 3 Super	fast	$0.085	$0.400	1M
Llama 3.3 Nemotron Super 49B V1.5	balanced	$0.400	$0.400	131K
Nemotron 3 Ultra	flagship	$0.500	$2.200	1M

Model Details

Nemotron 3 Nano 30B A3B

$0.050/1M in

NVIDIA Nemotron 3 Nano 30B A3B is a small mixture-of-experts (MoE) language model built for high compute efficiency and accuracy, aimed at developers building specialized agentic AI systems.

Nemotron 3 Super

$0.085/1M in

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications.

Llama 3.3 Nemotron Super 49B V1.5

$0.400/1M in

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning and chat model derived from Meta’s Llama-3.3-70B-Instruct. It supports a 128K-token context (131,072 tokens, listed as 131K in the table above).

Nemotron 3 Ultra

$0.500/1M in

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration MoE model with 55B active parameters out of 550B total.

Calculate NVIDIA API Costs

Use the TokenRate calculator to estimate what NVIDIA models will cost for your workload.
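For a quick offline estimate, the arithmetic behind per-1M-token pricing can be sketched in a few lines of Python. This is a minimal sketch, not the TokenRate calculator's actual logic: the `PRICES` dictionary and the `estimate_cost` function are hypothetical names, the rates are copied from the table above, and the workload size is an assumed example. It also assumes flat per-token billing with no caching discounts, batch pricing, or minimum charges.

```python
# Rates in USD per 1M tokens (input, output), copied from the pricing table above.
# PRICES and estimate_cost are illustrative names, not part of any NVIDIA or TokenRate API.
PRICES = {
    "Nemotron 3 Nano 30B A3B": (0.050, 0.200),
    "Nemotron 3 Super": (0.085, 0.400),
    "Llama 3.3 Nemotron Super 49B V1.5": (0.400, 0.400),
    "Nemotron 3 Ultra": (0.500, 2.20),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD, assuming flat per-token billing."""
    in_rate, out_rate = PRICES[model]
    # Rates are quoted per 1M tokens, so scale the total by 1,000,000.
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Assumed example workload: 10M input tokens and 2M output tokens.
    for model in PRICES:
        cost = estimate_cost(model, 10_000_000, 2_000_000)
        print(f"{model}: ${cost:.2f}")
```

For that assumed workload, the estimate ranges from about $0.90 on Nemotron 3 Nano 30B A3B to about $9.40 on Nemotron 3 Ultra, which shows how much the output rate matters once responses get long.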


Other Providers