NVIDIA API Pricing
NVIDIA publishes the open Nemotron family — models post-trained for reasoning, agentic workflows, and tool use, including Llama-Nemotron variants. They're designed to run efficiently on NVIDIA hardware and are freely available for self-hosting.
Models
4 tracked
All tiers, latest pricing.
All NVIDIA Models
| Model | Tier | Input (USD / 1M tokens) | Output (USD / 1M tokens) | Context (tokens) |
|---|---|---|---|---|
| Nemotron 3 Nano 30B A3B | fast | $0.050 | $0.200 | 262K |
| Nemotron 3 Super | fast | $0.085 | $0.400 | 1M |
| Llama 3.3 Nemotron Super 49B V1.5 | balanced | $0.400 | $0.400 | 131K |
| Nemotron 3 Ultra | flagship | $0.500 | $2.200 | 1M |
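The per-million-token rates in the table can be turned into a per-workload estimate with simple arithmetic: multiply each token count by its rate and divide by one million. The following is a minimal sketch, assuming input and output tokens are billed linearly at the listed rates with no minimums, caching discounts, or volume tiers. The `PRICES` dictionary and the `estimate_cost` function are hypothetical names introduced for illustration and are not part of any NVIDIA or TokenRate API.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the table above.
# Decimal avoids binary floating-point rounding on currency values.
PRICES = {
    "Nemotron 3 Nano 30B A3B": (Decimal("0.050"), Decimal("0.200")),
    "Nemotron 3 Super": (Decimal("0.085"), Decimal("0.400")),
    "Llama 3.3 Nemotron Super 49B V1.5": (Decimal("0.400"), Decimal("0.400")),
    "Nemotron 3 Ultra": (Decimal("0.500"), Decimal("2.20")),
}

TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost, assuming purely linear per-token billing."""
    input_rate, output_rate = PRICES[model]
    total = input_rate * input_tokens + output_rate * output_tokens
    return total / TOKENS_PER_PRICING_UNIT


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000_000, 500_000):.4f}")
```

Because output rates are up to several times the input rates for some models, output-heavy workloads shift the comparison between tiers more than the input price alone suggests.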
Model Details
Nemotron 3 Nano 30B A3B
Input: $0.050 / 1M tokens. NVIDIA Nemotron 3 Nano 30B A3B is a small mixture-of-experts (MoE) language model built for high compute efficiency and accuracy, aimed at developers building specialized agentic AI systems.
Nemotron 3 Super
Input: $0.085 / 1M tokens. NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model that activates only 12B parameters, targeting compute efficiency and accuracy in complex multi-agent applications.
Llama 3.3 Nemotron Super 49B V1.5
Input: $0.400 / 1M tokens. Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning and chat model derived from Meta’s Llama-3.3-70B-Instruct, with a 128K-token context window (131,072 tokens, listed as 131K in the table).
Nemotron 3 Ultra
Input: $0.500 / 1M tokens. NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration MoE model from NVIDIA, with 55B active parameters out of 550B total.
Calculate NVIDIA API Costs
Use the TokenRate calculator to estimate exactly what NVIDIA models will cost for your workload.