Llama 3.1 405B Pricing
Flagship · Meta · 128K-token context
Llama 3.1 405B from Meta costs $2.70 per 1 million input tokens and $2.70 per 1 million output tokens as of May 2026. The model supports a 128,000-token context window (approximately 96,000 words) with a 4K-token maximum output. A typical 1,000-token request costs $0.0027 in input charges; a 10,000-token request costs $0.0270.
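The per-request arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that hard-codes the rates quoted on this page; the helper name `request_cost` and its parameters are hypothetical and not part of any provider SDK.

```python
# Rates quoted on this page (USD per 1 million tokens); adjust if your provider differs.
INPUT_PRICE_PER_M = 2.70
OUTPUT_PRICE_PER_M = 2.70


def request_cost(input_tokens: int, output_tokens: int = 0) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


print(f"${request_cost(1_000):.4f}")   # input charge for a 1,000-token request: $0.0027
print(f"${request_cost(10_000):.4f}")  # input charge for a 10,000-token request: $0.0270
```

Passing a second argument adds the output charge, so `request_cost(1_000, 1_000)` would price a request with 1,000 tokens in each direction.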
| Specification | Value |
|---|---|
| Input price | $2.70 / 1M tokens |
| Output price | $2.70 / 1M tokens |
| Output / input ratio | 1.0× |
| Context window | 128,000 tokens (~96,000 words) |
| Maximum output | 4,096 tokens |
| Cost per 1K tokens (input) | $0.0027 |
| Tier | Flagship |
| Last verified | |
Llama 3.1 405B is Meta's largest open-weight model. It is competitive with GPT-4-class models on many benchmarks and, unlike closed flagship models, its weights can be self-hosted. Hosted providers commonly price it symmetrically, charging the same rate for input and output tokens.
Input Price: $2.70 per 1 million tokens
Output Price: $2.70 per 1 million tokens
Context Window: 128K tokens (max 4K output)
Cost Examples
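| Request Type | Input Tokens | Input Cost | Output Cost |
|---|---|---|---|
| 1,000-word article | 1,333 | $0.0036 | $0.00108 |
| 10-page document (2,500 words) | 3,333 | $0.009 | $0.0027 |
| 1,000 lines of code | 5,000 | $0.0135 | $0.00405 |
| 100K-token document | 100,000 | $0.27 | $0.081 |
Output cost assumes a response 30% as long as the input (for example, 400 output tokens for a 1,333-token input). For the 100K-token row this implies 30,000 output tokens, which is more than the 4,096-token maximum output listed above, so treat that row's output cost as an illustration rather than a single-response figure. Use the calculator for exact figures.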
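The rows in the table can be approximated with a short script. This sketch assumes the 30% output ratio stated in the note and a ratio of 4 tokens per 3 words, which is inferred from the table's "1,000 words ≈ 1,333 tokens" row; real tokenizer counts will vary with the text. The helper names are hypothetical.

```python
PRICE_PER_M = 2.70    # USD per 1M tokens, same for input and output (from this page)
OUTPUT_RATIO = 0.30   # table assumption: output length is 30% of the input token count


def words_to_tokens(words: int) -> int:
    """Rough estimate: 4 tokens per 3 words, rounded down as in the table."""
    return words * 4 // 3


def example_costs(input_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD under the 30% output assumption."""
    output_tokens = input_tokens * OUTPUT_RATIO
    input_cost = input_tokens / 1_000_000 * PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * PRICE_PER_M
    return input_cost, output_cost


rows = [
    ("1,000-word article", words_to_tokens(1_000)),
    ("10-page document (2,500 words)", words_to_tokens(2_500)),
    ("1,000 lines of code", 5_000),    # token count taken directly from the table
    ("100K-token document", 100_000),  # token count taken directly from the table
]

for label, tokens in rows:
    input_cost, output_cost = example_costs(tokens)
    print(f"{label}: {tokens:,} tokens, input ${input_cost:.4f}, output ${output_cost:.5f}")
```

Changing `OUTPUT_RATIO` is the quickest way to model chattier or terser workloads than the table assumes.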
Strengths
- ✓ Open-weight: can be self-hosted
- ✓ Frontier-class quality on many tasks
- ✓ Symmetric in/out pricing simplifies cost modeling
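To illustrate the last point: when input and output tokens cost the same, the bill depends only on the total token count, not on how it splits between prompt and response. The sketch below uses the $2.70 rate from this page; the function names are hypothetical.

```python
PRICE_PER_M = 2.70  # USD per 1M tokens in both directions, as quoted on this page


def cost_by_split(input_tokens: int, output_tokens: int) -> float:
    """General form: price each direction separately, then add."""
    return (input_tokens * PRICE_PER_M + output_tokens * PRICE_PER_M) / 1_000_000


def cost_by_total(total_tokens: int) -> float:
    """Symmetric shortcut: only the combined token count matters."""
    return total_tokens * PRICE_PER_M / 1_000_000


# Three different splits of the same 10,000-token total give the same cost.
for output_tokens in (1_000, 2_000, 4_000):
    input_tokens = 10_000 - output_tokens
    print(input_tokens, output_tokens, round(cost_by_split(input_tokens, output_tokens), 6))
```

With asymmetric pricing the two functions would diverge, and a forecast would need an estimate of the input/output mix as well as the total volume.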
Limitations
- – Slower than smaller Llama variants
- – No native multimodal input
- – 128K context window (matches GPT-4o but trails Gemini)
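A quick way to check whether a document is likely to fit within the 128K window is to estimate its token count from its word count. The sketch below assumes 4 tokens per 3 words (inferred from this page's "128,000 tokens ≈ 96,000 words") and assumes that the response shares the context window with the prompt, which is common but depends on the provider. The function names are hypothetical.

```python
import math

CONTEXT_WINDOW = 128_000  # tokens, from this page
MAX_OUTPUT = 4_096        # tokens, from this page


def estimate_tokens(word_count: int) -> int:
    """Rough estimate at 4 tokens per 3 words, rounded up to stay conservative."""
    return math.ceil(word_count * 4 / 3)


def fits_in_context(word_count: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if the estimated prompt plus reserved output stays within the window.

    Assumes output tokens count against the same window as the prompt.
    """
    return estimate_tokens(word_count) + reserved_output <= CONTEXT_WINDOW


print(fits_in_context(90_000))  # about 120,000 prompt tokens plus 4,096 reserved
print(fits_in_context(96_000))  # about 128,000 prompt tokens, no room left for output
```

For anything near the limit, counting tokens with the model's actual tokenizer is more reliable than a word-based estimate.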
Best Use Cases
Calculate Llama 3.1 405B Costs
Use the TokenRate calculator to convert any budget, token count, or text into exact Llama 3.1 405B costs, and to compare those costs across all models.
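The budget-to-tokens conversion can also be approximated by hand. This is a minimal sketch using the symmetric $2.70 rate quoted on this page, not the calculator's actual implementation; `tokens_for_budget` is a hypothetical name.

```python
PRICE_PER_M = 2.70  # USD per 1M tokens, input or output (from this page)


def tokens_for_budget(budget_usd: float, price_per_m: float = PRICE_PER_M) -> int:
    """Return how many whole tokens a budget buys at a flat per-million rate."""
    return int(budget_usd / price_per_m * 1_000_000)


print(f"{tokens_for_budget(2.70):,} tokens for $2.70")
print(f"{tokens_for_budget(10):,} tokens for $10.00")
```

Because the rate is the same in both directions, the result is a combined input-plus-output allowance.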