Cheapest AI Models in 2026
Ranked list of the most affordable large language models available via API, with pricing, context windows, and quality notes.
Comparing 7 models · Prices last verified
Verdict
Llama 3.1 8B is the cheapest hosted model at ~$0.05/1M input tokens. Gemini 1.5 Flash ($0.075) and Gemini 2.0 Flash ($0.10) are the cheapest with a 1M context window. For quality-per-dollar with broader ecosystem support, GPT-4o mini and Claude Haiku 4 are strong choices.
Pricing Comparison
| Model | Provider | Input / 1M | Output / 1M | Context | Tier |
|---|---|---|---|---|---|
| Llama 3.1 8B | Meta | $0.050 | $0.080 | 128K | fast |
| Gemini 1.5 Flash | $0.075 | $0.300 | 1M | fast | |
| Gemini 2.0 Flash | $0.100 | $0.400 | 1M | fast | |
| Mistral Nemo | Mistral | $0.150 | $0.150 | 128K | fast |
| GPT-4o mini | OpenAI | $0.150 | $0.600 | 128K | fast |
| Claude Haiku 4 | Anthropic | $0.250 | $1.25 | 200K | fast |
| Mistral Small | Mistral | $0.100 | $0.300 | 32K | fast |
Model Breakdown
Meta
$0.050
per 1M input
Llama 3.1 8B is the smallest open-weight Llama 3.1 model — extremely cheap to host or call, and good enough for classification, extraction, and basic chat.
$0.075
per 1M input
Gemini 1.5 Flash is one of the cheapest capable models available. With a 1M context window and ultra-low pricing, it's ideal for bulk document processing and cost-sensitive pipelines.
$0.100
per 1M input
Gemini 2.0 Flash is Google's speed-optimized model — extremely affordable with a 1M token context window. One of the best value options for high-throughput workloads.
Mistral
$0.150
per 1M input
Mistral Nemo is a 12B open-weight model co-developed with NVIDIA. Cheap, multilingual, and notable for symmetric input/output pricing across most hosts.
OpenAI
$0.150
per 1M input
GPT-4o mini is OpenAI's most affordable mainstream model — a lightweight version of GPT-4o optimized for speed and cost. It's ideal for simple tasks that don't require the full capability of GPT-4o.
Anthropic
$0.250
per 1M input
Claude Haiku 4 is Anthropic's fastest and most affordable model. Designed for high-throughput tasks where cost and latency matter more than maximum capability.
Mistral
$0.100
per 1M input
Mistral Small is an ultra-affordable model for lightweight tasks. At $0.10/1M input tokens, it competes directly with Google Flash models on price while offering solid general-purpose performance.
Our Verdict
Llama 3.1 8B is the cheapest hosted model at ~$0.05/1M input tokens. Gemini 1.5 Flash ($0.075) and Gemini 2.0 Flash ($0.10) are the cheapest with a 1M context window. For quality-per-dollar with broader ecosystem support, GPT-4o mini and Claude Haiku 4 are strong choices.
FAQ
Compare These Models Yourself
Use the TokenRate calculator to enter your budget or token count and see the exact cost for each model side by side.
Open Calculator →More Comparisons