TokenRate

Cheapest AI Models in 2026

Ranked list of the most affordable large language models available via API, with pricing, context windows, and quality notes.

Comparing 7 models · Prices last verified

Verdict

Llama 3.1 8B is the cheapest hosted model at ~$0.05/1M input tokens. Gemini 1.5 Flash ($0.075) and Gemini 2.0 Flash ($0.10) are the cheapest with a 1M context window. For quality-per-dollar with broader ecosystem support, GPT-4o mini and Claude Haiku 4 are strong choices.

Pricing Comparison

Model              Provider    Input (USD / 1M tokens)   Output (USD / 1M tokens)   Context   Tier
Llama 3.1 8B       Meta        $0.050                    $0.080                     128K      fast
Gemini 1.5 Flash   Google      $0.075                    $0.300                     1M        fast
Gemini 2.0 Flash   Google      $0.100                    $0.400                     1M        fast
Mistral Nemo       Mistral     $0.150                    $0.150                     128K      fast
GPT-4o mini        OpenAI      $0.150                    $0.600                     128K      fast
Claude Haiku 4     Anthropic   $0.250                    $1.250                     200K      fast
Mistral Small      Mistral     $0.100                    $0.300                     32K       fast
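
To turn the per-token prices above into a bill for a real workload, multiply each token count by its price and divide by one million. The sketch below is a minimal illustration, not the TokenRate calculator itself: the prices are copied from the table, and the names `PRICES`, `workload_cost`, and `rank_by_cost` are hypothetical.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "Llama 3.1 8B": (0.050, 0.080),
    "Gemini 1.5 Flash": (0.075, 0.300),
    "Gemini 2.0 Flash": (0.100, 0.400),
    "Mistral Nemo": (0.150, 0.150),
    "GPT-4o mini": (0.150, 0.600),
    "Claude Haiku 4": (0.250, 1.250),
    "Mistral Small": (0.100, 0.300),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the cost in USD of one workload on one model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens, output_tokens):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {m: workload_cost(m, input_tokens, output_tokens) for m in PRICES}
    return sorted(costs.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens.
    for model, cost in rank_by_cost(10_000_000, 2_000_000):
        print(f"{model:<18} ${cost:.2f}")
```

For that hypothetical workload, Llama 3.1 8B comes to about $0.66 and Claude Haiku 4 to about $5.00. Output-heavy workloads shift the ranking, because output prices vary more between models than input prices do.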

Model Breakdown

$0.050 per 1M input

Llama 3.1 8B is the smallest open-weight Llama 3.1 model — extremely cheap to host or call, and good enough for classification, extraction, and basic chat.

Among the cheapest hosted LLMs
Easy to self-host (fits on a single consumer GPU)

$0.075 per 1M input

Gemini 1.5 Flash is one of the cheapest capable models available. With a 1M context window and ultra-low pricing, it's ideal for bulk document processing and cost-sensitive pipelines.

One of the cheapest models at $0.075/1M input tokens
1M context window
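
For bulk document processing, two checks matter: whether each document fits in the context window, and what the batch costs in input tokens. The sketch below assumes "1M" means 1,000,000 tokens and that token counts are already known. The function names and the sample document sizes are hypothetical.

```python
# Figures copied from the table; reading "1M" as 1,000,000 tokens is an assumption.
GEMINI_15_FLASH_INPUT_PRICE = 0.075  # USD per 1M input tokens
GEMINI_15_FLASH_CONTEXT = 1_000_000  # tokens


def fits_in_context(doc_tokens, context_tokens, reserved_for_output=0):
    """True if the document plus the reserved output budget fits in the window."""
    return doc_tokens + reserved_for_output <= context_tokens


def batch_input_cost(doc_token_counts, input_price_per_million):
    """Input-side cost in USD of sending every document once."""
    return sum(doc_token_counts) * input_price_per_million / 1_000_000


# Hypothetical batch: token counts for three documents.
docs = [400_000, 750_000, 1_200_000]

# Keep only documents that fit, leaving 8,000 tokens of room for the response.
fitting = [
    n for n in docs
    if fits_in_context(n, GEMINI_15_FLASH_CONTEXT, reserved_for_output=8_000)
]
cost = batch_input_cost(fitting, GEMINI_15_FLASH_INPUT_PRICE)

if __name__ == "__main__":
    print(f"{len(fitting)} of {len(docs)} documents fit; input cost ${cost:.5f}")
```

Documents that do not fit would need chunking or a different model. Output tokens are billed separately and are not included in this figure.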

$0.100 per 1M input

Gemini 2.0 Flash is Google's speed-optimized model — extremely affordable with a 1M token context window. One of the best value options for high-throughput workloads.

Very cheap: $0.10/1M input tokens
Massive 1M context window even at this price

$0.150 per 1M input

Mistral Nemo is a 12B open-weight model co-developed with NVIDIA. Cheap, multilingual, and notable for symmetric input/output pricing across most hosts.

Symmetric in/out price simplifies cost modeling
128K context for a small model
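
One way to see why symmetric pricing simplifies cost modeling is to compute a blended price per 1M tokens for a given share of output tokens. With equal input and output prices the blend does not depend on that share at all. This is a minimal sketch; `blended_price` is a hypothetical helper, and the prices are the ones listed on this page.

```python
def blended_price(input_price, output_price, output_share):
    """Average USD price per 1M tokens when `output_share` of tokens are output."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return input_price * (1.0 - output_share) + output_price * output_share


if __name__ == "__main__":
    for share in (0.0, 0.2, 0.5):
        nemo = blended_price(0.150, 0.150, share)  # Mistral Nemo, symmetric
        mini = blended_price(0.150, 0.600, share)  # GPT-4o mini, asymmetric
        print(f"output share {share:.0%}: Nemo ${nemo:.3f}, GPT-4o mini ${mini:.3f}")
```

With symmetric pricing only the total token count needs estimating. With asymmetric pricing the input/output split has to be forecast as well.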

$0.150 per 1M input

GPT-4o mini is OpenAI's most affordable mainstream model — a lightweight version of GPT-4o optimized for speed and cost. It's ideal for simple tasks that don't require the full capability of GPT-4o.

Very cheap: $0.15/1M input tokens
Fast and low latency

$0.250 per 1M input

Claude Haiku 4 is Anthropic's fastest and most affordable model. Designed for high-throughput tasks where cost and latency matter more than maximum capability.

Cheapest Anthropic model at $0.25/1M input tokens
Ultra-fast response times

$0.100 per 1M input

Mistral Small is an ultra-affordable model for lightweight tasks. At $0.10/1M input tokens, it competes directly with Google Flash models on price while offering solid general-purpose performance.

Very affordable at $0.10/1M input tokens
Good for simple tasks and structured outputs

Compare These Models Yourself

Use the TokenRate calculator to enter your budget or token count and see the exact cost for each model side by side.
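
The budget-to-tokens direction can be approximated by hand. The sketch below is not the TokenRate calculator; it assumes a fixed ratio of output tokens per input token, and `input_tokens_for_budget` is a hypothetical name.

```python
def input_tokens_for_budget(budget_usd, input_price, output_price, output_per_input=0.0):
    """Estimate how many input tokens a budget covers.

    Prices are USD per 1M tokens. `output_per_input` is the assumed number of
    output tokens generated for each input token (0.25 means 1 output per 4 input).
    """
    if budget_usd < 0 or output_per_input < 0:
        raise ValueError("budget_usd and output_per_input must be non-negative")
    # Cost of 1M input tokens plus the output tokens they are expected to trigger.
    cost_per_million_input = input_price + output_price * output_per_input
    return budget_usd / cost_per_million_input * 1_000_000


if __name__ == "__main__":
    # Hypothetical: $10 on Gemini 2.0 Flash ($0.10 in, $0.40 out), 1 output per 4 input.
    tokens = input_tokens_for_budget(10.0, 0.100, 0.400, output_per_input=0.25)
    print(f"about {tokens / 1_000_000:.0f}M input tokens")
```

Under those assumptions, $10 covers roughly 50M input tokens on Gemini 2.0 Flash, or about 200M input tokens on Llama 3.1 8B if output is ignored.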
