Table 1 — Input Token Pricing (per 1M tokens, USD)
| Provider | Model | Input Price |
|---|---|---|
| OpenAI | GPT-4o | $2.50 |
| OpenAI | GPT-4o-mini | $0.15 |
| Anthropic | Claude Sonnet 4 | $3.00 |
| Anthropic | Claude Haiku 3.5 | $0.80 |
| Google | Gemini 2.5 Pro | $1.25 |
| Google | Gemini 2.5 Flash | $0.15 |
| Token Landing | Hybrid (blended) | ~$0.80–1.50 |
Table 2 — Output Token Pricing (per 1M tokens, USD)
| Provider | Model | Output Price |
|---|---|---|
| OpenAI | GPT-4o | $10.00 |
| OpenAI | GPT-4o-mini | $0.60 |
| Anthropic | Claude Sonnet 4 | $15.00 |
| Anthropic | Claude Haiku 3.5 | $4.00 |
| Google | Gemini 2.5 Pro | $10.00 |
| Google | Gemini 2.5 Flash | $0.60 |
| Token Landing | Hybrid (blended) | ~$3.00–6.00 |
Table 3 — Monthly Cost Estimate (1M requests/month)
Assumes an average of 500 input tokens and 1,500 output tokens per request.
| Approach | Monthly Cost | Quality |
|---|---|---|
| All GPT-4o | ~$16,250 | Highest |
| All GPT-4o-mini | ~$975 | Good |
| All Claude Sonnet | ~$24,000 | Highest |
| Token Landing Hybrid | ~$5,000–9,500 | High (A-tier on critical paths) |
Prices are approximate as of early 2026 and may change without notice. Always verify with each provider's official pricing page before committing to a budget.
Why output tokens dominate your bill
Look at the tables above: output tokens cost 4–8x more than input tokens across every provider listed. The reason is computational. Input tokens are processed in parallel during a single forward pass, while output tokens require autoregressive generation — the model produces one token at a time, maintaining full attention state at each step.
For most conversational or agentic workloads, output tokens outnumber input tokens 2:1 to 4:1. Combined with the 4–8x price premium, that means output pricing typically accounts for around 90% of your total API spend. If you want to cut costs, start by reducing output token volume — shorter system prompts that guide concise replies, structured output formats, and caching strategies all help. See input vs output tokens for a deep dive.
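To see why output tokens dominate, you can compute the output share of spend directly from a price pair and a volume split. A minimal sketch (the GPT-4o figures come from Tables 1 and 2; the 500/1,500 token split is the Table 3 assumption):

```python
def output_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of total spend attributable to output tokens.

    Prices are per-token-count in the same units (e.g. per 1M tokens);
    only the ratio matters for the share.
    """
    in_cost = input_tokens * input_price
    out_cost = output_tokens * output_price
    return out_cost / (in_cost + out_cost)

# GPT-4o at a 3:1 output-to-input volume ratio (Table 3 assumption)
share = output_share(500, 1500, 2.50, 10.00)
print(f"{share:.0%}")  # 92%
```

A 4x price premium at 3:1 volume puts output tokens at roughly 92% of the bill, which is why trimming output volume is the highest-leverage cost lever.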
The case for hybrid routing
Running every request through a frontier model like Claude Sonnet 4 or GPT-4o delivers top quality — but the monthly bill adds up fast, as Table 3 shows. Conversely, using only a mini/flash model saves money but sacrifices quality on the requests that matter most (first user-facing replies, tool calls, error recoveries).
Hybrid routing splits the difference. A policy layer classifies each request and routes it to the appropriate tier: A-tier models for high-stakes turns, value-tier models for bulk and repetition-safe work. The result is 40–70% lower spend compared to an all-premium stack, with near-identical perceived quality. For architecture details, see hybrid AI tokens and OpenAI-compatible API.
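A policy layer like the one described above can be sketched as a simple classifier. This is a toy illustration, not Token Landing's actual routing logic; the request categories and model names are assumptions chosen to mirror the high-stakes examples in the text:

```python
# Hypothetical high-stakes categories, per the examples above
HIGH_STAKES = {"first_reply", "tool_call", "error_recovery"}

def route(request_kind: str) -> str:
    """Toy routing policy: A-tier model for high-stakes turns,
    value-tier model for bulk and repetition-safe work."""
    if request_kind in HIGH_STAKES:
        return "claude-sonnet-4"  # A-tier
    return "gpt-4o-mini"          # value tier

print(route("tool_call"))  # claude-sonnet-4
print(route("summarize"))  # gpt-4o-mini
```

Real routers classify on request content and context rather than a fixed label, but the cost math is the same: every request diverted to the value tier is billed at a fraction of the premium rate.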
How to estimate your spend
Use this formula:
Monthly cost = requests/month x [(avg input tokens x input price) + (avg output tokens x output price)]
For example, 1M requests at 500 input + 1,500 output tokens on GPT-4o:
1,000,000 x [(500 x $2.50 / 1,000,000) + (1,500 x $10.00 / 1,000,000)] = 1,000,000 x [$0.00125 + $0.015] = $16,250/month
Swap in the hybrid blended rates from the tables above and the same workload drops to $5,000–9,500/month. For a step-by-step walkthrough, see the AI token pricing guide.
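The formula above translates directly into a small estimator you can reuse with any provider's rates. A minimal sketch, using the GPT-4o prices from Tables 1 and 2:

```python
def monthly_cost(requests, avg_input_tokens, avg_output_tokens,
                 input_price_per_1m, output_price_per_1m):
    """Monthly cost = requests x [(avg input x input price)
    + (avg output x output price)], with prices per 1M tokens."""
    per_request = (avg_input_tokens * input_price_per_1m
                   + avg_output_tokens * output_price_per_1m) / 1_000_000
    return requests * per_request

# 1M requests/month, 500 input + 1,500 output tokens, GPT-4o rates
print(monthly_cost(1_000_000, 500, 1500, 2.50, 10.00))  # 16250.0
```

Plugging in the blended hybrid rates ($0.80–1.50 input, $3.00–6.00 output) reproduces the ~$5,000–9,500 range in Table 3.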
Disclaimer: Pricing data is gathered from public provider documentation and may not reflect negotiated enterprise rates, volume discounts, or regional variations. Token Landing hybrid pricing depends on your specific tier mix and routing configuration. This page is for informational purposes and does not constitute a contractual price guarantee.