## Definitions
- **A-tier (premium-path) tokens** cover turns users experience as "the product": first replies, tool calls, recoveries after errors, and high-stakes reasoning beats.
- **Value-tier (bulk) tokens** cover repetition-safe work: long-context compaction, drafts of boilerplate, embedding-heavy pre-processing, and high-frequency autocomplete-style loops.
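The split above can be sketched as a simple routing classifier. Everything here is illustrative, assumed for the sketch: the turn-kind strings, the `Tier` labels, and the default-to-premium fallback are not part of any documented API.

```python
from enum import Enum

class Tier(Enum):
    A_TIER = "premium-path"   # frontier / flagship model
    VALUE_TIER = "bulk"       # efficient / small model

# Illustrative mapping of turn kinds to tiers, following the
# definitions above; a real policy would be richer than a set lookup.
PREMIUM_KINDS = {"first_reply", "tool_call", "error_recovery", "high_stakes_reasoning"}
VALUE_KINDS = {"context_compaction", "boilerplate_draft", "embedding_preprocess", "autocomplete"}

def classify(turn_kind: str) -> Tier:
    if turn_kind in PREMIUM_KINDS:
        return Tier.A_TIER
    if turn_kind in VALUE_KINDS:
        return Tier.VALUE_TIER
    # Unknown turn kinds default to the premium path so that
    # quality never silently degrades on unanticipated traffic.
    return Tier.A_TIER
```

Defaulting unknown turns to A-tier is a deliberate choice in this sketch: misrouting a premium moment to the bulk tier is the failure users notice.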
## Cost impact: A-tier vs value-tier
Hybrid routing typically reduces total token spend by 40–70% compared to all-flagship pricing—a core principle of LLM cost optimization. Here is how the two tiers compare in practice:
| Dimension | A-tier (premium-path) | Value-tier (bulk) |
|---|---|---|
| Use case | First replies, tool calls, error recovery, high-stakes reasoning | Context compaction, boilerplate, embeddings, autocomplete loops |
| Model class | Frontier / flagship (e.g. Claude Sonnet, GPT-4o) | Efficient / small (e.g. Haiku, GPT-4o-mini, Gemini Flash) |
| Typical cost per 1M output tokens | $10–15 | $0.60–4.00 |
| % of total requests (typical product) | 20–35% | 65–80% |
| User-perceived quality impact | High — defines UX | Low — users don't notice the difference |
Because 65–80% of requests are served through value-tier routing, the blended cost drops significantly while the user-facing experience stays premium. See concrete pricing numbers across providers.
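The blended-cost claim can be checked with back-of-the-envelope arithmetic from the table. The specific inputs below are assumptions taken from the table's midpoints (flagship at $12.50 per 1M output tokens, value tier at $2.30, a 30% premium share), not measured figures:

```python
# Midpoints of the table's ranges -- illustrative, not quoted prices.
FLAGSHIP_PER_M = 12.50   # $/1M output tokens, midpoint of $10-15
VALUE_PER_M = 2.30       # $/1M output tokens, midpoint of $0.60-4.00
PREMIUM_SHARE = 0.30     # 30% of traffic routed to the A-tier

# Weighted average of the two tiers vs. sending everything to the flagship.
blended = PREMIUM_SHARE * FLAGSHIP_PER_M + (1 - PREMIUM_SHARE) * VALUE_PER_M
savings = 1 - blended / FLAGSHIP_PER_M

print(f"blended: ${blended:.2f}/1M tokens, savings vs all-flagship: {savings:.0%}")
# → blended: $5.36/1M tokens, savings vs all-flagship: 57%
```

A 57% reduction at these midpoints lands inside the 40–70% range cited above; the endpoints of that range correspond to pricier value models with more premium traffic, and cheaper value models with less, respectively.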
## Relationship to OpenAI-shaped APIs
Token Landing is a drop-in replacement for OpenAI-style endpoints: you keep the familiar request shapes, while multi-model routing and spend policies sit in front of the provider calls. Migration requires only a base-URL swap, with no SDK changes and no prompt rewrites. See OpenAI compatibility for migration notes.
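A minimal sketch of the base-URL swap, built with the standard library only (no request is actually sent). The host `router.example.com` and the bearer-token placeholder are assumptions for illustration; the path and payload are the standard OpenAI-style chat-completions shape and stay untouched:

```python
import json
import urllib.request

# Only the host changes; the path and body are unmodified.
BASE_URL = "https://router.example.com/v1"  # was: https://api.openai.com/v1

payload = {
    "model": "gpt-4o",  # the routing policy may substitute a cheaper model
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $TOKEN",  # placeholder credential
        "Content-Type": "application/json",
    },
    method="POST",
)
print(req.full_url)
# → https://router.example.com/v1/chat/completions
```

SDK users would achieve the same effect by pointing their client's base-URL setting at the router; the request body they construct does not change.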
## Claude-class surfaces without all-Claude bills
Product surfaces can still feel Claude-class when the routing policy reserves premium-path tokens for the moments that define UX quality. According to internal testing, users rate hybrid-routed responses within 3% of all-flagship responses on satisfaction scores, while total cost drops by 40–70%.