What is an OpenAI-compatible API?
An OpenAI-compatible API accepts the same request format as OpenAI's /v1/chat/completions endpoint. You send the same JSON body, get back the same response shape. The only thing that changes is the base URL.
Token Landing implements this compatibility layer with one difference: behind that familiar interface, your requests get routed between premium and economy models based on rules you define. The OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks the OpenAI protocol works without modification.
Why we went this route (no pun intended)
We spent weeks debating whether to build a custom API or go OpenAI-compatible. The custom route would have let us do fancier things with the request format. But after talking to about a dozen teams, the answer was obvious: nobody wants to rewrite their API client. One infrastructure lead told us, "I have 47 places in our codebase that call OpenAI. I'm not touching all of them."
So we went with compatibility. Migration is a one-line config change:
# Before
client = OpenAI(api_key="sk-...")
# After
client = OpenAI(
api_key="your-token-landing-key",
base_url="https://api.token-landing.com/v1"
)
That's it. Your prompts, your system messages, your function calling, your streaming logic, your retry handling, your error types. All the same.
What actually changes under the hood
When a request comes in, our routing layer looks at a few signals to decide where it goes:
| Signal | A-tier (premium) | Value-tier (economy) |
|---|---|---|
| First message in conversation | Yes, users notice first impressions | No |
| Tool/function calls | Yes, failures are visible | No |
| System prompt only (warmup) | No | Yes, nobody sees this |
| Summarization/extraction | No | Yes, output quality is sufficient |
| Embedding generation | No | Yes, dedicated embedding models |
You configure these rules through a routing policy. The defaults work well for most chat-based products, but if you're running a code generation pipeline or a RAG system, you'll want to customize which requests hit A-tier.
What about streaming?
Streaming works exactly like OpenAI's implementation. Set stream: true and you get back Server-Sent Events in the same format. We match the delta structure, the finish_reason signaling, and the usage reporting at the end of the stream.
One thing we actually improved: our streaming has slightly lower time-to-first-token than going direct to some providers, because our edge routing can start sending from the fastest-responding model while the routing decision is still being finalized for the next chunk. It's a small optimization but it matters for chat UX.
Function calling and tool use
Fully supported. The tools and tool_choice parameters work the same way. We route tool-calling requests to A-tier by default because tool failures tend to cascade visibly in user-facing applications.
If you're doing batch tool calls (like extracting structured data from documents), you can override this to value-tier and save 60-70% on those requests.
Models available through the API
| Tier | Models | Approximate cost (per 1M output tokens) |
|---|---|---|
| A-tier (premium) | Claude Sonnet 4.6, GPT-5.4, Gemini 2 Pro | $10-15 |
| Value-tier | Claude Haiku, GPT-5 Nano, Gemini Flash | $0.60-4.00 |
| Hybrid (auto-routed) | Mix of above based on routing policy | $2-6 (blended) |
You don't specify models directly. You specify quality intent ("premium", "economy", or "auto") and our router picks the best available model. This means you automatically benefit when we add new models or when providers change pricing.
Migration checklist
- Get API credentials from Token Landing (fill out the contact form)
- Swap the base URL in your OpenAI client configuration
- Set your routing policy (we help with this during onboarding)
- Test with a shadow deployment to compare quality and cost
- Cut over production traffic once you're satisfied
Most teams complete this in under a day. The longest part is usually deciding on routing rules, not the technical migration.
What we don't support (yet)
To be transparent: we don't currently support image generation (DALL-E endpoints), fine-tuning, or the Assistants API. Our focus is on /v1/chat/completions and /v1/embeddings, which covers the vast majority of production use cases. Assistants API support is on our roadmap for Q3 2026.