TokenLanding

OpenAI-compatible API: same SDK, hybrid token economics

Token Landing's OpenAI-compatible API lets you migrate with a base-URL swap. Same SDK, same request format, but with hybrid routing that cuts costs 40-70%.

APIOpenAIMigrationUpdated: 2026-04-12

TL;DR

Token Landing is a drop-in OpenAI-compatible API. Change one line of config (the base URL) and your existing code works with hybrid A-tier/value-tier routing, cutting token costs 40-70%.

What is an OpenAI-compatible API?

An OpenAI-compatible API accepts the same request format as OpenAI's /v1/chat/completions endpoint. You send the same JSON body, get back the same response shape. The only thing that changes is the base URL.

Token Landing implements this compatibility layer with one difference: behind that familiar interface, your requests get routed between premium and economy models based on rules you define. The OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks the OpenAI protocol works without modification.

Why we went this route (no pun intended)

We spent weeks debating whether to build a custom API or go OpenAI-compatible. The custom route would have let us do fancier things with the request format. But after talking to about a dozen teams, the answer was obvious: nobody wants to rewrite their API client. One infrastructure lead told us, "I have 47 places in our codebase that call OpenAI. I'm not touching all of them."

So we went with compatibility. Migration is a one-line config change:

# Before
client = OpenAI(api_key="sk-...")

# After
client = OpenAI(
    api_key="your-token-landing-key",
    base_url="https://api.token-landing.com/v1"
)

That's it. Your prompts, your system messages, your function calling, your streaming logic, your retry handling, your error types. All the same.

What actually changes under the hood

When a request comes in, our routing layer looks at a few signals to decide where it goes:

SignalA-tier (premium)Value-tier (economy)
First message in conversationYes, users notice first impressionsNo
Tool/function callsYes, failures are visibleNo
System prompt only (warmup)NoYes, nobody sees this
Summarization/extractionNoYes, output quality is sufficient
Embedding generationNoYes, dedicated embedding models

You configure these rules through a routing policy. The defaults work well for most chat-based products, but if you're running a code generation pipeline or a RAG system, you'll want to customize which requests hit A-tier.

What about streaming?

Streaming works exactly like OpenAI's implementation. Set stream: true and you get back Server-Sent Events in the same format. We match the delta structure, the finish_reason signaling, and the usage reporting at the end of the stream.

One thing we actually improved: our streaming has slightly lower time-to-first-token than going direct to some providers, because our edge routing can start sending from the fastest-responding model while the routing decision is still being finalized for the next chunk. It's a small optimization but it matters for chat UX.

Function calling and tool use

Fully supported. The tools and tool_choice parameters work the same way. We route tool-calling requests to A-tier by default because tool failures tend to cascade visibly in user-facing applications.

If you're doing batch tool calls (like extracting structured data from documents), you can override this to value-tier and save 60-70% on those requests.

Models available through the API

TierModelsApproximate cost (per 1M output tokens)
A-tier (premium)Claude Sonnet 4.6, GPT-5.4, Gemini 2 Pro$10-15
Value-tierClaude Haiku, GPT-5 Nano, Gemini Flash$0.60-4.00
Hybrid (auto-routed)Mix of above based on routing policy$2-6 (blended)

You don't specify models directly. You specify quality intent ("premium", "economy", or "auto") and our router picks the best available model. This means you automatically benefit when we add new models or when providers change pricing.

Migration checklist

  1. Get API credentials from Token Landing (fill out the contact form)
  2. Swap the base URL in your OpenAI client configuration
  3. Set your routing policy (we help with this during onboarding)
  4. Test with a shadow deployment to compare quality and cost
  5. Cut over production traffic once you're satisfied

Most teams complete this in under a day. The longest part is usually deciding on routing rules, not the technical migration.

What we don't support (yet)

To be transparent: we don't currently support image generation (DALL-E endpoints), fine-tuning, or the Assistants API. Our focus is on /v1/chat/completions and /v1/embeddings, which covers the vast majority of production use cases. Assistants API support is on our roadmap for Q3 2026.

FAQ

+What does OpenAI-compatible API mean?
It means the API accepts the same request and response format as OpenAI. You can migrate by changing the base URL in your OpenAI SDK client. No code rewrites, no prompt changes, no new dependencies.
+Can I use the OpenAI Python/Node SDK with Token Landing?
Yes. Set base_url to https://api.token-landing.com/v1 and use your Token Landing API key. Everything else stays the same, including streaming, function calling, and error handling.
+Does streaming work the same way?
Yes. Server-Sent Events, delta structure, finish_reason, and usage reporting all match the OpenAI format. We actually have slightly lower time-to-first-token in some cases due to edge routing optimization.
+What models can I access through the API?
A-tier includes Claude Sonnet 4.6, GPT-5.4, and Gemini 2 Pro. Value-tier includes Haiku, GPT-5 Nano, and Gemini Flash. You specify quality intent, not model names. The router picks the best option.
+How long does migration take?
The technical migration is a one-line config change. Most teams spend a day total including setting up routing rules and running a shadow deployment to validate quality.

Ready to cut your token bill?

Token Landing — hybrid AI tokens, Claude-class UX, saner spend

Related reading

All guides