TokenLanding

Prompt Caching: Save 50-90% on LLM API Costs

Learn how prompt caching works across OpenAI, Anthropic, and Google APIs. Reduce LLM costs by 50-90% on cached tokens with practical implementation strategies.

Updated: 2026-04-06

TL;DR

Prompt caching stores frequently used prompt prefixes so you only pay full price once. Anthropic offers up to 90% savings on cached tokens, OpenAI provides automatic caching with 50% savings, and Google offers similar discounts. If you send the same system prompt repeatedly, you are likely overpaying.

How Prompt Caching Works

Every LLM API request includes input tokens that the model processes before generating output. Many applications send the same system prompt, few-shot examples, or reference documents with every request. Without caching, you pay the full input token price every time.

Prompt caching tells the provider to store the processed state of your prompt prefix. Subsequent requests that share the same prefix reuse the cached version, reducing both cost and processing time.

Provider Caching Comparison

| Provider  | Cache Discount | Min Cache Size | Cache Duration             | Implementation                |
|-----------|----------------|----------------|----------------------------|-------------------------------|
| Anthropic | 90% off input  | 1,024 tokens   | 5 min (auto-extend on use) | Explicit cache_control blocks |
| OpenAI    | 50% off input  | 1,024 tokens   | Automatic                  | Automatic (no code changes)   |
| Google    | ~75% off input | 32,768 tokens  | Configurable               | CachedContent API             |

Discounts and limits are approximate. Last updated April 2026.
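For Anthropic, caching is opt-in: you mark the stable prefix with a cache_control breakpoint, following the format in Anthropic's prompt-caching documentation. The sketch below builds the request body only (no API call); the model name is a placeholder, and the repeated string merely stands in for a system prompt above the 1,024-token minimum.

```python
# Sketch of an Anthropic Messages API payload with an explicit cache_control
# breakpoint. The long, stable system prompt is marked cacheable; the short
# user message varies per request and is never cached.

LONG_SYSTEM_PROMPT = "You are a support assistant. " * 200  # stands in for 1,024+ tokens

def build_request(user_message: str) -> dict:
    """Build a request body whose system block is marked for caching."""
    return {
        "model": "claude-sonnet-4",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Tells the API to cache everything up to this block.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_request("How do I reset my password?")
```

Because caching works on prefixes, every request built this way shares the same cacheable system block, and only the trailing user message changes.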

Real-World Savings Examples

Consider an application that sends a 4,000-token system prompt with every request, processing 100,000 requests per month with an additional 1,000 variable tokens per request:

| Scenario                                 | Input Cost (Claude Sonnet 4) | Savings |
|------------------------------------------|------------------------------|---------|
| No caching (500M tokens)                 | $1,500                       | --      |
| With caching (400M cached + 100M fresh)  | $420                         | 72%     |

The 400M cached tokens cost only $0.30/1M instead of $3.00/1M (Anthropic's 90% discount), while the 100M variable tokens pay full price. Total savings: $1,080 per month from this single optimization.
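The arithmetic above can be checked in a few lines. Prices are the published Claude Sonnet rates cited in this article ($3.00/1M input, $0.30/1M cache read); treat them as illustrative and substitute your own.

```python
# Reproduces the savings example: 100k requests/month, 4k-token cached
# system prompt plus 1k variable tokens per request.

FULL_PRICE = 3.00 / 1_000_000    # $ per input token
CACHED_PRICE = 0.30 / 1_000_000  # $ per cached input token (90% off)

requests = 100_000
cached_tokens_per_req = 4_000    # system prompt (cache hit after first request)
fresh_tokens_per_req = 1_000     # variable user content

no_cache = (cached_tokens_per_req + fresh_tokens_per_req) * requests * FULL_PRICE
with_cache = (cached_tokens_per_req * requests * CACHED_PRICE
              + fresh_tokens_per_req * requests * FULL_PRICE)

print(f"No caching:   ${no_cache:,.0f}")                 # $1,500
print(f"With caching: ${with_cache:,.0f}")               # $420
print(f"Savings:      {1 - with_cache / no_cache:.0%}")  # 72%
```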

Best Practices for Prompt Caching

  • Put cacheable content first: System prompts, instructions, and reference documents should be at the beginning of your prompt since caching works on prefixes.
  • Maximize the cached prefix: The more tokens you can cache, the bigger your savings. If you reference the same documents repeatedly, include them in the cached prefix.
  • Keep cache warm: Most providers expire caches after minutes of inactivity. Ensure your request patterns keep the cache alive, or schedule periodic warm-up requests.
  • Monitor cache hit rates: Track what percentage of your input tokens are served from cache. Aim for 60%+ cache hit rates for meaningful savings.
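A hit-rate monitor can be sketched in a few lines. This example assumes Anthropic-style usage fields (input_tokens, cache_read_input_tokens, cache_creation_input_tokens); other providers report cached tokens under different names, so adapt the keys to your provider's response format.

```python
# Minimal cache-hit-rate tracker over a batch of API response usage dicts.
# Hit rate = cached input tokens / all input tokens processed.

def cache_hit_rate(usages: list[dict]) -> float:
    """Fraction of all input tokens that were served from cache."""
    cached = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    total = sum(
        u.get("input_tokens", 0)
        + u.get("cache_read_input_tokens", 0)
        + u.get("cache_creation_input_tokens", 0)
        for u in usages
    )
    return cached / total if total else 0.0

# Example: the first request writes the cache, later ones read from it.
usages = [
    {"input_tokens": 1000, "cache_creation_input_tokens": 4000},
    {"input_tokens": 1000, "cache_read_input_tokens": 4000},
    {"input_tokens": 1000, "cache_read_input_tokens": 4000},
]
print(f"{cache_hit_rate(usages):.0%}")  # 53% -- below the 60% target above
```

Logging this metric per day makes it obvious when a prompt change silently breaks the cached prefix and your hit rate drops.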

Combining Caching with Hybrid Routing

Prompt caching and hybrid routing are complementary optimizations. Token Landing applies caching automatically where supported, and routes requests to the most cost-effective model for each task. The combined effect can reduce your effective per-token costs by 70-90% compared to uncached, single-model usage.

For workloads with repetitive prompts and mixed task complexity, this combination delivers the most significant savings available in the current LLM API market.
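As an illustration of the routing half of this combination (not Token Landing's actual logic), here is a toy router that sends short, simple prompts to a cheaper model and everything else to a stronger one. The model names, the length threshold, and the words-to-tokens ratio are all assumptions for the sketch.

```python
# Toy hybrid router: pick a model based on a crude prompt-length proxy
# for task complexity. Real routers use richer signals (task type,
# structured-output needs, historical quality), but the shape is the same.

CHEAP_MODEL = "small-fast-model"   # hypothetical cheap model
STRONG_MODEL = "claude-sonnet-4"   # placeholder strong model

def route(prompt: str, max_cheap_tokens: int = 500) -> str:
    """Return the model to use for this prompt."""
    approx_tokens = len(prompt.split()) * 4 // 3  # rough words-to-tokens ratio
    return CHEAP_MODEL if approx_tokens <= max_cheap_tokens else STRONG_MODEL

print(route("Summarize this sentence."))  # small-fast-model
```

Caching then applies on top of whichever model is chosen, so the two discounts compound rather than compete.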

FAQ

What is prompt caching for LLM APIs?
Prompt caching stores the processed version of a prompt prefix (like system prompts or large documents) so subsequent requests that share that prefix do not need to reprocess it. This reduces both cost and latency.

How much does prompt caching save?
Savings range from 50-90% on cached tokens depending on the provider. Anthropic offers up to 90% savings, OpenAI offers 50%, and Google offers similar discounts. Total savings depend on what percentage of your tokens are cacheable.

Does prompt caching affect output quality?
No. Prompt caching only affects how input tokens are processed and priced. The model produces identical outputs whether the prompt is cached or not.

Ready to cut your token bill?

Token Landing — hybrid AI tokens, Claude-class UX, saner spend
