How Prompt Caching Works
Every LLM API request includes input tokens that the model processes before generating output. Many applications send the same system prompt, few-shot examples, or reference documents with every request. Without caching, you pay the full input token price every time.
Prompt caching tells the provider to store the processed state of your prompt prefix. Subsequent requests that share the same prefix reuse the cached version, reducing both cost and processing time.
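The mechanics can be sketched with a toy simulation (this is a conceptual model, not any provider's API; names like `PrefixCache` are illustrative): the provider stores work keyed by the prompt prefix, so a repeated prefix only pays for the variable suffix.

```python
# Conceptual sketch of prefix caching: the processed prefix is stored
# under a hash of its text, so repeat requests only "pay" for the suffix.
import hashlib

class PrefixCache:
    def __init__(self):
        self._store = {}

    def process(self, prefix: str, suffix: str) -> dict:
        """Return token counts, reusing cached work for a repeated prefix."""
        key = hashlib.sha256(prefix.encode()).hexdigest()
        hit = key in self._store
        if not hit:
            # First request: the full prefix is processed and stored.
            self._store[key] = len(prefix.split())
        return {
            "cache_hit": hit,
            "cached_tokens": self._store[key] if hit else 0,
            # Only the suffix is processed fresh on a cache hit.
            "fresh_tokens": len(suffix.split()) + (0 if hit else len(prefix.split())),
        }

cache = PrefixCache()
first = cache.process("You are a helpful assistant. " * 100, "Summarize this.")
second = cache.process("You are a helpful assistant. " * 100, "Translate this.")
print(first["cache_hit"], second["cache_hit"])  # False True
```

The second request reprocesses only the two-word suffix; the 500-token prefix is served from the cache.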
Provider Caching Comparison
| Provider | Cache Discount | Min Cache Size | Cache Duration | Implementation |
|---|---|---|---|---|
| Anthropic | 90% off input | 1,024 tokens | 5 min (auto-extend on use) | Explicit cache_control blocks |
| OpenAI | 50% off input | 1,024 tokens | Automatic | Automatic (no code changes) |
| Google (Gemini) | ~75% off input | 32,768 tokens | Configurable | CachedContent API |
Discounts and limits are approximate; last updated April 2026.
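For Anthropic's explicit approach, caching is opted into per content block. The request body below follows the documented `cache_control` format (the model ID and prompt text are illustrative; OpenAI needs no equivalent marker since its caching is automatic for 1,024+ token prefixes):

```python
# Anthropic Messages API request body with an explicit cache_control
# block. Everything up to and including the marked block is cacheable.
LONG_SYSTEM_PROMPT = "You are a support agent for ExampleCo. ..."  # assume 1,024+ tokens

request_body = {
    "model": "claude-sonnet-4-20250514",  # illustrative model ID
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks this prefix for caching (5-minute ephemeral TTL).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Where is my order?"}],
}
```

Subsequent requests whose system block matches byte-for-byte read the prefix from cache at the discounted rate.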
Real-World Savings Examples
Consider an application that sends a 4,000-token system prompt with every request, processing 100,000 requests per month with an additional 1,000 variable tokens per request:
| Scenario | Input Cost (Claude Sonnet 4) | Savings |
|---|---|---|
| No caching (500M tokens) | $1,500 | — |
| With caching (400M cached + 100M fresh) | $420 | 72% |
The 400M cached tokens cost only $0.30/1M instead of $3.00/1M (Anthropic's 90% discount on cache reads), while the 100M variable tokens pay full price. Total savings: $1,080 per month from this single optimization. (Cache writes carry a small one-time premium, omitted here for simplicity.)
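The table's arithmetic can be checked directly (Claude Sonnet 4 pricing of $3.00/1M fresh and $0.30/1M cached is taken from above; cache-write premiums are ignored):

```python
# Reproduce the savings table: 100k requests/month, 4,000-token cached
# system prompt plus 1,000 variable tokens per request.
BASE = 3.00 / 1_000_000    # $ per fresh input token
CACHED = 0.30 / 1_000_000  # $ per cached input token (90% off)

requests = 100_000
system_tokens, variable_tokens = 4_000, 1_000

no_cache = requests * (system_tokens + variable_tokens) * BASE
with_cache = requests * (system_tokens * CACHED + variable_tokens * BASE)
savings = 1 - with_cache / no_cache

print(f"${no_cache:,.0f} vs ${with_cache:,.0f} ({savings:.0%} saved)")
# $1,500 vs $420 (72% saved)
```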
Best Practices for Prompt Caching
- Put cacheable content first: System prompts, instructions, and reference documents should be at the beginning of your prompt since caching works on prefixes.
- Maximize the cached prefix: The more tokens you can cache, the bigger your savings. If you reference the same documents repeatedly, include them in the cached prefix.
- Keep cache warm: Most providers expire caches after minutes of inactivity. Ensure your request patterns keep the cache alive, or schedule periodic warm-up requests.
- Monitor cache hit rates: Track what percentage of your input tokens are served from cache. Aim for 60%+ cache hit rates for meaningful savings.
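A hit rate like the one in the last bullet can be computed from per-request usage counts. The field names below mirror Anthropic's `usage` object (`input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`); other providers report equivalents under different names:

```python
# Fraction of all input tokens served from cache across a batch of requests.
def cache_hit_rate(usages: list[dict]) -> float:
    cached = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    total = sum(
        u.get("input_tokens", 0)
        + u.get("cache_read_input_tokens", 0)
        + u.get("cache_creation_input_tokens", 0)
        for u in usages
    )
    return cached / total if total else 0.0

usages = [
    {"input_tokens": 1000, "cache_creation_input_tokens": 4000},  # cold start
    {"input_tokens": 1000, "cache_read_input_tokens": 4000},
    {"input_tokens": 1000, "cache_read_input_tokens": 4000},
]
print(f"{cache_hit_rate(usages):.0%}")  # 53%
```

The cold-start write drags the rate down; at steady state this workload's hit rate approaches 80% (4,000 of every 5,000 tokens cached).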
Combining Caching with Hybrid Routing
Prompt caching and hybrid routing are complementary optimizations. Token Landing applies caching automatically where supported and routes each request to the most cost-effective model for its task. The combined effect can reduce your effective per-token costs by 70-90% compared to uncached, single-model usage.
For workloads with repetitive prompts and mixed task complexity, this combination delivers the most significant savings available in the current LLM API market.
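A generic sketch of how the two levers stack (this is not Token Landing's actual implementation; the prices, model names, and routing signal are all assumptions for illustration):

```python
# Combine routing (pick a cheaper model for simple tasks) with cached
# pricing for the shared prefix. Prices are assumed $/1M input tokens.
PRICES = {  # model: (fresh rate, cached rate)
    "cheap-model": (0.25, 0.025),
    "capable-model": (3.00, 0.30),
}

def route(complex_task: bool) -> str:
    """Crude routing signal: send only complex tasks to the capable model."""
    return "capable-model" if complex_task else "cheap-model"

def request_cost(model: str, cached_tokens: int, fresh_tokens: int) -> float:
    fresh_rate, cached_rate = PRICES[model]
    return (cached_tokens * cached_rate + fresh_tokens * fresh_rate) / 1_000_000

# A simple task with a warm 4,000-token cached prefix and 1,000 fresh tokens:
model = route(complex_task=False)
cost = request_cost(model, cached_tokens=4_000, fresh_tokens=1_000)
baseline = request_cost("capable-model", cached_tokens=0, fresh_tokens=5_000)
print(f"{model}: ${cost:.6f} vs uncached capable-model ${baseline:.6f}")
```

Under these assumed prices the routed, cached request costs a small fraction of the uncached baseline; actual savings depend on the mix of task complexity and cache hit rate.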