Last updated: April 5, 2026 · Pricing & Deployment · by Daniel Ashford
What is Prompt Caching?
A provider feature that cuts input token costs by up to 90% by reusing the previously processed static portions of a prompt, such as a long system prompt.
Definition
Prompt caching stores the processed representation of static prompt components so they do not need to be recomputed on every request. This can reduce input token costs by up to 90%.
How It Works
The provider saves the computed internal state of your static prompt prefix. Subsequent requests that begin with the exact same prefix reuse that cached computation instead of reprocessing it from scratch. The cache typically has a TTL of 5 to 60 minutes, depending on the provider, and even a one-character change to the static portion invalidates it.
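To make the mechanism concrete, here is a minimal sketch of a request structured for caching. It uses Anthropic's documented cache_control marker as one example; the client setup, model name, and prompt text are illustrative, and other providers (OpenAI, for instance) cache long identical prefixes automatically without an explicit marker.

```python
# Sketch: mark a static system prompt as cacheable (Anthropic-style API).
# Assumptions: the anthropic SDK is installed, ANTHROPIC_API_KEY is set in
# the environment, and the model name is illustrative.
import anthropic

client = anthropic.Anthropic()

# Must be byte-identical on every request, or the cache is invalidated.
STATIC_SYSTEM_PROMPT = "You are a support assistant for Acme Corp. ..."  # ~2,000 tokens

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        # The cache_control marker asks the provider to cache everything
        # up to and including this block for reuse on later requests.
        system=[
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only the dynamic portion varies, so it sits after the cached prefix.
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```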
Example
Suppose your chatbot sends the same 2,000-token system prompt with every message. With caching, each subsequent request within the cache window is billed at roughly 10% of the normal input rate for those tokens.
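The roughly 90% figure falls out of simple arithmetic over many requests. The sketch below assumes hypothetical pricing ($3 per million input tokens, cache hits billed at 10% of that) and ignores the premium some providers charge on the initial cache write.

```python
# Back-of-the-envelope savings for the chatbot example above.
SYSTEM_PROMPT_TOKENS = 2_000
REQUESTS = 10_000
RATE_PER_MTOK = 3.00         # assumed normal input rate, $ per 1M tokens
CACHED_RATE_PER_MTOK = 0.30  # cache hits billed at 10% of the normal rate

def cost(tokens: int, rate_per_mtok: float) -> float:
    """Dollar cost of processing `tokens` input tokens at a given rate."""
    return tokens / 1_000_000 * rate_per_mtok

without_cache = REQUESTS * cost(SYSTEM_PROMPT_TOKENS, RATE_PER_MTOK)
# The first request populates the cache at the normal rate; the rest hit it.
with_cache = cost(SYSTEM_PROMPT_TOKENS, RATE_PER_MTOK) + (
    REQUESTS - 1
) * cost(SYSTEM_PROMPT_TOKENS, CACHED_RATE_PER_MTOK)

print(f"Without caching: ${without_cache:,.2f}")                 # $60.00
print(f"With caching:    ${with_cache:,.2f}")                    # ~$6.01
print(f"Savings:         {1 - with_cache / without_cache:.0%}")  # 90%
```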
See How Models Compare
Caching discounts and TTLs vary by provider, so prompt caching is worth weighing when choosing an AI model. See how 12 models compare on our leaderboard.