Prompt Caching
A feature that reuses previously processed prompt content across API calls, reducing latency and cost when the same system prompt or context is sent repeatedly.
Prompt caching is an API-level optimisation that stores processed prompt prefixes so they do not need to be recomputed on every request. When you send the same system prompt, instructions, or context across multiple API calls, the cached portion is processed once and reused — reducing both cost and response time.
How it works
Every time you call an AI API, the model processes your entire prompt from scratch — system prompt, conversation history, context documents, and your new message. For applications that use the same system prompt across thousands of requests, this is wasteful.
With prompt caching, the provider stores the processed representation of the static portion (typically the system prompt and any fixed context). Subsequent requests that share the same prefix skip the processing of the cached portion, paying only for the new content.
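The mechanism above can be sketched as a toy simulation: the provider keys the cache on the static prefix, so a second request with an identical prefix reuses the stored representation while the new content is still processed fresh. The `PrefixCache` class is purely illustrative, not any provider's actual implementation.

```python
import hashlib

class PrefixCache:
    """Toy model of provider-side prompt caching (illustration only)."""

    def __init__(self):
        self._store = {}  # prefix hash -> stored processed representation

    def process(self, static_prefix: str, dynamic_suffix: str) -> bool:
        """Handle one request; return True if the prefix was served from cache."""
        key = hashlib.sha256(static_prefix.encode()).hexdigest()
        hit = key in self._store
        if not hit:
            # Cold start: the static prefix is processed once and stored.
            self._store[key] = f"<processed {len(static_prefix)} chars>"
        # The dynamic suffix is always processed fresh; only the prefix is reused.
        _ = f"<processed {len(dynamic_suffix)} chars>"
        return hit

cache = PrefixCache()
system_prompt = "You are a support agent for Acme Corp. Policies follow..."
cache.process(system_prompt, "How do I reset my password?")   # cold start
cache.process(system_prompt, "What is your refund policy?")   # prefix reused
```

Real providers differ in the details (minimum cacheable prefix length, cache lifetime, whether caching is automatic or opt-in), but the prefix-matching principle is the same: requests only benefit when the opening portion of the prompt is byte-for-byte identical.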
Cost impact
Cached input tokens typically cost 50-90% less than uncached input tokens, depending on the provider. For applications with long system prompts (common in agent deployments), this can reduce total API costs by 30-50%.
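A back-of-the-envelope calculation makes the saving concrete. The figures below assume an illustrative $3.00 per million input tokens and a 90% cache discount; actual pricing varies by provider and model.

```python
def request_cost(prompt_tokens: int, cached_tokens: int,
                 price_per_mtok: float = 3.00,
                 cache_discount: float = 0.90) -> float:
    """Input cost of a single request in dollars (illustrative pricing)."""
    cached = cached_tokens * price_per_mtok * (1 - cache_discount) / 1_000_000
    fresh = (prompt_tokens - cached_tokens) * price_per_mtok / 1_000_000
    return cached + fresh

# A 12,000-token prompt where 10,000 tokens (system prompt + fixed docs)
# are served from cache:
uncached = request_cost(12_000, 0)          # $0.036 per request
with_cache = request_cost(12_000, 10_000)   # $0.009 per request, 75% cheaper
```

Per-request input cost drops from $0.036 to $0.009 here; across thousands of requests the saving compounds, which is where the 30-50% reduction in total API spend comes from once output tokens are included.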
Latency impact
Skipping the processing of cached tokens reduces time to first token. For prompts with 10,000+ tokens of cached context, this can shave 1-3 seconds off response time.
When it matters
Prompt caching is relevant primarily to developers building AI-powered applications via API; if you use ChatGPT or Claude through their web interfaces, caching is handled for you. It becomes important when you are running AI agents with long system prompts, serving many users from the same base instructions, or processing batch workflows that reuse the same context.
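Because providers generally cache on an exact prefix match, request structure matters for the use cases above: put content that never changes first and per-request content last. A minimal sketch of cache-friendly ordering follows; the request shape and field names are hypothetical, not any specific provider's schema.

```python
def build_request(static_system: str, shared_docs: list, user_msg: str) -> dict:
    """Assemble a request with the longest possible shared, cacheable prefix."""
    return {
        "system": static_system,                      # identical across all calls
        "context": "\n\n".join(shared_docs),          # fixed reference documents
        "messages": [{"role": "user", "content": user_msg}],  # varies per call
    }

r1 = build_request("You are a billing assistant.", ["Refund policy v3"],
                   "Can I get a refund on order 123?")
r2 = build_request("You are a billing assistant.", ["Refund policy v3"],
                   "Please cancel order 456.")
# r1 and r2 share an identical system + context prefix, so the second
# request can reuse the cached prefix; only the final message differs.
```

The anti-pattern to avoid is interleaving dynamic content (timestamps, user names, request IDs) into the system prompt or context, which breaks the exact-match prefix and forces a cache miss on every call.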
Why This Matters
For organisations deploying AI at scale — hundreds or thousands of API calls per day — prompt caching is often the single most impactful cost optimisation available. Understanding when and how to leverage it can reduce AI infrastructure costs by 30-50%, which directly affects the ROI calculations that determine whether AI projects get continued funding.