# Prompt caching

Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. The first matching request writes cache tokens (`cacheWrite`), and later matching requests can read them back (`cacheRead`).

Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most of the input did not change.

This page covers all cache-related knobs that affect prompt reuse and token cost.

For Anthropic pricing details, see:
<https://docs.anthropic.com/docs/build-with-claude/prompt-caching>

## Primary knobs

### `cacheRetention` (model and per-agent)

Set cache retention on model params:

```yaml
agents:
  defaults:
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "short" # none | short | long
```

Per-agent override:

```yaml
agents:
  list:
    - id: "alerts"
      params:
        cacheRetention: "none"
```

Config merge order:

1. `agents.defaults.models["provider/model"].params`
2. `agents.list[].params` (matching agent id; overrides by key)
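The merge order above amounts to a shallow, key-wise override. A minimal sketch of that resolution (the function name is illustrative, not OpenClaw's actual implementation):

```python
def effective_params(defaults_params: dict, agent_params: dict) -> dict:
    """Resolve params: per-agent entries override model defaults key by key."""
    merged = dict(defaults_params)  # 1. agents.defaults.models[...].params
    merged.update(agent_params)     # 2. agents.list[].params wins per key
    return merged

# The "alerts" agent's override beats the model-level default:
print(effective_params({"cacheRetention": "long"}, {"cacheRetention": "none"}))
# -> {'cacheRetention': 'none'}
```

Keys the agent does not set fall through to the model defaults unchanged.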
### Legacy `cacheControlTtl`

Legacy values are still accepted and mapped:

* `5m` -> `short`
* `1h` -> `long`

Prefer `cacheRetention` in new config.
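If you migrate old configs programmatically, the documented mapping is small enough to express directly (a hypothetical helper, not part of OpenClaw):

```python
def migrate_cache_control_ttl(ttl: str) -> str:
    """Map a legacy cacheControlTtl value to its cacheRetention equivalent."""
    legacy = {"5m": "short", "1h": "long"}
    if ttl not in legacy:
        raise ValueError(f"unknown legacy cacheControlTtl value: {ttl!r}")
    return legacy[ttl]

print(migrate_cache_control_ttl("5m"))  # -> short
print(migrate_cache_control_ttl("1h"))  # -> long
```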
### `contextPruning.mode: "cache-ttl"`

Prunes old tool-result context after cache TTL windows so post-idle requests do not re-cache oversized history.

```yaml
agents:
  defaults:
    contextPruning:
      mode: "cache-ttl"
      ttl: "1h"
```

See [Session Pruning](/concepts/session-pruning) for full behavior.

### Heartbeat keep-warm

Heartbeat can keep cache windows warm and reduce repeated cache writes after idle gaps.

```yaml
agents:
  defaults:
    heartbeat:
      every: "55m"
```

Per-agent heartbeat is supported at `agents.list[].heartbeat`.
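For a heartbeat to keep a cache warm, its interval must be shorter than the cache TTL, which is why the example pairs `every: "55m"` with a 1h window. A quick sanity check, assuming simple `<n>m` / `<n>h` duration strings (the parsing here is a sketch, not OpenClaw's duration grammar):

```python
def to_minutes(duration: str) -> int:
    """Parse a simple duration string like '55m' or '1h' into minutes."""
    value, unit = int(duration[:-1]), duration[-1]
    return value * 60 if unit == "h" else value

def keeps_cache_warm(heartbeat_every: str, cache_ttl: str) -> bool:
    """The heartbeat must fire before the cache TTL window expires."""
    return to_minutes(heartbeat_every) < to_minutes(cache_ttl)

print(keeps_cache_warm("55m", "1h"))  # -> True: fires 5 minutes before expiry
print(keeps_cache_warm("2h", "1h"))   # -> False: the cache expires between beats
```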
## Provider behavior

### Anthropic (direct API)

* `cacheRetention` is supported.
* With Anthropic API-key auth profiles, OpenClaw seeds `cacheRetention: "short"` for Anthropic model refs when unset.

### Amazon Bedrock

* Anthropic Claude model refs (`amazon-bedrock/*anthropic.claude*`) support explicit `cacheRetention` pass-through.
* Non-Anthropic Bedrock models are forced to `cacheRetention: "none"` at runtime.

### OpenRouter Anthropic models

For `openrouter/anthropic/*` model refs, OpenClaw injects Anthropic `cache_control` on system/developer prompt blocks to improve prompt-cache reuse.

### Other providers

If the provider does not support this cache mode, `cacheRetention` has no effect.

## Tuning patterns

### Mixed traffic (recommended default)

Keep a long-lived baseline on your main agent and disable caching on bursty notifier agents:

```yaml
agents:
  defaults:
    model:
      primary: "anthropic/claude-opus-4-6"
    models:
      "anthropic/claude-opus-4-6":
        params:
          cacheRetention: "long"
  list:
    - id: "research"
      default: true
      heartbeat:
        every: "55m"
    - id: "alerts"
      params:
        cacheRetention: "none"
```

### Cost-first baseline

* Set a baseline of `cacheRetention: "short"`.
* Enable `contextPruning.mode: "cache-ttl"`.
* Keep heartbeat below your TTL only for agents that benefit from warm caches.

## Cache diagnostics

OpenClaw exposes dedicated cache-trace diagnostics for embedded agent runs.

### `diagnostics.cacheTrace` config

```yaml
diagnostics:
  cacheTrace:
    enabled: true
    filePath: "~/.openclaw/logs/cache-trace.jsonl" # optional
    includeMessages: false # default true
    includePrompt: false # default true
    includeSystem: false # default true
```

Defaults:

* `filePath`: `$OPENCLAW_STATE_DIR/logs/cache-trace.jsonl`
* `includeMessages`: `true`
* `includePrompt`: `true`
* `includeSystem`: `true`

### Env toggles (one-off debugging)

* `OPENCLAW_CACHE_TRACE=1` enables cache tracing.
* `OPENCLAW_CACHE_TRACE_FILE=/path/to/cache-trace.jsonl` overrides the output path.
* `OPENCLAW_CACHE_TRACE_MESSAGES=0|1` toggles full message payload capture.
* `OPENCLAW_CACHE_TRACE_PROMPT=0|1` toggles prompt text capture.
* `OPENCLAW_CACHE_TRACE_SYSTEM=0|1` toggles system prompt capture.

### What to inspect

* Cache trace events are JSONL and include staged snapshots like `session:loaded`, `prompt:before`, `stream:context`, and `session:after`.
* Per-turn cache token impact is visible in normal usage surfaces via `cacheRead` and `cacheWrite` (for example `/usage full` and session usage summaries).
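Because the trace is plain JSONL, a few lines of scripting can total cache reads and writes across a run. A sketch under an assumed event shape (the `usage.cacheRead`/`usage.cacheWrite` placement is illustrative, not a documented schema; check your own trace file's fields first):

```python
import json

def summarize_cache_tokens(path: str) -> dict:
    """Sum cacheRead/cacheWrite token counts across a cache-trace JSONL file.

    Assumes each line is a JSON object that may carry a 'usage' object
    with cacheRead/cacheWrite counts; events without usage are skipped.
    """
    totals = {"cacheRead": 0, "cacheWrite": 0}
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines in the log
            usage = json.loads(line).get("usage", {})
            for key in totals:
                totals[key] += usage.get(key, 0)
    return totals
```

A run where `cacheWrite` dwarfs `cacheRead` is the signal to look for volatile prompt prefixes or an unsupported cache mode.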
## Quick troubleshooting

* High `cacheWrite` on most turns: check for volatile system-prompt inputs and verify that the model/provider supports your cache settings.
* No effect from `cacheRetention`: confirm the model key matches `agents.defaults.models["provider/model"]`.
* Bedrock Nova/Mistral requests with cache settings: the runtime forcing these to `none` is expected.

Related docs:

* [Anthropic](/providers/anthropic)
* [Token Use and Costs](/reference/token-use)
* [Session Pruning](/concepts/session-pruning)
* [Gateway Configuration Reference](/gateway/configuration-reference)