Step through exactly how an idle tool-result block becomes a stub, what the model reads versus what SQLite stores, and how cachelane:expand brings it back without ever losing a byte.
Stage 1 of 9 · User prompt
A developer types a request into Claude Code. No tool result exists yet; this user turn is what will trigger the tool call that produces one. Follow that single block as it is born, ages, gets stubbed, and is then restored.
What the user actually sees. The agent answers in plain English; tool calls and stubbing stay hidden.
{ "role": "user", "content": [ { "type": "text", "text": "Please read auth.ts and summarize the function." } ] }
No block row exists in SQLite yet.
The auth.ts tool-result block does not exist in the prompt yet.
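On the storage side, this stage is simply a table with zero rows. The sketch below is a minimal illustration that assumes a single `blocks` table. The table name, its columns, and the helpers `open_store` and `count_blocks` are hypothetical and do not come from CacheLane's actual schema.

```python
import sqlite3

# Hypothetical schema: CacheLane's real table layout is not documented here.
# Each row would describe one tool-result block: the call that produced it,
# its full output, how many turns it has sat idle, and whether it is stubbed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS blocks (
    block_id    TEXT PRIMARY KEY,
    tool_name   TEXT NOT NULL,
    tool_input  TEXT NOT NULL,              -- JSON-encoded arguments of the call
    content     TEXT NOT NULL,              -- full tool output, never truncated
    idle_turns  INTEGER NOT NULL DEFAULT 0,
    stubbed     INTEGER NOT NULL DEFAULT 0  -- 0 = full block in prompt, 1 = stub
)
"""

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the block store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def count_blocks(conn: sqlite3.Connection) -> int:
    """Number of tool-result blocks currently tracked."""
    (n,) = conn.execute("SELECT COUNT(*) FROM blocks").fetchone()
    return n

# Stage 1: the user has only typed a prompt, so no block row exists yet.
conn = open_store()
print(count_blocks(conn))  # 0
```

In this sketch, a row would be inserted once the tool call returns, and `stubbed` would flip when the idle counter crosses K.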
The mental model
The most important thing to understand about K-pruning (stubbing a tool-result block once its idle counter crosses K) is that CacheLane never tries to infer the user's intent and never connects one question to another. It does something far simpler, and the model's own reasoning does the rest.
When a stub is created, CacheLane leaves a literal sticky note in the conversation: [stub:01KPRUNE] … | refetch via cachelane_expand(block_id=01KPRUNE). That note stays in the model's context on every turn, now at a fraction of the original block's size. When the user later asks “what did auth.ts look like again?”, phrased completely differently from the original request, the model reads both the new question and the stub, recognizes that they concern the same thing, and decides on its own to call cachelane:expand. CacheLane contains no NLP, no semantic matching, and no intent detection; the model's reasoning bridges the two questions.
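The stub layout quoted above can be produced with a single string template. This is a minimal sketch: `make_stub` and its `description` argument are hypothetical, and the description is a placeholder for whatever summary CacheLane actually writes where the quote shows an ellipsis.

```python
# Hypothetical helper. Only the layout
#   [stub:<id>] <description> | refetch via cachelane_expand(block_id=<id>)
# comes from the text; the description CacheLane actually writes is not shown.
def make_stub(block_id: str, description: str) -> str:
    """Build the short note that replaces an idle tool-result block."""
    return (f"[stub:{block_id}] {description} | "
            f"refetch via cachelane_expand(block_id={block_id})")

original = "x" * 8000  # placeholder for a large tool result
stub = make_stub("01KPRUNE", "auth.ts file contents")
print(stub)
# [stub:01KPRUNE] auth.ts file contents | refetch via cachelane_expand(block_id=01KPRUNE)
print(len(stub), "chars in context instead of", len(original))
```

The block ID appears twice, once as a label and once inside the call, so a model reading the note has everything it needs to issue the restore.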
CacheLane's job is mechanical and deterministic: watch the idle counter, replace the block with a descriptive stub when the counter crosses K, expose the cachelane:expand tool, and, when that tool is called, re-run the original deterministic tool call to restore the content. It makes the prompt smaller and leaves a recoverable pointer. Rather than building fragile, model-specific intent detection into the system, CacheLane offloads the reasoning to the model's native capability. The stub only has to be descriptive enough that any sufficiently capable model knows what to do with it.
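The loop described above (count idle turns, stub at K, re-run the original call on expand) fits in a few dozen lines. This is an illustrative model, not CacheLane's code. The `KPruner` and `Block` names, the rule that a block counts as idle on any turn that does not reference it, the `>= K` comparison, and the `read_file` tool are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

@dataclass
class Block:
    block_id: str
    tool_name: str      # which tool produced this result
    tool_input: dict    # arguments of the original call, kept so it can be re-run
    content: str        # full tool output
    idle_turns: int = 0
    stubbed: bool = False

class KPruner:
    """Hypothetical sketch of K-pruning; names and idle rules are assumptions."""

    def __init__(self, k: int, tools: Dict[str, Callable[..., str]]):
        self.k = k
        self.tools = tools  # deterministic tools that expand() can re-run
        self.blocks: Dict[str, Block] = {}

    def add(self, block: Block) -> None:
        self.blocks[block.block_id] = block

    def end_turn(self, referenced: Iterable[str] = ()) -> None:
        """Age every live block; stub any whose idle counter reaches K."""
        referenced = set(referenced)
        for b in self.blocks.values():
            if b.stubbed:
                continue
            b.idle_turns = 0 if b.block_id in referenced else b.idle_turns + 1
            if b.idle_turns >= self.k:
                b.stubbed = True

    def render(self, block_id: str) -> str:
        """What the model sees for this block: full content or a short stub."""
        b = self.blocks[block_id]
        if not b.stubbed:
            return b.content
        return (f"[stub:{b.block_id}] {b.tool_name}({b.tool_input}) | "
                f"refetch via cachelane_expand(block_id={b.block_id})")

    def expand(self, block_id: str) -> str:
        """Handle cachelane_expand: re-run the original call and un-stub the block."""
        b = self.blocks[block_id]
        b.content = self.tools[b.tool_name](**b.tool_input)
        b.stubbed = False
        b.idle_turns = 0
        return b.content

# Usage with a hypothetical deterministic tool.
FILES = {"auth.ts": "export function login() {}"}  # placeholder file contents

def read_file(path: str) -> str:
    return FILES[path]

lane = KPruner(k=3, tools={"read_file": read_file})
lane.add(Block("01KPRUNE", "read_file", {"path": "auth.ts"}, read_file("auth.ts")))
for _ in range(3):
    lane.end_turn()                      # three turns in which nothing uses the block
print(lane.render("01KPRUNE"))           # the stub
print(lane.expand("01KPRUNE") == FILES["auth.ts"])  # True
```

Because `expand` re-runs the stored call instead of guessing, restoration is only as faithful as the tool is deterministic, which is why the text insists on a deterministic tool call.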
Because the model drives the restore flow, CacheLane's effectiveness scales with model quality, not in the obvious “raw intelligence” sense but in context fidelity under pressure. Three failure modes:
cachelane:expand. On a strong model the system is essentially self-managing: stubs are noticed and acted on reliably. On a weaker model you would see more silent misses. That is the deliberate trade-off: the system's correctness ceiling is set by the model, not by CacheLane's own logic.
Token saving → CacheLane's job (happens at stubbing, ongoing, no intent needed)
Re-fetching → the model's job (reads the stub, decides to restore, no NLP needed)
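The two jobs meet at a single interface: a tool definition that CacheLane exposes and a handler that answers the model's call. The sketch below uses the Anthropic Messages API shapes for a tool definition (`name`, `description`, `input_schema`) and for `tool_use` and `tool_result` content blocks, and it uses the spelling `cachelane_expand` from the stub text. The description wording, the `handle_tool_use` helper, and the `restore` callback are assumptions.

```python
import json
from typing import Callable

# Hypothetical definition of the expand tool in the Anthropic Messages API
# tool format; the description text is an assumption, not CacheLane's.
EXPAND_TOOL = {
    "name": "cachelane_expand",
    "description": (
        "Restore the full content of a tool result that was replaced by a "
        "[stub:<block_id>] note. Call it when a stub is relevant to the question."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "block_id": {"type": "string", "description": "ID shown in the stub"},
        },
        "required": ["block_id"],
    },
}

def handle_tool_use(tool_use: dict, restore: Callable[[str], str]) -> dict:
    """CacheLane's side: answer a cachelane_expand tool_use with a tool_result.

    `restore` is a hypothetical callback mapping block_id to the full content
    (for example, by re-running the original tool call).
    """
    if tool_use["name"] != EXPAND_TOOL["name"]:
        raise ValueError(f"not a CacheLane tool: {tool_use['name']}")
    content = restore(tool_use["input"]["block_id"])
    return {"type": "tool_result", "tool_use_id": tool_use["id"], "content": content}

# The model's side is only the decision to emit this block after reading a stub.
call = {"type": "tool_use", "id": "toolu_01", "name": "cachelane_expand",
        "input": {"block_id": "01KPRUNE"}}
store = {"01KPRUNE": "<full auth.ts tool output>"}  # placeholder content
print(json.dumps(handle_tool_use(call, store.__getitem__)))
```

Nothing in `handle_tool_use` decides when to expand; that decision arrives as the model's `tool_use` block, which mirrors the split between token saving and re-fetching.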
Built by Aditya Tripuraneni & Rajan Chavada
CacheLane, a local, fail-open caching middleware for Claude Code.