
The K-Pruning Stub Lifecycle

This walkthrough traces exactly how an idle tool-result block becomes a stub, what the model reads versus what SQLite stores, and how cachelane:expand brings it back without ever losing a byte.

Stage 1 of 9 · User prompt

The user asks a question

A developer types a request into Claude Code. No tool result exists yet; this user turn is what will trigger the tool call. The stages that follow track that call's single result block as it is born, ages, is stubbed, and is restored.

① What the human reads · full scrollable history

What the user actually sees. The agent answers in plain English; tool calls and stubbing stay hidden.

USER
Please read auth.ts and summarize the function.
② Model prompt · structured content slot
{
  "role": "user",
  "content": [
    { "type": "text",
      "text": "Please read auth.ts and summarize the function." }
  ]
}
③ SQLite metadata row · blocks table

No block row exists yet.

④ Cache status · live prompt vs metadata store
n/a
⑤ Token impact · before → after

The auth.ts tool-result block does not exist in the prompt yet.
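Panel ③ refers to a SQLite blocks table of metadata. Restoring works by re-running the original deterministic tool call, so a row needs at least the tool name and its arguments. A minimal sketch of such a table follows. The column names, the 'live'/'stubbed' status values, the >= reading of K, and the Read tool with a file_path argument are illustrative assumptions, not CacheLane's actual schema.

```python
import json
import sqlite3

# Hypothetical schema: column names and status values are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS blocks (
    block_id    TEXT PRIMARY KEY,          -- e.g. '01KPRUNE'
    tool_name   TEXT NOT NULL,             -- the original deterministic tool
    tool_args   TEXT NOT NULL,             -- JSON arguments, enough to re-run it
    idle_turns  INTEGER NOT NULL DEFAULT 0,
    status      TEXT NOT NULL DEFAULT 'live'
                CHECK (status IN ('live', 'stubbed'))
)
"""


def record_block(conn, block_id, tool_name, tool_args):
    """Insert the metadata row when a tool-result block is first created."""
    conn.execute(
        "INSERT INTO blocks (block_id, tool_name, tool_args) VALUES (?, ?, ?)",
        (block_id, tool_name, json.dumps(tool_args, sort_keys=True)),
    )
    conn.commit()


def blocks_to_stub(conn, k):
    """IDs of live blocks whose idle counter has reached K (assumed: >= K)."""
    rows = conn.execute(
        "SELECT block_id FROM blocks WHERE status = 'live' AND idle_turns >= ?",
        (k,),
    )
    return [r[0] for r in rows]
```

For instance, after record_block(conn, "01KPRUNE", "Read", {"file_path": "auth.ts"}), the new row starts with idle_turns = 0 and status 'live'. The row stores enough to repeat the call, which is what lets the prompt drop the block's text safely.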


The mental model

Who actually runs the show?

The most important thing to understand about K-pruning is that CacheLane never reads the user's mind, and never connects one question to another. It does something far simpler, and the model's own intelligence does the rest.

The model is the orchestrator

When a stub is created, CacheLane leaves a literal sticky-note in the conversation: [stub:01KPRUNE] … | refetch via cachelane_expand(block_id=01KPRUNE). That note stays in the model's context on every turn, only now it is tiny. When the user later asks “what did auth.ts look like again?”, phrased completely differently from the original request, the model reads both the new question and the stub, recognizes that they are about the same thing, and decides on its own to call cachelane:expand. CacheLane contains no NLP, no semantic matching, and no intent detection; the model's reasoning bridges the two questions.
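The sticky-note has a fixed, machine-readable shape. The sketch below shows how a stub in that shape could be rendered and how its block id can be recovered from it. It assumes the elided middle part is a short human-readable description. render_stub and stub_block_id are hypothetical helper names. Deciding when to call expand stays the model's job, not this code's.

```python
import re
from typing import Optional


def render_stub(block_id: str, description: str) -> str:
    """Hypothetical helper: build a stub in the quoted shape.

    The description slot stands in for the elided "…" and is an assumption.
    """
    return (f"[stub:{block_id}] {description} | "
            f"refetch via cachelane_expand(block_id={block_id})")


# The backreference (?P=id) requires the same id in both places.
_STUB_RE = re.compile(
    r"^\[stub:(?P<id>[0-9A-Z]+)\] .* \| "
    r"refetch via cachelane_expand\(block_id=(?P=id)\)$"
)


def stub_block_id(text: str) -> Optional[str]:
    """Return the block id if `text` is a well-formed stub, else None."""
    m = _STUB_RE.match(text)
    return m.group("id") if m else None
```

For example, render_stub("01KPRUNE", "Read auth.ts") returns "[stub:01KPRUNE] Read auth.ts | refetch via cachelane_expand(block_id=01KPRUNE)". The pointer is recoverable by any reader, human, model, or regex, because the id appears verbatim in the expand call.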

CacheLane is the infrastructure

CacheLane's job is mechanical and deterministic: watch the idle counter, replace the block with a descriptive stub when the counter crosses K, expose the cachelane:expand tool, and, when that tool is called, re-run the original deterministic tool call to restore the content. It makes the prompt smaller and leaves a recoverable pointer. Rather than building fragile, model-specific intent detection into the system, CacheLane offloads the reasoning to the model's native capability. The stub only has to be descriptive enough that any sufficiently capable model knows what to do with it.
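Those mechanical duties fit in a small loop. What follows is a minimal in-memory sketch, not CacheLane's implementation. Three things are assumptions: the Block and Pruner classes, the rule that a block counts as idle on any turn in which it is not referenced, and reading "crosses K" as idle_turns >= K. Restoration is byte-exact only if the tool is deterministic and its inputs (here, the file) have not changed.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, Dict, Optional

# A tool is re-run from its stored arguments and returns the block text.
ToolFn = Callable[[dict], str]


@dataclass
class Block:
    block_id: str
    tool_name: str            # the original deterministic tool call...
    tool_args: dict           # ...and its arguments, kept so it can be re-run
    description: str          # text that goes into the stub
    content: Optional[str]    # None while the block is stubbed
    idle_turns: int = 0


class Pruner:
    """Hypothetical idle-counter / stub / expand loop."""

    def __init__(self, k: int, tools: Dict[str, ToolFn]) -> None:
        self.k = k
        self.tools = tools
        self.blocks: Dict[str, Block] = {}

    def add(self, block: Block) -> None:
        self.blocks[block.block_id] = block

    def end_turn(self, referenced: AbstractSet[str] = frozenset()) -> None:
        """Age live blocks; assumed rule: unreferenced turns count as idle."""
        for b in self.blocks.values():
            if b.content is None:
                continue                          # already a stub
            if b.block_id in referenced:
                b.idle_turns = 0
            else:
                b.idle_turns += 1
            if b.idle_turns >= self.k:            # "crosses K", read as >= K
                b.content = None                  # full text leaves the prompt

    def render(self, block_id: str) -> str:
        """What the model sees for this block on the next turn."""
        b = self.blocks[block_id]
        if b.content is not None:
            return b.content
        return (f"[stub:{b.block_id}] {b.description} | "
                f"refetch via cachelane_expand(block_id={b.block_id})")

    def expand(self, block_id: str) -> str:
        """Expand-tool handler: re-run the original call to restore the text."""
        b = self.blocks[block_id]
        if b.content is None:
            b.content = self.tools[b.tool_name](b.tool_args)
        b.idle_turns = 0
        return b.content
```

With k=2, a block survives one idle turn and is stubbed at the end of the second. Calling expand re-runs the stored tool call and puts the full text back. Note that nothing in the loop inspects the user's question: deciding when to call expand is left entirely to the model.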

The model's capability is the bottleneck

Because the model drives the restore flow, CacheLane's effectiveness scales with model quality. The relevant quality is not “raw intelligence” in the obvious sense but context fidelity under pressure. Three failure modes stand out:

  1. Context saturation. In a long enough session, the stub can fall outside the model's effective attention window, so the model reads right past the refetch hint.
  2. Instruction-following drift. The stub is an embedded instruction, and models follow those less reliably as sessions grow.
  3. Tool selection under ambiguity. On a vague question the model might answer from memory or issue a fresh read instead of calling cachelane:expand.

On a strong model the system is essentially self-managing: stubs are noticed and acted on reliably. On a weaker model you would see more silent misses. That is the deliberate trade-off: the system's correctness ceiling is set by the model, not by CacheLane's own logic.

Token saving → CacheLane's job (happens at stubbing, ongoing, no intent needed)
Re-fetching → the model's job (reads the stub, decides to restore, no NLP needed)

Built by

Aditya Tripuraneni & Rajan Chavada

CacheLane, a local, fail-open caching middleware for Claude Code.