Architecture

Technical Flow & Pruning

Deep dive into CacheLane's core caching strategies, K-pruning logic, and request cycle interception.

Prompt Volatility Classification

Anthropic prompt caching is **prefix-based**: a change at any position in the prompt invalidates every cached segment from that position onward. CacheLane works with this behavior by decomposing your prompt into atomic blocks, classifying them by volatility, and ordering them from least to most volatile:

STABLE Region (Base Cache): Contains slow-changing contexts (System instructions, MCP tool definitions, pinned rules, CLAUDE.md). Bounded by the first cache_control breakpoint.
SEMI Region (Mid-Cache): Holds dialogue history turns. These shift on each turn, but follow a predictable FIFO structure. Bounded by the second cache_control breakpoint.
VOLATILE Region (Paid Full): Ephemeral workspace contexts, latest tool call outputs, and the current user question.
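
The ordering above can be sketched as follows. This is a minimal illustration, not CacheLane's actual code: the `PromptBlock` type, the `kind` labels, and the `KIND_TO_VOLATILITY` mapping are all assumptions made for the example, and the flat block list simplifies a real request, where system text, tools, and messages live in separate fields. The only real API detail used is the `cache_control` marker of type `ephemeral`.

```python
from dataclasses import dataclass
from enum import IntEnum


class Volatility(IntEnum):
    STABLE = 0    # system instructions, tool definitions, pinned rules
    SEMI = 1      # dialogue history
    VOLATILE = 2  # everything else, never cached


@dataclass
class PromptBlock:
    block_id: str
    kind: str  # hypothetical label assigned when the prompt is decomposed
    text: str


# Hypothetical mapping; any kind not listed is treated as VOLATILE.
KIND_TO_VOLATILITY = {
    "system": Volatility.STABLE,
    "tool_definitions": Volatility.STABLE,
    "pinned_rule": Volatility.STABLE,
    "claude_md": Volatility.STABLE,
    "history_turn": Volatility.SEMI,
}


def classify(block):
    return KIND_TO_VOLATILITY.get(block.kind, Volatility.VOLATILE)


def layout(blocks):
    """Order blocks STABLE -> SEMI -> VOLATILE and mark region boundaries."""
    # sorted() is stable, so blocks keep their relative order within a region.
    ordered = sorted(blocks, key=classify)
    payload = [{"type": "text", "text": b.text} for b in ordered]
    # One breakpoint on the last block of each cacheable region.
    for region in (Volatility.STABLE, Volatility.SEMI):
        indexes = [i for i, b in enumerate(ordered) if classify(b) == region]
        if indexes:
            payload[indexes[-1]]["cache_control"] = {"type": "ephemeral"}
    return payload
```

Because the sort is stable, history turns stay in chronological order, so appending a new turn leaves the earlier part of the SEMI region byte-identical.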

Interception Hook Pipeline

CacheLane deploys as a middleware proxy that intercepts Claude Code traffic. The request pipeline executes in two primary phases:

PreRequest Interception (Turn Start): Intercepts outgoing requests, queries database stats, groups prompt content blocks, prunes unreferenced blocks, places breakpoints, and forwards the payload to Anthropic.
PostResponse Interception (Turn End): Monitors incoming responses, extracts model references to blocks (e.g. file paths in tool calls or block ID mentions), logs turn token costs, and updates block idle counters.
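
As a rough sketch of the PostResponse bookkeeping, the snippet below scans a Messages API style response for block references and updates idle counters. The `Session` and `TurnLog` types are hypothetical, the substring matching is deliberately naive, and resetting a counter to zero when a block is referenced is an assumption, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TurnLog:
    input_tokens: int
    output_tokens: int
    referenced: Set[str]


@dataclass
class Session:
    # block_id -> file path the block came from (None for non-file blocks)
    block_paths: Dict[str, Optional[str]]
    unused_turns: Dict[str, int] = field(default_factory=dict)
    turns: List[TurnLog] = field(default_factory=list)


def extract_references(response, session):
    """Return ids of blocks mentioned by id or by file path."""
    referenced = set()
    for item in response["content"]:
        if item["type"] == "text":
            haystack = item["text"]
        elif item["type"] == "tool_use":
            haystack = " ".join(str(v) for v in item["input"].values())
        else:
            continue
        # Naive substring match; real matching would need to be stricter.
        for block_id, path in session.block_paths.items():
            if block_id in haystack or (path and path in haystack):
                referenced.add(block_id)
    return referenced


def post_response(response, session):
    """Update idle counters and log token usage for one finished turn."""
    referenced = extract_references(response, session)
    for block_id in session.block_paths:
        if block_id in referenced:
            session.unused_turns[block_id] = 0  # assumption: reset on use
        else:
            session.unused_turns[block_id] = session.unused_turns.get(block_id, 0) + 1
    usage = response["usage"]
    session.turns.append(
        TurnLog(usage["input_tokens"], usage["output_tokens"], referenced)
    )
    return referenced
```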

Trajectory-Aware K-Pruning

In long sessions, tool outputs and file reads bloat the context window. CacheLane's **K-pruner** automatically manages this:

A block's unused_turns counter increments by one on every turn in which the model's response does not reference it.
When unused_turns ≥ K (default K=3), the block's text content is discarded and replaced with a compact stub containing its unique identifier, a brief summary, and a refetch command.
If Claude decides it needs the block again, it issues a call to the cachelane:expand MCP tool. CacheLane intercepts this tool call, restores the block from database references, and inserts it back into the suffix.
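
The prune-and-expand cycle might look like the sketch below. The `TrackedBlock` type, the stub wording, and the in-memory `archive` dict (standing in for the database) are illustrative assumptions; only the threshold K=3 and the cachelane:expand tool name come from the description above.

```python
from dataclasses import dataclass

K = 3  # default pruning threshold


@dataclass
class TrackedBlock:
    block_id: str
    text: str
    summary: str
    unused_turns: int = 0
    pruned: bool = False


def make_stub(block):
    # Hypothetical stub format: identifier, summary, refetch command.
    return (
        f"[pruned block {block.block_id}] {block.summary} "
        f"(refetch: cachelane:expand {block.block_id})"
    )


def prune(blocks, archive, k=K):
    """Replace the text of every block idle for k or more turns with a stub."""
    for block in blocks:
        if not block.pruned and block.unused_turns >= k:
            archive[block.block_id] = block.text  # keep the original for later
            block.text = make_stub(block)
            block.pruned = True


def expand(block_id, blocks, archive):
    """Restore a pruned block and move it to the end of the prompt."""
    for index, block in enumerate(blocks):
        if block.block_id == block_id and block.pruned:
            block.text = archive.pop(block_id)
            block.pruned = False
            block.unused_turns = 0
            # Re-inserting into the suffix leaves the cached prefix untouched.
            blocks.append(blocks.pop(index))
            return block
    raise KeyError(f"no pruned block with id {block_id!r}")
```

Appending the restored block to the end, rather than putting it back in its old position, avoids changing any bytes earlier in the prompt.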

Keepalive Scheduler

By default, Anthropic prompt caches expire after **5 minutes** without use. After a longer pause, the next request misses the cache and pays the full cache-write cost again.

CacheLane runs a lightweight keepalive scheduler in the background. While the user is idle, it sends minimal synthetic API requests (max_tokens=1 with the identical prefix structure) at 4-minute intervals, resetting the TTL and keeping the cache warm.
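
One way to structure such a scheduler is sketched below. The `KeepaliveScheduler` class, its `send` callback, and the injectable `clock` are assumptions for illustration; the sketch takes the 4-minute interval and max_tokens=1 from the text and leaves the actual HTTP call to the caller.

```python
import threading
import time

KEEPALIVE_INTERVAL_SECONDS = 4 * 60  # stays under the 5-minute TTL


def build_keepalive_request(last_request):
    """Copy the last request unchanged except for a one-token output cap."""
    ping = dict(last_request)  # shallow copy; the prefix content is shared
    ping["max_tokens"] = 1
    return ping


class KeepaliveScheduler:
    def __init__(self, send, interval=KEEPALIVE_INTERVAL_SECONDS, clock=time.monotonic):
        self._send = send          # hypothetical callable that posts a request
        self._interval = interval
        self._clock = clock
        self._last_request = None
        self._last_activity = None
        self._stop = threading.Event()

    def record_request(self, request):
        """Call on every real user request to restart the idle timer."""
        self._last_request = request
        self._last_activity = self._clock()

    def tick(self):
        """Send one ping if the session has been idle long enough."""
        if self._last_request is None:
            return False
        if self._clock() - self._last_activity < self._interval:
            return False
        self._send(build_keepalive_request(self._last_request))
        self._last_activity = self._clock()
        return True

    def run(self, poll_seconds=1.0):
        """Blocking loop, intended for a daemon thread."""
        while not self._stop.wait(poll_seconds):
            self.tick()

    def stop(self):
        self._stop.set()
```

In use, `run` would be started on a background thread (for example `threading.Thread(target=scheduler.run, daemon=True).start()`), with `record_request` called from the PreRequest hook. A production version would likely also cap how long it keeps pinging an abandoned session, since each ping still incurs cache-read cost.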