Architecture
Technical Flow & Pruning
Deep dive into CacheLane's core caching strategies, K-pruning logic, and request cycle interception.
Prompt Volatility Classification
Anthropic prompt caching is **prefix-based**. Any change in a prefix invalidates all cache data after that point. CacheLane optimizes for this behavior by decomposing your prompt into atomic blocks, classifying them by volatility, and sorting them:
- STABLE Region (Base Cache): Contains slow-changing contexts (System instructions, MCP tool definitions, pinned rules, CLAUDE.md). Bounded by the first
cache_controlbreakpoint. - SEMI Region (Mid-Cache): Holds dialogue history turns. These shift on each turn, but follow a predictable FIFO structure. Bounded by the second
cache_controlbreakpoint. - VOLATILE Region (Paid Full): Ephemeral workspace contexts, latest tool call outputs, and the current user question.
Interception Hook Pipeline
CacheLane deploys as a middleware proxy that intercepts Claude Code traffic. The request pipeline executes in two primary phases:
- PreRequest Interception (Turn Start): Intercepts outgoing requests, queries database stats, groups prompt content blocks, prunes unreferenced blocks, places breakpoints, and forwards the payload to Anthropic.
- PostResponse Interception (Turn End): Monitors incoming responses, extracts model references to blocks (e.g. file paths in tool calls or block ID mentions), logs turn token costs, and updates block idle counters.
Trajectory-Aware K-Pruning
In long sessions, tool outputs and file reads bloat the context window. CacheLane's **K-pruner** automatically manages this:
- Each turn a block goes unreferenced by the model's response, its
unused_turnscount increments. - When
unused_turns ≥ K(defaultK=3), the block's text content is discarded and replaced with a compact stub containing its unique identifier, a brief summary, and a refetch command. - If Claude decides it needs the block again, it issues a call to the
cachelane:expandMCP tool. CacheLane intercepts this tool call, restores the block from database references, and inserts it back into the suffix.
Keepalive Scheduler
Anthropic prompt caches expire after **5 minutes** of idle time. During long pauses, users face full write cost penalties.
CacheLane spawns a light keepalive scheduler in the background. When the user is idle, it issues minimal synthetic API queries (max_tokens=1 with the identical prefix structure) at 4-minute intervals, resetting the TTL and keeping the cache warm.