Headroom
Context compression layer for LLM applications that compresses tool outputs, logs, RAG chunks, and files before they reach the model, delivering 60–95% fewer tokens with the same answers.
At a Glance
About Headroom
Headroom is an open-source context optimization library, proxy, and MCP server for LLM applications, published under the Apache 2.0 license by developer chopratejas. It intercepts everything an AI agent reads — tool outputs, database results, file reads, RAG chunks, and conversation history — and compresses it before it reaches the model, targeting 60–95% token reduction while preserving answer quality. The project is available on GitHub and installable via PyPI (headroom-ai) and npm (headroom-ai).
What It Is
Headroom sits between your application or agent and the LLM provider as a context compression layer. It routes content through specialized compressors — SmartCrusher for JSON, CodeCompressor for AST-aware code, and the Kompress-v2-base HuggingFace model for prose — then forwards the compressed prompt to any OpenAI-compatible or Anthropic endpoint. It runs entirely locally, so data never leaves the machine. The project also ships a reversible compression mode (CCR) that caches originals for on-demand retrieval, cross-agent shared memory, and a headroom learn command that mines failed sessions and writes corrections to CLAUDE.md / AGENTS.md.
Deployment Modes
Headroom offers four distinct integration paths:
- Library —
compress(messages)inline in Python or TypeScript - Proxy —
headroom proxy --port 8787, zero code changes, any language or framework - Agent wrap —
headroom wrap claude|codex|cursor|aider|copilotwraps a coding agent in one command - MCP server — exposes
headroom_compress,headroom_retrieve, andheadroom_statstools to any MCP client
Compression Architecture
The internal pipeline routes each request through a ContentRouter that detects content type and selects the appropriate compressor. A CacheAligner stabilizes prompt prefixes so provider KV caches actually hit. The six algorithms cover JSON arrays and nested objects (SmartCrusher), Python/JS/Go/Rust/Java/C++ source (CodeCompressor), prose and agentic traces (Kompress-base), and images (ML router). The CCR layer stores originals locally and lets the LLM call headroom_retrieve if it needs the full content within the configured TTL.
Benchmark Evidence
The README publishes savings on real agent workloads: code search (100 results) drops from 17,765 to 1,408 tokens (92% reduction); SRE incident debugging from 65,694 to 5,118 tokens (92%); GitHub issue triage from 54,174 to 14,761 tokens (73%). Accuracy benchmarks on GSM8K (math), TruthfulQA (factual), SQuAD v2 (QA), and BFCL (tool calls) show no meaningful degradation at those compression levels. These figures are vendor-published and reproducible via python -m headroom.evals suite --tier 1.
Update: v0.25.0
The latest release is v0.25.0, published on 2026-06-12. The repository was last pushed on 2026-06-13 and shows active development with 25,785 stars and 1,705 forks on GitHub. The project supports Python 3.10+ and ships granular install extras including [proxy], [mcp], [ml], [code], [memory], [image], [agno], [langchain], and [pytorch-mps] for Apple-GPU memory-embedder offload.
Tradeoffs to Know
Headroom requires a local process to run, making it unsuitable for fully sandboxed environments. It is not a replacement for provider-native compaction when only conversation history needs trimming and no cross-agent memory is required. The headroom wrap copilot subscription mode for GitHub Copilot CLI has been smoke-tested on macOS; Windows Credential Manager and Linux Secret Service paths are implemented but not yet fully validated according to the README.
Community Discussions
Be the first to start a conversation about Headroom
Share your experience with Headroom, ask questions, or help others learn from your insights.
Pricing
Open Source
Fully open-source under Apache 2.0 — free to use, modify, and distribute.
- Full context compression library
- Proxy mode
- MCP server
- Agent wrap (Claude Code, Codex, Cursor, Aider, Copilot)
- Cross-agent shared memory
Capabilities
Key Features
- Context compression (60–95% token reduction)
- SmartCrusher for JSON compression
- CodeCompressor for AST-aware code compression
- Kompress-v2-base HuggingFace model for prose
- Image compression via ML router
- Reversible compression (CCR) with local caching
- Drop-in proxy mode (zero code changes)
- Agent wrap for Claude Code, Codex, Cursor, Aider, Copilot
- MCP server with headroom_compress, headroom_retrieve, headroom_stats
- Cross-agent shared memory with auto-dedup
- CacheAligner for KV cache optimization
- headroom learn for failure mining and correction writing
- SharedContext for multi-agent workflows
- ASGI middleware support
- Local-first — data never leaves the machine
- Python and TypeScript/Node SDKs
- Docker image available
