# Headroom

> Context compression layer for LLM applications that compresses tool outputs, logs, RAG chunks, and files before they reach the model, delivering 60–95% fewer tokens with the same answers.

Headroom is an open-source context optimization library, proxy, and MCP server for LLM applications, published under the Apache 2.0 license by developer chopratejas. It intercepts everything an AI agent reads — tool outputs, database results, file reads, RAG chunks, and conversation history — and compresses it before it reaches the model, targeting 60–95% token reduction while preserving answer quality. The project is available on GitHub and installable via PyPI (`headroom-ai`) and npm (`headroom-ai`).

## What It Is

Headroom sits between your application or agent and the LLM provider as a context compression layer. It routes content through specialized compressors — SmartCrusher for JSON, CodeCompressor for AST-aware code, and the Kompress-v2-base HuggingFace model for prose — then forwards the compressed prompt to any OpenAI-compatible or Anthropic endpoint. It runs entirely locally, so data never leaves the machine. The project also ships a reversible compression mode (CCR) that caches originals for on-demand retrieval, cross-agent shared memory, and a `headroom learn` command that mines failed sessions and writes corrections to `CLAUDE.md` / `AGENTS.md`.

## Deployment Modes

Headroom offers four distinct integration paths:

- **Library** — `compress(messages)` inline in Python or TypeScript
- **Proxy** — `headroom proxy --port 8787`, zero code changes, any language or framework
- **Agent wrap** — `headroom wrap claude|codex|cursor|aider|copilot` wraps a coding agent in one command
- **MCP server** — exposes `headroom_compress`, `headroom_retrieve`, and `headroom_stats` tools to any MCP client

## Compression Architecture

The internal pipeline routes each request through a `ContentRouter` that detects content type and selects the appropriate compressor. A `CacheAligner` stabilizes prompt prefixes so provider KV caches actually hit. The six algorithms cover JSON arrays and nested objects (SmartCrusher), Python/JS/Go/Rust/Java/C++ source (CodeCompressor), prose and agentic traces (Kompress-base), and images (ML router). The CCR layer stores originals locally and lets the LLM call `headroom_retrieve` if it needs the full content within the configured TTL.

## Benchmark Evidence

The README publishes savings on real agent workloads: code search (100 results) drops from 17,765 to 1,408 tokens (92% reduction); SRE incident debugging from 65,694 to 5,118 tokens (92%); GitHub issue triage from 54,174 to 14,761 tokens (73%). Accuracy benchmarks on GSM8K (math), TruthfulQA (factual), SQuAD v2 (QA), and BFCL (tool calls) show no meaningful degradation at those compression levels. These figures are vendor-published and reproducible via `python -m headroom.evals suite --tier 1`.

## Update: v0.25.0

The latest release is v0.25.0, published on 2026-06-12. The repository was last pushed on 2026-06-13 and shows active development with 25,785 stars and 1,705 forks on GitHub. The project supports Python 3.10+ and ships granular install extras including `[proxy]`, `[mcp]`, `[ml]`, `[code]`, `[memory]`, `[image]`, `[agno]`, `[langchain]`, and `[pytorch-mps]` for Apple-GPU memory-embedder offload.

## Tradeoffs to Know

Headroom requires a local process to run, making it unsuitable for fully sandboxed environments. It is not a replacement for provider-native compaction when only conversation history needs trimming and no cross-agent memory is required. The `headroom wrap copilot` subscription mode for GitHub Copilot CLI has been smoke-tested on macOS; Windows Credential Manager and Linux Secret Service paths are implemented but not yet fully validated according to the README.

## Features
- Context compression (60–95% token reduction)
- SmartCrusher for JSON compression
- CodeCompressor for AST-aware code compression
- Kompress-v2-base HuggingFace model for prose
- Image compression via ML router
- Reversible compression (CCR) with local caching
- Drop-in proxy mode (zero code changes)
- Agent wrap for Claude Code, Codex, Cursor, Aider, Copilot
- MCP server with headroom_compress, headroom_retrieve, headroom_stats
- Cross-agent shared memory with auto-dedup
- CacheAligner for KV cache optimization
- headroom learn for failure mining and correction writing
- SharedContext for multi-agent workflows
- ASGI middleware support
- Local-first — data never leaves the machine
- Python and TypeScript/Node SDKs
- Docker image available

## Integrations
Anthropic Claude, OpenAI, Vercel AI SDK, LangChain, LiteLLM, Agno, Strands, Claude Code, Codex, Cursor, Aider, GitHub Copilot CLI, OpenClaw, Amazon Bedrock, HuggingFace, FastAPI, MCP clients, Qdrant, Neo4j

## Platforms
API, CLI, DEVELOPER_SDK

## Pricing
Open Source

## Version
v0.25.0

## Links
- Website: https://headroom-docs.vercel.app/
- Documentation: https://headroom-docs.vercel.app/docs
- Repository: https://github.com/chopratejas/headroom
- EveryDev.ai: https://www.everydev.ai/tools/headroom