Headroom

Name: Headroom
Availability: OnlineOnly
Author: chopratejas

Context compression layer for LLM applications that compresses tool outputs, logs, RAG chunks, and files before they reach the model, delivering 60–95% fewer tokens with the same answers.

At a Glance

Pricing

Open Source

Fully open-source under Apache 2.0 — free to use, modify, and distribute.

Available On

API

CLI

SDK

chopratejas, San Jose, CA (Est. 2025)

Listed Jun 2026

About Headroom

Headroom is an open-source context optimization library, proxy, and MCP server for LLM applications, published under the Apache 2.0 license by developer chopratejas. It intercepts everything an AI agent reads — tool outputs, database results, file reads, RAG chunks, and conversation history — and compresses it before it reaches the model, targeting 60–95% token reduction while preserving answer quality. The project is available on GitHub and installable via PyPI (headroom-ai) and npm (headroom-ai).

What It Is

Headroom sits between your application or agent and the LLM provider as a context compression layer. It routes content through specialized compressors (SmartCrusher for JSON, CodeCompressor for AST-aware code, and the Kompress-v2-base HuggingFace model for prose), then forwards the compressed prompt to any OpenAI-compatible or Anthropic endpoint. The compression step runs entirely locally, so content is not sent to any service other than the chosen LLM endpoint. The project also ships a reversible compression mode (CCR) that caches originals for on-demand retrieval, cross-agent shared memory, and a headroom learn command that mines failed sessions and writes corrections to CLAUDE.md / AGENTS.md.
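The reversible mode can be pictured as a small local store: the compressed stand-in carries a key, and the original stays retrievable until a time-to-live expires. The following is a minimal sketch of that idea only, not Headroom's actual CCR code; the class name ReversibleStore, its put and retrieve methods, and the hash-based key are hypothetical names chosen for illustration.

```python
import hashlib
import time


class ReversibleStore:
    """Toy local cache: keeps originals so a compressed stand-in can be expanded later."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries = {}   # key -> (expiry time, original text)

    def put(self, original):
        """Store the original and return a short key to embed in the compressed text."""
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._entries[key] = (self._clock() + self.ttl_seconds, original)
        return key

    def retrieve(self, key):
        """Return the original, or None if the key is unknown or past its TTL."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, original = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop expired entries lazily on lookup
            return None
        return original
```

In this sketch the key is derived from the content, so storing the same original twice yields the same key and refreshes its expiry.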

Deployment Modes

Headroom offers four distinct integration paths:

Library — compress(messages) inline in Python or TypeScript
Proxy — headroom proxy --port 8787, zero code changes, any language or framework
Agent wrap — headroom wrap claude|codex|cursor|aider|copilot wraps a coding agent in one command
MCP server — exposes headroom_compress, headroom_retrieve, and headroom_stats tools to any MCP client
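Proxy mode changes only where requests are sent. The sketch below builds (but does not send) an OpenAI-style chat request aimed at a local proxy started with headroom proxy --port 8787. The /v1/chat/completions path, the model name, the placeholder API key, and the helper name build_proxy_request are all assumptions for illustration; the project documentation is the authority on the actual routes.

```python
import json
import urllib.request


def build_proxy_request(messages, port=8787, model="example-model", api_key="sk-placeholder"):
    """Build an OpenAI-style chat request pointed at a local proxy.

    Assumption: the proxy accepts the OpenAI-compatible /v1/chat/completions route.
    """
    url = f"http://localhost:{port}/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

With the proxy running, the request could then be sent with urllib.request.urlopen; the rest of the application code stays as it was, which is the point of the zero-code-changes path.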

Compression Architecture

The internal pipeline routes each request through a ContentRouter that detects the content type and selects the appropriate compressor. A CacheAligner stabilizes prompt prefixes so that provider-side KV caches are hit more reliably. The six algorithms cover JSON arrays and nested objects (SmartCrusher), Python/JS/Go/Rust/Java/C++ source (CodeCompressor), prose and agentic traces (Kompress-v2-base), and images (ML router). The CCR layer stores originals locally and lets the LLM call headroom_retrieve if it needs the full content within the configured TTL.
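The routing step can be illustrated with a toy dispatcher that guesses a content type and picks a handler. This is a sketch of the general idea only: the detection heuristics and the names detect_content_type, ROUTES, and route are hypothetical and far cruder than a real router would be, and the returned labels merely borrow the compressor names from the description above.

```python
import json
import re

# Very rough markers of source code; purely illustrative.
_CODE_HINT = re.compile(r"^\s*(def |class |import |func |fn |#include|public )", re.MULTILINE)


def detect_content_type(text):
    """Classify text as 'json', 'code', or 'prose' using crude heuristics."""
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass  # looked like JSON but did not parse; fall through
    if _CODE_HINT.search(text):
        return "code"
    return "prose"


# Map each detected type to a compressor label (labels only, no real compression).
ROUTES = {"json": "SmartCrusher", "code": "CodeCompressor", "prose": "Kompress"}


def route(text):
    """Return the label of the compressor a router might select for this text."""
    return ROUTES[detect_content_type(text)]
```

Text that starts like JSON but fails to parse falls through to the other checks, so malformed tool output still gets some handler instead of raising an error.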

Benchmark Evidence

The README publishes savings on real agent workloads: code search (100 results) drops from 17,765 to 1,408 tokens (92% reduction); SRE incident debugging from 65,694 to 5,118 tokens (92%); GitHub issue triage from 54,174 to 14,761 tokens (73%). Accuracy benchmarks on GSM8K (math), TruthfulQA (factual), SQuAD v2 (QA), and BFCL (tool calls) show no meaningful degradation at those compression levels. These figures are vendor-published and reproducible via python -m headroom.evals suite --tier 1.
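The quoted percentages follow directly from the before and after token counts. A short check, using only the figures stated above (the helper name reduction_percent is introduced here for illustration):

```python
def reduction_percent(before, after):
    """Percentage of tokens removed, rounded to the nearest whole percent."""
    return round(100 * (1 - after / before))


workloads = {
    "code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
}

for name, (before, after) in workloads.items():
    print(f"{name}: {before} -> {after} tokens ({reduction_percent(before, after)}% fewer)")
```

The three workloads come out at 92%, 92%, and 73%, matching the published numbers.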

Update: v0.25.0

The latest release is v0.25.0, published on 2026-06-12. The repository was last pushed on 2026-06-13 and shows active development with 25,785 stars and 1,705 forks on GitHub. The project supports Python 3.10+ and ships granular install extras including [proxy], [mcp], [ml], [code], [memory], [image], [agno], [langchain], and [pytorch-mps] for Apple-GPU memory-embedder offload.

Tradeoffs to Know

Headroom requires a local process to run, making it unsuitable for fully sandboxed environments. It is not a replacement for provider-native compaction when only conversation history needs trimming and no cross-agent memory is required. The headroom wrap copilot subscription mode for GitHub Copilot CLI has been smoke-tested on macOS; Windows Credential Manager and Linux Secret Service paths are implemented but not yet fully validated according to the README.

Community Discussions

Be the first to start a conversation about Headroom

Share your experience with Headroom, ask questions, or help others learn from your insights.

Pricing

Open Source

Fully open-source under Apache 2.0 — free to use, modify, and distribute.

Full context compression library
Proxy mode
MCP server
Agent wrap (Claude Code, Codex, Cursor, Aider, Copilot)
Cross-agent shared memory

Capabilities

Key Features

Context compression (60–95% token reduction)
SmartCrusher for JSON compression
CodeCompressor for AST-aware code compression
Kompress-v2-base HuggingFace model for prose
Image compression via ML router
Reversible compression (CCR) with local caching
Drop-in proxy mode (zero code changes)
Agent wrap for Claude Code, Codex, Cursor, Aider, Copilot
MCP server with headroom_compress, headroom_retrieve, headroom_stats
Cross-agent shared memory with auto-dedup
CacheAligner for KV cache optimization
headroom learn for failure mining and correction writing
SharedContext for multi-agent workflows
ASGI middleware support
Local-first — data never leaves the machine
Python and TypeScript/Node SDKs
Docker image available

Integrations

Anthropic Claude
OpenAI
Vercel AI SDK
LangChain
LiteLLM
Agno
Strands
Claude Code
Codex
Cursor
Aider
GitHub Copilot CLI
OpenClaw
Amazon Bedrock
HuggingFace
FastAPI
MCP clients
Qdrant
Neo4j
