Compresr
Compresr compresses LLM context to reduce token costs, improve accuracy, and cut latency in AI pipelines, using query-specific and query-agnostic compression models.
Listed Mar 2026
About Compresr
Compresr is a context compression API for LLM pipelines that reduces token usage by up to 200x without quality loss. It offers multiple compression models — from query-agnostic pre-compression to aggressive query-specific filtering — enabling developers to cut costs, reduce latency, and improve accuracy in RAG, search, Q&A, and agent workflows. Backed by Y Combinator (W26), Compresr also provides an open-source Context Gateway for agent frameworks like Claude Code and OpenClaw.
- Espresso V1 — Query-agnostic token-level compression; pre-compress long documents, system prompts, or agent histories once and reuse across multiple queries.
- Latte V1 — Query-specific token-level compression that retains only tokens relevant to a given question, enabling up to 200x compression; ideal for RAG, search, and Q&A pipelines.
- Coldbrew V1 — Query-specific chunk-level filtering that drops entire irrelevant chunks before they reach the model; ideal for structured data like transcripts or logs.
- Context Gateway — Open-source proxy for agents that compresses conversation history, tool outputs, and tool lists; compatible with Claude Code, OpenClaw, Codex, and more.
- Coarse-grained filtering — Pass a query and a list of chunks to retrieve only the relevant ones, reducing context before fine-grained compression.
- Fine-grained compression — Token-level compression given a query and context, stripping irrelevant tokens for maximum efficiency.
- Usage-based pricing — Pay per million tokens processed, with no upfront commitment; get started by signing up and calling the API.
- Demo environment — Try compression live on sample datasets like SEC filings directly from the dashboard without writing any code.
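Per the bullets above, a pipeline would typically run the coarse-grained filter first and feed the surviving chunks into fine-grained compression. The sketch below only constructs the two request payloads; the base URL, endpoint paths, model identifiers, and field names are assumptions for illustration, since this listing does not document the API schema:

```python
import json

API_BASE = "https://api.compresr.example"  # hypothetical; real base URL not documented here

def build_filter_request(query: str, chunks: list[str]) -> dict:
    """Stage 1, coarse-grained: ask which chunks are relevant (assumed schema)."""
    return {"url": f"{API_BASE}/v1/filter",
            "body": {"model": "coldbrew-v1", "query": query, "chunks": chunks}}

def build_compress_request(query: str, context: str) -> dict:
    """Stage 2, fine-grained: token-level compression of survivors (assumed schema)."""
    return {"url": f"{API_BASE}/v1/compress",
            "body": {"model": "latte-v1", "query": query, "context": context}}

query = "What was Q3 revenue?"
stage1 = build_filter_request(query, ["chunk A", "chunk B", "chunk C"])
# ...POST stage1, receive the surviving chunks, then:
stage2 = build_compress_request(query, "chunk A\nchunk C")
print(json.dumps(stage2["body"], indent=2))
```

Running the coarse stage first keeps the fine-grained (per-token) stage from paying for chunks that were never relevant.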
Pricing
Espresso V1
Query-agnostic token-level compression. Pre-compress long documents, system prompts, or agent histories once and reuse across multiple queries.
- Query-agnostic token-level compression
- Pre-compress long documents
- Compress system prompts
- Compress agent histories
- Reuse compressed context across multiple queries
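Query-agnostic means the compression cost is paid once per document and amortized over every query that later reuses it. A toy illustration of that amortization, with trivial stopword pruning standing in for Espresso's model (which this listing does not describe):

```python
# Toy stand-in for query-agnostic compression: prune filler tokens once, with
# no knowledge of future queries, then reuse the result everywhere.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "that"}

def precompress(document: str) -> str:
    """Compress once; the output is valid context for any later query."""
    return " ".join(w for w in document.split() if w.lower() not in STOPWORDS)

doc = "The revenue of the company grew in the third quarter of the year"
compressed = precompress(doc)  # compression cost paid once
for query in ("What grew?", "Which quarter?"):
    prompt = f"Context: {compressed}\nQ: {query}"  # reused per query, no recompression
print(compressed)  # → "revenue company grew third quarter year"
```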
Latte V1
Query-specific token-level compression. Retains only tokens relevant to a given question, enabling aggressive (up to 200x) compression. Ideal for RAG, search, and Q&A pipelines.
- Query-specific token-level compression
- Up to 200x compression ratio
- Ideal for RAG pipelines
- Ideal for search and Q&A workflows
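A rough sense of what query-specific compression does: keep only the spans that bear on the question. Latte operates at the token level with a learned model; the sentence-level term-overlap heuristic below is purely illustrative:

```python
# Words ignored when matching query terms against the context.
STOP = {"what", "to", "in", "the", "was", "were", "happened"}

def query_compress(query: str, context: str) -> str:
    """Keep sentences sharing at least one non-stopword term with the query."""
    q_terms = {w.strip("?.,").lower() for w in query.split()} - STOP
    kept = [s for s in context.split(". ")
            if q_terms & {w.strip("?.,").lower() for w in s.split()}]
    return ". ".join(kept)

context = ("Revenue rose 12% in Q3. The office moved to Austin. "
           "Q3 margins were flat")
print(query_compress("What happened to revenue in Q3?", context))
# → "Revenue rose 12% in Q3. Q3 margins were flat"
```

Because the output depends on the query, it cannot be cached across questions the way a query-agnostic pass can; the trade-off is a far higher compression ratio.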
Coldbrew V1
Query-specific chunk-level filtering. Drops entire irrelevant chunks before they reach the model — ideal for structured data like transcripts or logs.
- Query-specific chunk-level filtering
- Drops irrelevant chunks before model inference
- Ideal for transcripts and logs
- No retrieval index required
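Chunk-level filtering is all-or-nothing per chunk, which suits data that arrives pre-segmented (transcript turns, log lines). A toy version scoring chunks by term overlap; Coldbrew's actual relevance model is not described in this listing:

```python
def filter_chunks(query: str, chunks: list[str], min_overlap: int = 1) -> list[str]:
    """Drop whole chunks whose term overlap with the query is below threshold."""
    q_terms = {w.lower().strip("?.,") for w in query.split()}
    return [c for c in chunks
            if len(q_terms & {w.lower().strip("?.,") for w in c.split()}) >= min_overlap]

log_lines = [
    "2026-03-01 ERROR payment gateway timeout",
    "2026-03-01 INFO cache warmed",
    "2026-03-01 ERROR payment retry failed",
]
print(filter_chunks("payment errors", log_lines))
```

Surviving chunks pass through verbatim, so no retrieval index or re-embedding step is needed; irrelevant chunks simply never reach the model.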
Capabilities
Key Features
- Token-level context compression
- Chunk-level context filtering
- Query-agnostic pre-compression
- Query-specific compression
- Up to 200x compression ratio
- Context Gateway for agents
- Conversation history compression
- Tool output compression
- RAG pipeline optimization
- Open-source proxy gateway
- REST API access
- Dashboard demo environment
