Compresr
Compresr compresses LLM context to reduce token costs, improve accuracy, and cut latency in AI pipelines, using query-specific and query-agnostic compression models.
Listed Mar 2026
About Compresr
Compresr is a context compression API for LLM pipelines that reduces token usage by up to 200x without quality loss. It offers multiple compression models — from query-agnostic pre-compression to aggressive query-specific filtering — enabling developers to cut costs, reduce latency, and improve accuracy in RAG, search, Q&A, and agent workflows. Backed by Y Combinator (W26), Compresr also provides an open-source Context Gateway for agent frameworks like Claude Code and OpenClaw.
- Espresso V1 — Query-agnostic token-level compression; pre-compress long documents, system prompts, or agent histories once and reuse across multiple queries.
- Latte V1 — Query-specific token-level compression that retains only tokens relevant to a given question, enabling up to 200x compression; ideal for RAG, search, and Q&A pipelines.
- Coldbrew V1 — Query-specific chunk-level filtering that drops entire irrelevant chunks before they reach the model; ideal for structured data like transcripts or logs.
- Context Gateway — Open-source proxy for agents that compresses conversation history, tool outputs, and tool lists; compatible with Claude Code, OpenClaw, Codex, and more.
- Coarse-grained filtering — Pass a query and a list of chunks to retrieve only the relevant ones, reducing context before fine-grained compression.
- Fine-grained compression — Token-level compression given a query and context, stripping irrelevant tokens for maximum efficiency.
- Usage-based pricing — Pay per million tokens processed, with no upfront commitment; get started by signing up and calling the API.
- Demo environment — Try compression live on sample datasets like SEC filings directly from the dashboard without writing any code.
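Per the bullets above, a pipeline would typically run the coarse-grained filter first and feed the surviving chunks into fine-grained compression. The sketch below only constructs the two request payloads; the base URL, endpoint paths, model identifiers, and field names are assumptions for illustration, since this listing does not document the API schema:

```python
import json

API_BASE = "https://api.compresr.example"  # hypothetical; real base URL not documented here

def build_filter_request(query: str, chunks: list[str]) -> dict:
    """Stage 1, coarse-grained: ask which chunks are relevant (assumed schema)."""
    return {"url": f"{API_BASE}/v1/filter",
            "body": {"model": "coldbrew-v1", "query": query, "chunks": chunks}}

def build_compress_request(query: str, context: str) -> dict:
    """Stage 2, fine-grained: token-level compression of survivors (assumed schema)."""
    return {"url": f"{API_BASE}/v1/compress",
            "body": {"model": "latte-v1", "query": query, "context": context}}

query = "What was Q3 revenue?"
stage1 = build_filter_request(query, ["chunk A", "chunk B", "chunk C"])
# ...POST stage1, receive the surviving chunks, then:
stage2 = build_compress_request(query, "chunk A\nchunk C")
print(json.dumps(stage2["body"], indent=2))
```

Running the coarse stage first keeps the fine-grained (per-token) stage from paying for chunks that were never relevant.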
Pricing
Espresso V1
Query-agnostic token-level compression. Pre-compress long documents, system prompts, or agent histories once and reuse across multiple queries.
- Query-agnostic token-level compression
- Pre-compress long documents
- Compress system prompts
- Compress agent histories
- Reuse compressed context across multiple queries
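Query-agnostic means the compression cost is paid once per document and amortized over every query that later reuses it. A toy illustration of that amortization, with trivial stopword pruning standing in for Espresso's model (which this listing does not describe):

```python
# Toy stand-in for query-agnostic compression: prune filler tokens once, with
# no knowledge of future queries, then reuse the result everywhere.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "that"}

def precompress(document: str) -> str:
    """Compress once; the output is valid context for any later query."""
    return " ".join(w for w in document.split() if w.lower() not in STOPWORDS)

doc = "The revenue of the company grew in the third quarter of the year"
compressed = precompress(doc)  # compression cost paid once
for query in ("What grew?", "Which quarter?"):
    prompt = f"Context: {compressed}\nQ: {query}"  # reused per query, no recompression
print(compressed)  # → "revenue company grew third quarter year"
```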
Latte V1
Query-specific token-level compression. Retains only tokens relevant to a given question, enabling aggressive (up to 200x) compression. Ideal for RAG, search, and Q&A pipelines.
- Query-specific token-level compression
- Up to 200x compression ratio
- Ideal for RAG pipelines
- Ideal for search and Q&A workflows
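A rough sense of what query-specific compression does: keep only the spans that bear on the question. Latte operates at the token level with a learned model; the sentence-level term-overlap heuristic below is purely illustrative:

```python
# Words ignored when matching query terms against the context.
STOP = {"what", "to", "in", "the", "was", "were", "happened"}

def query_compress(query: str, context: str) -> str:
    """Keep sentences sharing at least one non-stopword term with the query."""
    q_terms = {w.strip("?.,").lower() for w in query.split()} - STOP
    kept = [s for s in context.split(". ")
            if q_terms & {w.strip("?.,").lower() for w in s.split()}]
    return ". ".join(kept)

context = ("Revenue rose 12% in Q3. The office moved to Austin. "
           "Q3 margins were flat")
print(query_compress("What happened to revenue in Q3?", context))
# → "Revenue rose 12% in Q3. Q3 margins were flat"
```

Because the output depends on the query, it cannot be cached across questions the way a query-agnostic pass can; the trade-off is a far higher compression ratio.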
Coldbrew V1
Query-specific chunk-level filtering. Drops entire irrelevant chunks before they reach the model — ideal for structured data like transcripts or logs.
- Query-specific chunk-level filtering
- Drops irrelevant chunks before model inference
- Ideal for transcripts and logs
- No retrieval index required
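Chunk-level filtering is all-or-nothing per chunk, which suits data that arrives pre-segmented (transcript turns, log lines). A toy version scoring chunks by term overlap; Coldbrew's actual relevance model is not described in this listing:

```python
def filter_chunks(query: str, chunks: list[str], min_overlap: int = 1) -> list[str]:
    """Drop whole chunks whose term overlap with the query is below threshold."""
    q_terms = {w.lower().strip("?.,") for w in query.split()}
    return [c for c in chunks
            if len(q_terms & {w.lower().strip("?.,") for w in c.split()}) >= min_overlap]

log_lines = [
    "2026-03-01 ERROR payment gateway timeout",
    "2026-03-01 INFO cache warmed",
    "2026-03-01 ERROR payment retry failed",
]
print(filter_chunks("payment errors", log_lines))
```

Surviving chunks pass through verbatim, so no retrieval index or re-embedding step is needed; irrelevant chunks simply never reach the model.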
Capabilities
Key Features
- Token-level context compression
- Chunk-level context filtering
- Query-agnostic pre-compression
- Query-specific compression
- Up to 200x compression ratio
- Context Gateway for agents
- Conversation history compression
- Tool output compression
- RAG pipeline optimization
- Open-source proxy gateway
- REST API access
- Dashboard demo environment
