# The Token Company

> A prompt compression API that removes context bloat from LLM inputs, reducing token costs and improving accuracy with a simple drop-in middleware integration.

The Token Company provides a prompt compression API that removes semantic redundancy and context bloat from LLM inputs before they reach your model. Using its bear-1.x model family, developers can reduce token counts by up to 75%, cutting LLM costs while improving accuracy and reducing latency. The API integrates in minutes as drop-in middleware with a single POST call, and benchmarks show measurable improvements on real-world financial documents and reading comprehension tasks.

- **bear-1.x Compression Models**: *Use `bear-1`, `bear-1.1`, or `bear-1.2` (recommended) to semantically compress prompts while preserving intent and logical relationships.*
- **Usage-Based Pricing**: *Pay only $0.05 per 1M tokens removed; you are never charged for tokens that remain in the output.*
- **One-Call Integration**: *Send text to `POST api.thetokencompany.com/v1/compress` with your API key and receive compressed text back; drop it in before any LLM call.*
- **Adjustable Aggressiveness**: *Control compression intensity with an `aggressiveness` parameter from 0.0 to 1.0 to balance compression ratio vs. fidelity.*
- **Protected Tokens**: *Wrap sensitive or critical text in `` tags to prevent those sections from being compressed.*
- **Gzip Support**: *Enable gzip encoding on requests for up to 2.5x faster large-payload transfers; enabled by default in the Python SDK and npm package.*
- **Python SDK & npm Package**: *Get started quickly with official SDKs that handle authentication, gzip, and response parsing out of the box.*
- **Proven Benchmarks**: *Compression improved SEC filing QA accuracy by 2.7pp with 20% fewer tokens, SQuAD 2.0 accuracy by 4.0pp with 17% fewer tokens, and reduced end-to-end latency by up to 37% on Claude Opus.*
- **Chat & Document Use Cases**: *Expand conversation history 3x within the same context window, or process large PDFs and web scrapes without bloated inputs.*

## Features

- Prompt compression via bear-1, bear-1.1, and bear-1.2 models
- Usage-based pricing at $0.05 per 1M compressed tokens
- Single POST API endpoint for drop-in middleware integration
- Adjustable compression aggressiveness (0.0–1.0)
- Protected tokens via tags
- Gzip compression support for faster large payloads
- Python SDK and npm package
- Token count reporting (input vs. output)
- Real-world benchmarks on financial and reading comprehension tasks
- Infinite chat history demo

## Integrations

OpenAI GPT, Anthropic Claude, Google Gemini, OpenRouter, Any LLM API

## Platforms

WEB, API, DEVELOPER_SDK

## Pricing

Open Source, Free tier available

## Version

bear-1.2

## Links

- Website: https://thetokencompany.com
- Documentation: https://thetokencompany.com/docs
- Repository: https://github.com/TheTokenCompany
- EveryDev.ai: https://www.everydev.ai/tools/the-token-company
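## Example: one-call integration

The listing describes a single `POST` call to `api.thetokencompany.com/v1/compress` with an API key, a model choice, and an `aggressiveness` value. A minimal Python sketch of building that request follows; the JSON field names (`text`, `model`, `aggressiveness`), the bearer-token auth scheme, and the `compressed_text` response field are assumptions not confirmed by this listing, so check the official docs for the exact schema.

```python
import json
import urllib.request

# Endpoint from the listing; request/response field names below are assumed.
API_URL = "https://api.thetokencompany.com/v1/compress"

def build_compress_request(text: str, api_key: str,
                           model: str = "bear-1.2",
                           aggressiveness: float = 0.5) -> urllib.request.Request:
    """Build the HTTP request for a single compression call.

    Field names ("text", "model", "aggressiveness") and the Bearer auth
    header are illustrative assumptions, not a documented schema.
    """
    if not 0.0 <= aggressiveness <= 1.0:
        raise ValueError("aggressiveness must be between 0.0 and 1.0")
    body = {"text": text, "model": model, "aggressiveness": aggressiveness}
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

# Usage sketch (not executed here):
# req = build_compress_request(long_prompt, "YOUR_API_KEY", aggressiveness=0.7)
# with urllib.request.urlopen(req) as resp:
#     compressed = json.load(resp)["compressed_text"]  # assumed response field
```

Because compression happens before any LLM call, the compressed text can be dropped into an existing OpenAI, Claude, or Gemini request unchanged.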
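## Example: gzip-encoded payloads

For large payloads the listing recommends gzip encoding (enabled by default in the official SDKs). A sketch of doing it by hand, assuming the API accepts the standard `Content-Encoding: gzip` request header for a compressed JSON body:

```python
import gzip
import json

def gzip_payload(body: dict) -> tuple[bytes, dict]:
    """Gzip-encode a JSON request body for faster large-payload uploads.

    "Content-Encoding: gzip" is the standard HTTP way to signal a
    compressed request body; whether this API accepts it is an assumption.
    """
    raw = json.dumps(body).encode("utf-8")
    compressed = gzip.compress(raw)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Type": "application/json",
    }
    return compressed, headers
```

The savings only matter for large, repetitive payloads (big PDFs, web scrapes, long chat histories); for short prompts the gzip header overhead can exceed the reduction.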
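## Example: estimating cost

Pricing is $0.05 per 1M tokens removed, with no charge for tokens that remain. The arithmetic is simple enough to sketch; the helper function name is illustrative, not part of any SDK:

```python
PRICE_PER_MILLION_REMOVED = 0.05  # USD, per the listing

def compression_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD, charged only on tokens removed by compression."""
    removed = input_tokens - output_tokens
    if removed < 0:
        raise ValueError("output tokens cannot exceed input tokens")
    return removed / 1_000_000 * PRICE_PER_MILLION_REMOVED

# Compressing 2,000,000 tokens down to 500,000 removes 1,500,000 tokens:
# compression_cost(2_000_000, 500_000)  # -> 0.075 (USD)
```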