# Tokenwise

> LLM observability and cost optimization proxy that monitors every AI call, identifies waste, and applies one-click fixes to cut LLM bills by 20–30% without touching quality.

Tokenwise is an LLM observability and cost optimization tool built by a small founding team in France. It works as a drop-in HTTP proxy — one base URL change in your existing SDK — and captures cost, latency, errors, and quality data for every LLM call in real time. The tool targets developers and small teams spending between $50 and $2,000 per month on LLM APIs who want visibility and savings without a framework rewrite.

## What It Is

Tokenwise sits between your application and your LLM provider as an edge proxy running on Cloudflare Workers across 300+ points of presence. It adds a median of 37ms of overhead (p95 under 50ms), logs request metadata asynchronously so that logging never blocks the upstream response, and applies configurable rules for caching, model switching, fallback chains, A/B splits, and tag-based overrides. Provider keys are forwarded to the upstream provider and then dropped from memory; they are never persisted. The tool supports OpenAI, Anthropic, Google Gemini, xAI Grok, Groq, DeepSeek, Mistral, and OpenRouter (which adds 200+ additional models), and works with the Vercel AI SDK, LangChain, plain SDKs, and cURL.
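The non-blocking logging described above can be illustrated with a minimal sketch. This is not Tokenwise's implementation: `call_upstream`, `write_log`, `handle_request`, and `LOG_SINK` are hypothetical names, and the upstream call is a stand-in that simply echoes the prompt. The point is only that the response is returned before the log write completes.

```python
import asyncio

# Hypothetical in-memory sink standing in for wherever request metadata is stored.
LOG_SINK: list[dict] = []


async def call_upstream(prompt: str) -> str:
    """Stand-in for the real provider call (assumption): echoes the prompt."""
    await asyncio.sleep(0)  # yield once to simulate network I/O
    return f"echo: {prompt}"


async def write_log(record: dict) -> None:
    """Simulated slow metadata write."""
    await asyncio.sleep(0.01)
    LOG_SINK.append(record)


async def handle_request(prompt: str) -> tuple[str, asyncio.Task]:
    """Return the upstream response without awaiting the log write."""
    response = await call_upstream(prompt)
    # Schedule logging in the background so the caller gets the response first.
    record = {"prompt_chars": len(prompt), "status": 200}
    log_task = asyncio.create_task(write_log(record))
    return response, log_task


async def demo() -> tuple[str, int, int]:
    """Return the response plus the number of new log rows before and after flushing."""
    start = len(LOG_SINK)
    response, log_task = await handle_request("hello")
    rows_at_response = len(LOG_SINK) - start  # log not written yet
    await log_task                            # let the background write finish
    rows_after_flush = len(LOG_SINK) - start
    return response, rows_at_response, rows_after_flush
```

Running `asyncio.run(demo())` shows that the caller already holds the response while the log row is still pending; the row appears only once the background task has been awaited.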

## How the Optimization Workflow Works

Tokenwise breaks cost reduction into three stages it calls Monitor, Optimize, and Protect:

- **Monitor:** Every call is logged with cost, tokens, latency, and status, sliced by model, app, or tag. A 14-day forecast is pinned to the dashboard.
- **Optimize:** The tool replays real traffic against cheaper models, identifies cache opportunities, and flags oversized prompts. Each recommendation includes an estimated dollar saving and a one-click apply button. Nothing changes silently — optimizations are opt-in.
- **Protect:** Cost spikes, latency regressions, and quality dips trigger alerts via email, Slack, or Discord. Budget caps can auto-roll back to the last known-good configuration. An LLM-as-judge eval engine scores prompts on the candidate model before any switch goes live, and A/B traffic splits (5–50% of traffic) let teams validate changes on real users.
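One common way to implement an A/B traffic split like the one described in the Protect stage is deterministic hash bucketing. The sketch below assumes that approach; the source does not say how Tokenwise assigns traffic. `choose_model`, the request IDs, and the model names are hypothetical, and the 5 to 50% bounds mirror the range quoted above.

```python
import hashlib

# Percent range quoted for A/B splits in the text above.
MIN_SPLIT, MAX_SPLIT = 5, 50


def choose_model(request_id: str, baseline: str, candidate: str, split_percent: int) -> str:
    """Route roughly `split_percent`% of requests to the candidate model.

    Bucketing is deterministic: the same request_id always gets the same model.
    """
    if not MIN_SPLIT <= split_percent <= MAX_SPLIT:
        raise ValueError(f"split_percent must be between {MIN_SPLIT} and {MAX_SPLIT}")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return candidate if bucket < split_percent else baseline


if __name__ == "__main__":
    # Hypothetical model names, for illustration only.
    picks = [choose_model(f"req-{i}", "model-large", "model-small", 20) for i in range(10_000)]
    share = picks.count("model-small") / len(picks)
    print(f"candidate share: {share:.1%}")
```

Hashing a stable identifier instead of drawing a random number keeps a given request (or user, if a user ID is hashed instead) on the same side of the split, which makes results easier to compare.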

## Architecture and Setup Path

Integration requires changing one line of code: set the `baseURL` in your existing OpenAI, Anthropic, or Vercel AI SDK client to `https://proxy.tokenwisehq.com/{provider}/v1` and add an `X-Tokenwise-Key` header. There is no client library to install and no SDK to maintain. The proxy handles provider routing, semantic caching at the edge, retry logic with exponential backoff on 5xx and 429 errors, and pass-through prompt caching for providers that support it (Anthropic, OpenAI). A public REST API (`tw_api_*` keys) provides read access to requests, metrics, and evals.
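The setup and retry behaviour described above can be sketched as follows. The URL pattern and the `X-Tokenwise-Key` header name come from the text; everything else is an assumption made for illustration, including the `openai` provider slug, the helper names, and the backoff parameters (base delay, cap, no jitter), which the source does not specify.

```python
PROXY_ROOT = "https://proxy.tokenwisehq.com"


def proxy_base_url(provider: str) -> str:
    """Fill the {provider} slot of the documented URL pattern."""
    return f"{PROXY_ROOT}/{provider}/v1"


def proxy_headers(tokenwise_key: str) -> dict[str, str]:
    """Extra header the proxy expects alongside the usual provider credentials."""
    return {"X-Tokenwise-Key": tokenwise_key}


def should_retry(status: int) -> bool:
    """Retry on rate limiting (429) and on any 5xx server error."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped (values are assumed)."""
    return min(cap, base * 2 ** attempt)


if __name__ == "__main__":
    print(proxy_base_url("openai"))            # provider slug is an assumption
    print(proxy_headers("YOUR_TOKENWISE_KEY"))  # placeholder, not a real key
    print([backoff_delay(n) for n in range(5)])
```

In the OpenAI Python SDK, the counterpart of `baseURL` is the `base_url` constructor argument, and additional headers can be supplied through `default_headers`, so the two helper results above could be passed straight to the client constructor.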

## Security Model

The about and security pages describe several explicit design choices. Provider keys flow through the proxy to the upstream provider and are then dropped from memory, with no persistence in databases, logs, or backups. Prompts and cached completions are encrypted at rest. Access keys are hashed before reaching the database, and only a short prefix is kept for display in the UI. All hops run over TLS with HSTS preload and strict CSP headers. Payload storage can be opted out of per workspace or per tag: cost, latency, and token counts are always kept, but the prompt body can be dropped. Outbound webhooks are validated against an allowlist of trusted HTTPS destinations.
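Two of the choices above, hashed access keys with a visible prefix and an HTTPS webhook allowlist, follow well-known patterns that can be sketched briefly. This is an illustration under stated assumptions, not Tokenwise's code: SHA-256, the prefix length, the key format after `tw_api_`, the allowlisted hosts, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlsplit

PREFIX_LEN = 12  # assumed length of the prefix kept for display

# Hypothetical allowlist; the real list of trusted destinations is not published.
ALLOWED_WEBHOOK_HOSTS = {"hooks.slack.com", "discord.com"}


def new_access_key() -> str:
    """Generate a random key; only the `tw_api_` prefix comes from the source."""
    return "tw_api_" + secrets.token_urlsafe(24)


def to_stored_record(access_key: str) -> dict[str, str]:
    """Keep a display prefix and a hash; the full key is never stored."""
    return {
        "prefix": access_key[:PREFIX_LEN],
        "sha256": hashlib.sha256(access_key.encode("utf-8")).hexdigest(),
    }


def verify_access_key(access_key: str, record: dict[str, str]) -> bool:
    """Hash the presented key and compare in constant time."""
    candidate = hashlib.sha256(access_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, record["sha256"])


def is_allowed_webhook(url: str) -> bool:
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_WEBHOOK_HOSTS
```

Comparing the exact hostname, not a substring of the URL, is what rejects lookalike destinations such as `hooks.slack.com.evil.example`.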

## Current Status and Positioning

According to the about page, as of May 2026 Tokenwise has shipped the multi-provider proxy, workspaces with role-based access, alerts, evals, a semantic cache, weekly insights emails, a public REST API, and an Optimize page with rules and A/B traffic splits. The homepage states the tool is routing 1.2 billion tokens per month across 48 teams. The about page positions Tokenwise against Helicone (described as being in maintenance mode), Langfuse (described as requiring significant setup time), and LangSmith (described as LangChain-only), framing Tokenwise as a faster-to-set-up alternative with active weekly releases and one-click apply functionality that competitors lack.

## Features
- Drop-in HTTP proxy with <50ms overhead
- Real-time cost, latency, and error monitoring per LLM call
- One-click model swap recommendations with quality validation
- Semantic caching at the edge (zero-config)
- LLM-as-judge eval engine with interactive rescore
- A/B traffic splits via proxy rules
- Quality regression detector and auto-rollback watchdog
- Daily and monthly budget caps
- Cost spike and latency regression alerts (email, Slack, Discord)
- Weekly insights digest email
- Multi-workspace support with 4 role tiers
- Public REST API for workspaces, requests, and evals
- Payload storage opt-out per workspace or tag
- Prompt and completion encryption at rest
- Provider key never persisted
- 14-day spend forecast on dashboard
- Retry logic with exponential backoff on 5xx and 429 errors
- Pass-through prompt caching for Anthropic and OpenAI
- Rules engine: model switch, cache, fallback chain, A/B split, tag override

## Integrations
OpenAI, Anthropic, Google Gemini, xAI Grok, Groq, DeepSeek, Mistral, OpenRouter, Vercel AI SDK, LangChain, Slack, Discord, Cloudflare Workers

## Platforms
Web, API, CLI

## Pricing
Freemium — Free tier available with paid upgrades

## Links
- Website: https://tokenwisehq.com
- Documentation: https://tokenwisehq.com/docs
- EveryDev.ai: https://www.everydev.ai/tools/tokenwise