Tokenwise

Name: Tokenwise
Price: 19.00 USD
Availability: OnlineOnly
Author: Tokenwise Labs

LLM observability and cost optimization proxy that monitors every AI call, identifies waste, and applies one-click fixes to cut LLM bills by 20–30% without touching quality.

At a Glance

Pricing

Trial available

Try Tokenwise free for 7 days with full Indie access. No credit card required.

Indie: $19/mo

Pro: $79/mo

Available On

Web

API

CLI

Tokenwise Labs · France · Est. 2026

Listed Jun 2026

About Tokenwise

Tokenwise is an LLM observability and cost optimization tool built by a small founding team in France. It works as a drop-in HTTP proxy — one base URL change in your existing SDK — and captures cost, latency, errors, and quality data for every LLM call in real time. The tool targets developers and small teams spending between $50 and $2,000 per month on LLM APIs who want visibility and savings without a framework rewrite.

What It Is

Tokenwise sits between your application and your LLM provider as an edge proxy running on Cloudflare Workers across 300+ points of presence. It adds a median of 37ms of overhead (p95 under 50ms), logs request metadata asynchronously so logging never blocks the upstream response, and applies configurable rules for caching, model switching, fallback chains, A/B splits, and tag-based overrides. Provider keys are forwarded to the upstream provider and then dropped from memory; they are never persisted. The tool supports OpenAI, Anthropic, Google Gemini, xAI Grok, Groq, DeepSeek, Mistral, and OpenRouter (which adds 200+ additional models), and works with the Vercel AI SDK, LangChain, plain SDKs, and cURL.
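
The fallback-chain rule mentioned above can be pictured with a short sketch: try providers in order and return the first successful response. This is only an illustration of the general technique. The listing does not describe how Tokenwise implements it, and `UpstreamError`, `call_with_fallback`, and the two stand-in provider functions are hypothetical names.

```python
from typing import Callable, Sequence, Tuple


class UpstreamError(Exception):
    """Hypothetical error type for a failed provider call."""


def call_with_fallback(
    prompt: str,
    chain: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except UpstreamError as exc:
            # Remember why this provider failed, then move on to the next one.
            errors.append(f"{name}: {exc}")
    raise UpstreamError("all providers failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    # Stand-in for a provider that is currently failing.
    raise UpstreamError("503 from primary")


def steady_backup(prompt: str) -> str:
    # Stand-in for a healthy provider.
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    "hello", [("primary", flaky_primary), ("backup", steady_backup)]
)
print(used, reply)
```

Because the chain is an ordered list, changing the fallback order is a data change rather than a code change, which is the kind of thing a proxy-side rule can express.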

How the Optimization Workflow Works

Tokenwise breaks cost reduction into three stages it calls Monitor, Optimize, and Protect:

Monitor: Every call is logged with cost, tokens, latency, and status, sliced by model, app, or tag. A 14-day forecast is pinned to the dashboard.
Optimize: The tool replays real traffic against cheaper models, identifies cache opportunities, and flags oversized prompts. Each recommendation includes an estimated dollar saving and a one-click apply button. Nothing changes silently — optimizations are opt-in.
Protect: Cost spikes, latency regressions, and quality dips trigger alerts via email, Slack, or Discord. Budget caps can auto-roll back to the last known-good configuration. An LLM-as-judge eval engine scores prompts on the candidate model before any switch goes live, and A/B traffic splits (5–50% of traffic) let teams validate changes on real users.
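
To make the "estimated dollar saving" in the Optimize stage concrete, here is a minimal sketch of the arithmetic: price the logged token counts under the current model and under a cheaper candidate, then take the difference. The model names, prices, and log records are invented for illustration, and the sketch assumes the candidate produces the same number of output tokens. It is not a description of how Tokenwise computes its estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens (illustrative numbers, not real pricing)."""
    input_per_m: float
    output_per_m: float


PRICES = {
    "big-model": Price(input_per_m=10.0, output_per_m=30.0),
    "small-model": Price(input_per_m=1.0, output_per_m=3.0),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the given model's prices."""
    price = PRICES[model]
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


def estimated_saving(calls, candidate: str) -> float:
    """Current spend minus what the same traffic would cost on `candidate`."""
    current = sum(call_cost(c["model"], c["in"], c["out"]) for c in calls)
    replayed = sum(call_cost(candidate, c["in"], c["out"]) for c in calls)
    return current - replayed


logged_calls = [
    {"model": "big-model", "in": 1_000_000, "out": 200_000},
    {"model": "big-model", "in": 500_000, "out": 100_000},
]
saving = estimated_saving(logged_calls, "small-model")
print(f"{saving:.2f}")  # 21.60
```

A saving figure like this only covers cost; whether quality holds up on the cheaper model is a separate question, which is what the eval and A/B steps in the Protect stage address.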

Architecture and Setup Path

Integration requires changing one line of code: set the baseURL in your existing OpenAI, Anthropic, or Vercel AI SDK client to https://proxy.tokenwisehq.com/{provider}/v1 and add an X-Tokenwise-Key header. There is no client library to install and no SDK to maintain. The proxy handles provider routing, semantic caching at the edge, retry logic with exponential backoff on 5xx and 429 errors, and pass-through prompt caching for providers that support it (Anthropic, OpenAI). A public REST API (tw_api_* keys) provides read access to requests, metrics, and evals.
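
As a rough illustration of that one-line change, the sketch below builds the proxy base URL and the extra header for a given provider. The URL template and the X-Tokenwise-Key header name come from the listing. The helper name `tokenwise_client_config`, the "openai" provider slug, and the placeholder key are assumptions for illustration, not documented Tokenwise API.

```python
PROXY_TEMPLATE = "https://proxy.tokenwisehq.com/{provider}/v1"


def tokenwise_client_config(provider: str, tokenwise_key: str) -> dict:
    """Return the base URL and extra header to hand to an existing SDK client.

    `provider` is assumed to be a lowercase slug such as "openai"; the exact
    slugs the proxy accepts are not stated in the listing.
    """
    if not provider or "/" in provider:
        raise ValueError("provider must be a non-empty slug without slashes")
    return {
        "base_url": PROXY_TEMPLATE.format(provider=provider),
        "default_headers": {"X-Tokenwise-Key": tokenwise_key},
    }


config = tokenwise_client_config("openai", "tw_example_key")  # placeholder key
print(config["base_url"])
```

With the official OpenAI Python SDK, these two values would be passed as the `base_url` and `default_headers` arguments of the `OpenAI(...)` client constructor, while the provider API key is supplied as usual.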

Security Model

The about and security pages describe several explicit design choices. Provider keys flow through the proxy to the upstream provider and are dropped from memory, with no persistence in databases, logs, or backups. Prompts and cached completions are encrypted at rest. Access keys are hashed before reaching the database, and only a short prefix is shown in the UI. All hops run over TLS with HSTS preload and strict CSP headers. Payload storage is opt-out per workspace or per tag: cost, latency, and token counts are always kept, but the prompt body can be dropped. Outbound webhooks are validated against an allowlist of trusted HTTPS destinations.
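
One common way to implement "hash the access key, keep only a short prefix for display" is sketched below. The listing does not say which hash function or prefix length Tokenwise uses, so SHA-256, the 8-character prefix, the "tw_" key format, and all function names here are assumptions.

```python
import hashlib
import hmac
import secrets

PREFIX_LEN = 8  # assumed length of the prefix kept for display


def new_access_key() -> str:
    """Generate a random access key (the "tw_" prefix is illustrative)."""
    return "tw_" + secrets.token_urlsafe(32)


def to_stored_record(access_key: str) -> dict:
    """Return what would be persisted: a digest plus a short display prefix."""
    digest = hashlib.sha256(access_key.encode("utf-8")).hexdigest()
    return {"prefix": access_key[:PREFIX_LEN], "sha256": digest}


def verify(access_key: str, record: dict) -> bool:
    """Check a presented key against the stored digest in constant time."""
    candidate = hashlib.sha256(access_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, record["sha256"])


key = new_access_key()          # shown to the user once
record = to_stored_record(key)  # the only thing written to the database
print(record["prefix"] + "...")
```

A plain fast hash is reasonable here only because the key is long and random; the full key never appears in the stored record, so a database leak exposes the prefix and digest but not a usable credential.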

Current Status and Positioning

According to the about page, as of May 2026 Tokenwise has shipped the multi-provider proxy, workspaces with role-based access, alerts, evals, a semantic cache, weekly insights emails, a public REST API, and an Optimize page with rules and A/B traffic splits. The homepage states the tool is routing 1.2 billion tokens per month across 48 teams. The about page positions Tokenwise against Helicone (described as being in maintenance mode), Langfuse (described as requiring significant setup time), and LangSmith (described as LangChain-only), framing Tokenwise as a faster-to-set-up alternative with active weekly releases and one-click apply functionality that competitors lack.

Pricing

TRIAL

Free Trial

Try Tokenwise free for 7 days.

Full Indie access for 7 days
No credit card required
Proxy keeps forwarding after trial ends while you decide

Indie

For solo makers shipping LLM apps.

$19 per month

200,000 requests / month
10 workspaces
60-day request retention
Dashboard, requests log & What changed
Cost & latency spike alerts (email)
Weekly insights digest
Payload storage & request inspector
Optimization recommendations & semantic cache
Public REST API — 1,000 calls/hour

Pro

For small teams running LLMs in production.

$79 per month

2,000,000 requests / month
50 workspaces with 4 role tiers
180-day request retention
Everything in Indie
LLM-as-judge eval engine & interactive rescore
A/B traffic splits via proxy rules
Quality regression detector & auto-rollback watchdog
Daily & monthly budget caps
Slack & Discord alerts + user webhooks
Team members & roles
Public REST API — 10,000 calls/hour
Priority support · founder Slack

Capabilities

Key Features

Drop-in HTTP proxy with <50ms overhead
Real-time cost, latency, and error monitoring per LLM call
One-click model swap recommendations with quality validation
Semantic caching at the edge (zero-config)
LLM-as-judge eval engine with interactive rescore
A/B traffic splits via proxy rules
Quality regression detector and auto-rollback watchdog
Daily and monthly budget caps
Cost spike and latency regression alerts (email, Slack, Discord)
Weekly insights digest email
Multi-workspace support with 4 role tiers
Public REST API for workspaces, requests, and evals
Payload storage opt-out per workspace or tag
Prompt and completion encryption at rest
Provider key never persisted
14-day spend forecast on dashboard
Retry logic with exponential backoff on 5xx and 429 errors
Pass-through prompt caching for Anthropic and OpenAI
Rules engine: model switch, cache, fallback chain, A/B split, tag override
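
The retry behavior listed above (exponential backoff on 5xx and 429 responses) follows a well-known pattern, sketched here in generic form. The base delay, cap, retry count, and use of jitter are assumptions; the listing does not give the values Tokenwise uses.

```python
import random

# Status codes treated as retryable, per the feature list: 429 and any 5xx.
RETRYABLE = {429} | set(range(500, 600))


def is_retryable(status: int) -> bool:
    """True if a response with this HTTP status should be retried."""
    return status in RETRYABLE


def backoff_delays(max_retries, base=0.5, cap=8.0, jitter=True, rng=random.random):
    """Seconds to wait before each retry: base * 2**attempt, capped at `cap`.

    With jitter enabled, each delay is scaled by a random factor in [0, 1)
    so that many clients do not retry in lockstep.
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay * rng() if jitter else delay)
    return delays


schedule = backoff_delays(5, jitter=False)
print(schedule)  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

Client errors such as 400 or 404 are left out of the retryable set on purpose, since repeating an invalid request only adds latency and cost.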

Integrations

OpenAI

Anthropic

Google Gemini

xAI Grok

Groq

DeepSeek

Mistral

OpenRouter

Vercel AI SDK

LangChain

Slack

Discord

Cloudflare Workers
