Weave Router

Name: Weave Router
Availability: OnlineOnly
Author: Workweave

A prompt router that routes each request to the best quality-per-token AI model, working as a drop-in proxy for Claude Code, Codex, and Cursor.

Visit Website

At a Glance

Pricing

Free tier available

Run the full router stack on your own infrastructure using your own provider keys. No managed service fees.

Managed Service: Custom/contact

Engagement

Available On

CLI

API

Web

WorkweaveSan Francisco, CAEst. 2024$24.2M raised

Listed Jun 2026

About Weave Router

Weave Router is a model routing proxy built by Workweave that sits between AI coding tools and upstream LLM providers, automatically selecting the best model for each prompt based on quality-per-token scoring. It works as a transparent drop-in for Claude Code, OpenAI Codex, Cursor, and opencode, requiring no changes to how developers use those tools. The project is source-available on GitHub under the Elastic License 2.0 and can be run as a managed hosted service or self-hosted on your own infrastructure.

What It Is

Weave Router is an AI model routing layer that intercepts prompts from agentic coding tools and routes them to the most appropriate model across Anthropic, OpenAI, Google Gemini, and OpenRouter (for open-source models like DeepSeek, Kimi, Llama, Mistral, and others). Rather than using a "vibes-based prompt" to pick models, the router uses a small on-box ONNX embedder and a cluster scorer derived from the Avengers-Pro research paper (arXiv:2508.12631) to make routing decisions in under 50 milliseconds. The core claim, as stated on the product page, is that routing easy requests to cheaper open-source models at parity quality can roughly double a team's token runway on the same budget.

How the Routing Works

The router embeds each incoming prompt using a lightweight in-process ONNX model, then scores it against frozen cluster centroids to classify the request complexity. Based on that classification, it selects the model with the best quality-per-token ratio from the enabled provider pool. Key architectural properties include:

Cache awareness: Model selection accounts for prompt caching so the router only switches models when the cost/quality tradeoff justifies losing cache hits.
Latency overhead: The routing layer adds only low single-digit milliseconds on top of the upstream call.
Zero-retention by default: Prompts and completions are not stored on Weave infrastructure; only routing metadata (classification label, chosen model, latency, cost) is retained.
BYOK: Provider keys stay on the user's machine, encrypted at rest, for self-hosted deployments.

Supported Providers and Tools

The router speaks multiple API formats natively — Anthropic Messages, OpenAI Chat Completions, and Gemini native — including streaming, tool use, and vision. Supported upstream providers include Anthropic, OpenAI, Google Gemini, and OpenRouter for open-source models. On the client side, it integrates with:

Claude Code via automatic config wiring
OpenAI Codex CLI via ~/.codex/config.toml patching
opencode via opencode.json provider merging
Cursor via the OpenAI Base URL override in settings (noted as early beta)

Setup Path

The fastest path to the hosted managed service is a single command: npx @workweave/router. The installer detects installed clients, asks for scope (user vs. project), generates a router key, and wires the appropriate config files automatically. Node ≥ 18 is required. For self-hosting, the stack runs via make full-setup, which boots Postgres and the router on port 8080 with a local dashboard. The router exposes standard endpoints: POST /v1/messages (Anthropic format), POST /v1/chat/completions (OpenAI format), POST /v1beta/models/:action (Gemini format), and POST /v1/route for inspecting routing decisions without making an upstream call. OTLP traces are emitted out of the box and can be sent to Honeycomb, Datadog, Grafana, or the built-in Weave dashboard.

Current Status

The GitHub repository was created in April 2026 and last pushed in late June 2026, with 601 stars and 28 forks as of that date. The project is written in Go (requiring Go 1.25+) and is actively maintained with 41 open issues. The license is Elastic License 2.0, which permits self-hosting and modification but prohibits offering the software as a managed service to third parties. Workweave describes itself as "The #1 engineering intelligence platform" and lists the router as part of a broader platform that includes code intelligence, agent observability, and engineering analytics products.

Community Discussions

Be the first to start a conversation about Weave Router

Share your experience with Weave Router, ask questions, or help others learn from your insights.

Pricing

FREE

Self-Hosted

Run the full router stack on your own infrastructure using your own provider keys. No managed service fees.

Full router source code on GitHub
BYOK (Bring Your Own Key) for all providers
Anthropic, OpenAI, Gemini, and OpenRouter support
Built-in dashboard
OTLP observability

Managed Service

Hosted Weave Router with a single unified usage-based credit balance. No juggling separate provider invoices.

Custom

contact sales

Single unified usage-based credit balance
No separate Anthropic, OpenAI, OpenRouter, and Gemini invoices
Hosted router endpoint
Per-request model routing
Zero-retention proxy by default
SOC 2 Type II available on request under NDA

View official pricing

Capabilities

Key Features

Per-request model routing using on-box ONNX embedder and cluster scoring
Drop-in proxy for Claude Code, Codex, Cursor, and opencode
Supports Anthropic Messages, OpenAI Chat Completions, and Gemini native APIs
Streaming, tool use, and vision support across all providers
OpenRouter integration for open-source models (DeepSeek, Kimi, Llama, Mistral, etc.)
BYOK (Bring Your Own Key) with provider keys encrypted at rest
Zero-retention proxy by default — only routing metadata stored
Prompt cache-aware model selection
OTLP traces out of the box (Honeycomb, Datadog, Grafana compatible)
Built-in dashboard at /ui/dashboard
Managed hosted service or self-hosted deployment options
Single npx command installer with automatic client config wiring
Per-repo (project) or user-level scope for config
Toggle routing on/off per client without discarding config
SOC 2 Type II available on request (enterprise)

Integrations

Claude Code

OpenAI Codex CLI

Cursor

opencode

Anthropic API

OpenAI API

Google Gemini API

OpenRouter

DeepSeek

Kimi

Llama

Mistral

Qwen

GLM

Honeycomb

Datadog

Grafana

PostgreSQL

API Available

View Docs

Back to all tools Suggest an edit

About Weave Router

What It Is

How the Routing Works

Cache awareness: Model selection accounts for prompt caching so the router only switches models when the cost/quality tradeoff justifies losing cache hits.
Latency overhead: The routing layer adds only low single-digit milliseconds on top of the upstream call.
Zero-retention by default: Prompts and completions are not stored on Weave infrastructure; only routing metadata (classification label, chosen model, latency, cost) is retained.
BYOK: Provider keys stay on the user's machine, encrypted at rest, for self-hosted deployments.

Supported Providers and Tools

Claude Code via automatic config wiring
OpenAI Codex CLI via ~/.codex/config.toml patching
opencode via opencode.json provider merging
Cursor via the OpenAI Base URL override in settings (noted as early beta)

Weave Router