# Weave Router

> A prompt router that routes each request to the best quality-per-token AI model, working as a drop-in proxy for Claude Code, Codex, and Cursor.

Weave Router is a model routing proxy built by Workweave that sits between AI coding tools and upstream LLM providers, automatically selecting the best model for each prompt based on quality-per-token scoring. It works as a transparent drop-in for Claude Code, OpenAI Codex, Cursor, and opencode, requiring no changes to how developers use those tools. The project is source-available on GitHub under the Elastic License 2.0 and can be run as a managed hosted service or self-hosted on your own infrastructure.

## What It Is

Weave Router is an AI model routing layer that intercepts prompts from agentic coding tools and routes them to the most appropriate model across Anthropic, OpenAI, Google Gemini, and OpenRouter (for open-source models like DeepSeek, Kimi, Llama, Mistral, and others). Rather than using a "vibes-based prompt" to pick models, the router uses a small on-box ONNX embedder and a cluster scorer derived from the Avengers-Pro research paper (arXiv:2508.12631) to make routing decisions in under 50 milliseconds. The core claim, as stated on the product page, is that routing easy requests to cheaper open-source models at parity quality can roughly double a team's token runway on the same budget.

## How the Routing Works

The router embeds each incoming prompt using a lightweight in-process ONNX model, then scores it against frozen cluster centroids to classify the request complexity. Based on that classification, it selects the model with the best quality-per-token ratio from the enabled provider pool. Key architectural properties include:

- **Cache awareness**: Model selection accounts for prompt caching so the router only switches models when the cost/quality tradeoff justifies losing cache hits.
- **Latency overhead**: The routing layer adds only low single-digit milliseconds on top of the upstream call.
- **Zero-retention by default**: Prompts and completions are not stored on Weave infrastructure; only routing metadata (classification label, chosen model, latency, cost) is retained.
- **BYOK**: Provider keys stay on the user's machine, encrypted at rest, for self-hosted deployments.

## Supported Providers and Tools

The router speaks multiple API formats natively — Anthropic Messages, OpenAI Chat Completions, and Gemini native — including streaming, tool use, and vision. Supported upstream providers include Anthropic, OpenAI, Google Gemini, and OpenRouter for open-source models. On the client side, it integrates with:

- **Claude Code** via automatic config wiring
- **OpenAI Codex CLI** via `~/.codex/config.toml` patching
- **opencode** via `opencode.json` provider merging
- **Cursor** via the OpenAI Base URL override in settings (noted as early beta)

## Setup Path

The fastest path to the hosted managed service is a single command: `npx @workweave/router`. The installer detects installed clients, asks for scope (user vs. project), generates a router key, and wires the appropriate config files automatically. Node ≥ 18 is required. For self-hosting, the stack runs via `make full-setup`, which boots Postgres and the router on port 8080 with a local dashboard. The router exposes standard endpoints: `POST /v1/messages` (Anthropic format), `POST /v1/chat/completions` (OpenAI format), `POST /v1beta/models/:action` (Gemini format), and `POST /v1/route` for inspecting routing decisions without making an upstream call. OTLP traces are emitted out of the box and can be sent to Honeycomb, Datadog, Grafana, or the built-in Weave dashboard.

## Current Status

The GitHub repository was created in April 2026 and last pushed in late June 2026, with 601 stars and 28 forks as of that date. The project is written in Go (requiring Go 1.25+) and is actively maintained with 41 open issues. The license is Elastic License 2.0, which permits self-hosting and modification but prohibits offering the software as a managed service to third parties. Workweave describes itself as "The #1 engineering intelligence platform" and lists the router as part of a broader platform that includes code intelligence, agent observability, and engineering analytics products.

## Features
- Per-request model routing using on-box ONNX embedder and cluster scoring
- Drop-in proxy for Claude Code, Codex, Cursor, and opencode
- Supports Anthropic Messages, OpenAI Chat Completions, and Gemini native APIs
- Streaming, tool use, and vision support across all providers
- OpenRouter integration for open-source models (DeepSeek, Kimi, Llama, Mistral, etc.)
- BYOK (Bring Your Own Key) with provider keys encrypted at rest
- Zero-retention proxy by default — only routing metadata stored
- Prompt cache-aware model selection
- OTLP traces out of the box (Honeycomb, Datadog, Grafana compatible)
- Built-in dashboard at /ui/dashboard
- Managed hosted service or self-hosted deployment options
- Single npx command installer with automatic client config wiring
- Per-repo (project) or user-level scope for config
- Toggle routing on/off per client without discarding config
- SOC 2 Type II available on request (enterprise)

## Integrations
Claude Code, OpenAI Codex CLI, Cursor, opencode, Anthropic API, OpenAI API, Google Gemini API, OpenRouter, DeepSeek, Kimi, Llama, Mistral, Qwen, GLM, Honeycomb, Datadog, Grafana, PostgreSQL

## Platforms
CLI, API, WEB

## Pricing
Freemium — Free tier available with paid upgrades

## Version
0.1.0

## Links
- Website: https://weaverouter.com
- Documentation: https://github.com/workweave/router/blob/main/docs/CONFIGURATION.md
- Repository: https://github.com/workweave/router
- EveryDev.ai: https://www.everydev.ai/tools/weave-router
