TensorZero
An open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation in a single self-hosted stack.
At a Glance
About TensorZero
TensorZero is an open-source LLMOps platform built in Rust and licensed under Apache 2.0. It unifies five capabilities — LLM gateway, observability, evaluation, optimization, and experimentation — into a single self-hosted stack that teams can adopt incrementally. The homepage notes that TensorZero is no longer actively maintained, though the repository remains publicly available on GitHub.
What It Is
TensorZero is a self-hosted infrastructure layer that sits between your application and every major LLM provider. Rather than requiring teams to stitch together separate tools for routing, logging, fine-tuning, and A/B testing, TensorZero provides all of these as a unified platform. The gateway is written in Rust and the project claims sub-1ms p99 latency overhead at 10,000+ QPS. It exposes an OpenAI-compatible API, so any existing OpenAI SDK (Python, Node, Go, etc.) can point to it with a single base_url change.
Core Architecture
TensorZero is deployed as a single Docker container (the TensorZero Gateway) backed by a user-owned database. The five pillars of the platform are:
- Gateway: A unified API that routes to Anthropic, AWS Bedrock, AWS SageMaker, Azure, DeepSeek, Fireworks, GCP Vertex AI, Google AI Studio, Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, xAI (Grok), and any OpenAI-compatible endpoint (e.g. Ollama).
- Observability: Inferences and feedback (metrics, human edits) are stored in the user's own database. OpenTelemetry (OTLP) and Prometheus export are supported.
- Evaluation: Supports inference evaluations (unit-test style) and workflow evaluations (integration-test style) via heuristics or LLM judges, runnable from a UI or CLI.
- Optimization: Supervised fine-tuning, RLHF, automated prompt engineering (GEPA algorithm), and dynamic in-context learning (DICL) turn production data into a learning flywheel.
- Experimentation: Built-in adaptive A/B testing, routing, fallbacks, retries, and load balancing.
TensorZero Autopilot
The README describes TensorZero Autopilot as an "automated AI engineer" add-on powered by TensorZero. According to the project, Autopilot analyzes LLM observability data, sets up evaluations, optimizes prompts and models, and runs A/B tests automatically. The README states it "dramatically improves the performance of LLM agents across diverse tasks." Autopilot is described as a complementary paid product, while the core TensorZero platform is free and self-hosted.
Team and Backing
According to the README, the TensorZero team includes a former Rust compiler maintainer, machine learning researchers from Stanford, CMU, Oxford, and Columbia, and the former chief product officer of a decacorn startup. The project announced a $7.3M seed round and received coverage from VentureBeat. The README states TensorZero "is used by companies ranging from frontier AI startups to the Fortune 10 and fuels ~1% of global LLM API spend today" — this is a vendor-published claim.
Current Status: Archived
The TensorZero website states: "TensorZero remains available on GitHub but is no longer maintained." The GitHub repository is marked as ARCHIVED with a last push date of June 2026. The most recent release was version 2026.6.0, published June 4, 2026. Despite the archival, the full source code, documentation, and examples remain publicly accessible under the Apache 2.0 license.
Community Discussions
Be the first to start a conversation about TensorZero
Share your experience with TensorZero, ask questions, or help others learn from your insights.
Pricing
Open Source
100% self-hosted and open-source LLMOps platform under Apache 2.0 license.
- LLM gateway with unified API
- Observability and feedback storage
- Evaluation (inference and workflow)
- Optimization (SFT, RLHF, GEPA, DICL)
- Adaptive A/B testing and experimentation
Capabilities
Key Features
- Unified LLM gateway with OpenAI-compatible API
- Sub-1ms p99 latency overhead at 10k+ QPS (Rust-based)
- Support for 18+ LLM providers including Anthropic, OpenAI, AWS Bedrock, GCP Vertex AI, and more
- Structured outputs (JSON), tool use, batch inference, embeddings, multimodal (images, files), and caching
- Routing, retries, fallbacks, and load balancing for high availability
- Self-hosted observability: store inferences and feedback in your own database
- OpenTelemetry (OTLP) and Prometheus metrics export
- Inference and workflow evaluations via heuristics or LLM judges
- Supervised fine-tuning (SFT) and RLHF optimization
- Automated prompt engineering with GEPA algorithm
- Dynamic in-context learning (DICL)
- Adaptive A/B testing and experimentation
- TensorZero Autopilot: automated AI engineer add-on
- GitOps-friendly configuration
- Interactive Playground UI
- Dataset building for optimization and evaluation workflows
- Custom rate limiting with granular scopes
- Auth setup to allow clients to access models without sharing provider API keys
