TensorZero

Name: TensorZero
Availability: OnlineOnly
Author: TensorZero

An open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation in a single self-hosted stack.

Visit Website

At a Glance

Pricing

Open Source

100% self-hosted and open-source LLMOps platform under Apache 2.0 license.

Engagement

Available On

API

CLI

SDK

TensorZeroNew York, NYEst. 2024$7.3M raised

Listed Jun 2026

About TensorZero

TensorZero is an open-source LLMOps platform built in Rust and licensed under Apache 2.0. It unifies five capabilities — LLM gateway, observability, evaluation, optimization, and experimentation — into a single self-hosted stack that teams can adopt incrementally. The homepage notes that TensorZero is no longer actively maintained, though the repository remains publicly available on GitHub.

What It Is

TensorZero is a self-hosted infrastructure layer that sits between your application and every major LLM provider. Rather than requiring teams to stitch together separate tools for routing, logging, fine-tuning, and A/B testing, TensorZero provides all of these as a unified platform. The gateway is written in Rust and the project claims sub-1ms p99 latency overhead at 10,000+ QPS. It exposes an OpenAI-compatible API, so any existing OpenAI SDK (Python, Node, Go, etc.) can point to it with a single base_url change.

Core Architecture

TensorZero is deployed as a single Docker container (the TensorZero Gateway) backed by a user-owned database. The five pillars of the platform are:

Gateway: A unified API that routes to Anthropic, AWS Bedrock, AWS SageMaker, Azure, DeepSeek, Fireworks, GCP Vertex AI, Google AI Studio, Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, xAI (Grok), and any OpenAI-compatible endpoint (e.g. Ollama).
Observability: Inferences and feedback (metrics, human edits) are stored in the user's own database. OpenTelemetry (OTLP) and Prometheus export are supported.
Evaluation: Supports inference evaluations (unit-test style) and workflow evaluations (integration-test style) via heuristics or LLM judges, runnable from a UI or CLI.
Optimization: Supervised fine-tuning, RLHF, automated prompt engineering (GEPA algorithm), and dynamic in-context learning (DICL) turn production data into a learning flywheel.
Experimentation: Built-in adaptive A/B testing, routing, fallbacks, retries, and load balancing.

TensorZero Autopilot

The README describes TensorZero Autopilot as an "automated AI engineer" add-on powered by TensorZero. According to the project, Autopilot analyzes LLM observability data, sets up evaluations, optimizes prompts and models, and runs A/B tests automatically. The README states it "dramatically improves the performance of LLM agents across diverse tasks." Autopilot is described as a complementary paid product, while the core TensorZero platform is free and self-hosted.

Team and Backing

According to the README, the TensorZero team includes a former Rust compiler maintainer, machine learning researchers from Stanford, CMU, Oxford, and Columbia, and the former chief product officer of a decacorn startup. The project announced a $7.3M seed round and received coverage from VentureBeat. The README states TensorZero "is used by companies ranging from frontier AI startups to the Fortune 10 and fuels ~1% of global LLM API spend today" — this is a vendor-published claim.

Current Status: Archived

The TensorZero website states: "TensorZero remains available on GitHub but is no longer maintained." The GitHub repository is marked as ARCHIVED with a last push date of June 2026. The most recent release was version 2026.6.0, published June 4, 2026. Despite the archival, the full source code, documentation, and examples remain publicly accessible under the Apache 2.0 license.

Community Discussions

Be the first to start a conversation about TensorZero

Share your experience with TensorZero, ask questions, or help others learn from your insights.

Pricing

OPEN SOURCE

Open Source

100% self-hosted and open-source LLMOps platform under Apache 2.0 license.

LLM gateway with unified API
Observability and feedback storage
Evaluation (inference and workflow)
Optimization (SFT, RLHF, GEPA, DICL)
Adaptive A/B testing and experimentation

Capabilities

Key Features

Unified LLM gateway with OpenAI-compatible API
Sub-1ms p99 latency overhead at 10k+ QPS (Rust-based)
Support for 18+ LLM providers including Anthropic, OpenAI, AWS Bedrock, GCP Vertex AI, and more
Structured outputs (JSON), tool use, batch inference, embeddings, multimodal (images, files), and caching
Routing, retries, fallbacks, and load balancing for high availability
Self-hosted observability: store inferences and feedback in your own database
OpenTelemetry (OTLP) and Prometheus metrics export
Inference and workflow evaluations via heuristics or LLM judges
Supervised fine-tuning (SFT) and RLHF optimization
Automated prompt engineering with GEPA algorithm
Dynamic in-context learning (DICL)
Adaptive A/B testing and experimentation
TensorZero Autopilot: automated AI engineer add-on
GitOps-friendly configuration
Interactive Playground UI
Dataset building for optimization and evaluation workflows
Custom rate limiting with granular scopes
Auth setup to allow clients to access models without sharing provider API keys

Integrations

OpenAI SDK

OpenTelemetry

Prometheus

Anthropic

AWS Bedrock

AWS SageMaker

Azure

DeepSeek

Fireworks

GCP Vertex AI

Google AI Studio (Gemini API)

Groq

Hyperbolic

Mistral

OpenRouter

SGLang

TGI (Text Generation Inference)

Together AI

vLLM

xAI (Grok)

Ollama (OpenAI-compatible)

Docker

API Available

View Docs

Back to all tools Suggest an edit

About TensorZero

What It Is

Core Architecture

TensorZero is deployed as a single Docker container (the TensorZero Gateway) backed by a user-owned database. The five pillars of the platform are:

Gateway: A unified API that routes to Anthropic, AWS Bedrock, AWS SageMaker, Azure, DeepSeek, Fireworks, GCP Vertex AI, Google AI Studio, Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, SGLang, TGI, Together AI, vLLM, xAI (Grok), and any OpenAI-compatible endpoint (e.g. Ollama).
Observability: Inferences and feedback (metrics, human edits) are stored in the user's own database. OpenTelemetry (OTLP) and Prometheus export are supported.
Evaluation: Supports inference evaluations (unit-test style) and workflow evaluations (integration-test style) via heuristics or LLM judges, runnable from a UI or CLI.
Optimization: Supervised fine-tuning, RLHF, automated prompt engineering (GEPA algorithm), and dynamic in-context learning (DICL) turn production data into a learning flywheel.
Experimentation: Built-in adaptive A/B testing, routing, fallbacks, retries, and load balancing.

TensorZero