LangWatch

Name: LangWatch
Availability: OnlineOnly
Author: LangWatch

LangWatch is a developer-first platform for testing, evaluating, and monitoring AI agents and LLM applications, with agent simulations, real-time evals, and LLM observability.


At a Glance

Pricing

Free tier available

Get started with AI agent monitoring, evaluation & agent simulations. No credit card required.

Growth: Custom/contact

Enterprise / Regulated: Custom/contact

Available On: Web, API, CLI, SDK

LangWatch, Amsterdam, Netherlands. Est. 2023

Listed May 2026

About LangWatch

LangWatch is an AI agent engineering platform co-founded by Manouk Draisma and Rogerio Chaves, the latter bringing experience from Booking.com. It provides a unified environment for prototyping, evaluating, deploying, and monitoring LLM-based applications and multi-step agentic systems. The platform is fully open-source and OpenTelemetry-native, and it supports both cloud-hosted and self-managed deployments. LangWatch is built in Amsterdam, the Netherlands, and holds ISO 27001 and SOC 2 certifications.

What It Is

LangWatch is an LLMOps platform that sits at the intersection of observability, evaluation, and agent testing. It targets AI engineering teams who need structured, repeatable ways to validate that prompts, models, and agent pipelines behave correctly before and after shipping to production. The platform covers the full development lifecycle: building and versioning prompts, running batch and real-time evaluations, simulating multi-turn agent conversations, and monitoring production traces for regressions or quality degradation.

Core Capabilities

LLM Observability: Search and inspect any LLM interaction across environments, debug failures, and support audits with full trace visibility from development through production.
Agent Simulations: Run thousands of synthetic conversations across scenarios, languages, and edge cases to stress-test multi-step agentic systems before release.
Real-time Evaluations: Create and tune custom evals that measure quality specific to a product in real time, including LLM-as-judge, code evals, and session evals.
Prompt & Model Management: Version, compare, and deploy prompt and model changes with full traceability and feature-flag–style rollout controls.
Auto-prompt Optimization: Systematically improve prompts and pipelines using DSPy-based structured experimentation.
Dataset Management: Convert production traces into reusable test cases, golden datasets, and benchmarks for experiments, regressions, and fine-tuning.
Guardrails: Built-in safeguards for jailbreaking/prompt injection, PII detection and auto-redaction, competitor blocklists, content moderation, and custom guardrail rules.
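
To make the guardrail idea concrete, here is a minimal sketch of two of the checks named above: PII redaction and a competitor blocklist. It is an illustration only, not LangWatch's implementation or API. The function names, the email pattern, and the placeholder string are all assumptions, and real PII detection covers far more than email addresses.

```python
import re

# Hypothetical pattern: matches common email shapes only, not all valid addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[REDACTED_EMAIL]") -> str:
    """Replace every email-like substring with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


def mentions_blocked_term(text: str, blocklist: set[str]) -> bool:
    """Return True if any blocklisted term appears as a whole word (case-insensitive)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return any(term.lower() in words for term in blocklist)


if __name__ == "__main__":
    print(redact_emails("Contact jane.doe@example.com today"))
    print(mentions_blocked_term("Try AcmeAI instead", {"acmeai"}))
```

A check like this would typically run on model output before it reaches the user, with the redacted text passed on and blocklist hits either rejected or flagged for review.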

Integration and Deployment Model

LangWatch is OpenTelemetry-native, meaning it integrates with any LLM or agent framework without requiring proprietary instrumentation. Official integrations include Python and TypeScript/JavaScript SDKs, LangChain, LangGraph, DSPy, CrewAI, Agno, Pydantic AI, LiteLLM, AWS Bedrock, OpenAI Agents, Mastra, Langflow, and n8n. The platform can be accessed as a cloud service or self-hosted on-premises, in a VPC, air-gapped, or in a hybrid configuration. The homepage states the project has over 5,600 GitHub stars and processes over 900,000 daily evaluations.

Collaboration Across Roles

LangWatch is designed to bridge engineering and non-technical stakeholders. Engineers can run prompts, flows, and evaluations programmatically via SDK; product managers and domain experts can define quality scenarios and review results through the UI without writing code. Collaborative workflows support data review, annotation, and pattern analysis across engineering, product, and business teams. The platform includes user analytics, topic detection, sentiment analysis, and custom dashboards for tracking functional KPIs.

Enterprise and Security Controls

For regulated or high-volume environments, LangWatch offers alternative hosting options including on-premises and hybrid deployments, custom data retention, enterprise SSO (Okta, AzureAD/EntraID), SSO enforcement, RBAC at organization/project/team levels, audit logs, and support SLAs. The platform is ISO 27001 and SOC2 certified and GDPR-compliant. Data region options include EU, US, CA, and APAC for enterprise customers. Billing via AWS and Azure Marketplace is available for enterprise contracts.


Pricing

Developer (free)

Get started with AI agent monitoring, evaluation & agent simulations. No credit card required.

All platform features
50,000 events per month
14 days data access
2 users
3 Scenarios, 3 Simulations & 3 custom evaluations

Growth

Evals, prompts, and agents in one place. CI/CD for engineers, collaboration for PMs.

Custom pricing (contact sales)

All platform features
Everything in Developer
200,000 events included
Additional events at €0.0005 per event
30 days data retention included
Custom retention available
Up to 20 users (volume discount above 20)
Unlimited lite-users
Unlimited eval scores, simulations & prompts
Private Slack/Teams support
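
As a worked example of the overage arithmetic on the Growth tier, the sketch below computes only the charge for events beyond the included 200,000 at €0.0005 each. The base fee is custom and not listed, so it is left out. The function name is hypothetical, and the sketch assumes the included allowance resets each billing period, which the listing does not state.

```python
INCLUDED_EVENTS = 200_000        # events included in the Growth tier (from the listing)
OVERAGE_EUR_PER_EVENT = 0.0005   # price per additional event (from the listing)


def growth_overage_eur(events: int) -> float:
    """Return the overage charge in euros for one billing period (base fee excluded)."""
    extra = max(0, events - INCLUDED_EVENTS)
    return round(extra * OVERAGE_EUR_PER_EVENT, 2)


if __name__ == "__main__":
    # 500,000 events -> 300,000 over the allowance -> 300,000 * 0.0005 = 150.0
    print(growth_overage_eur(500_000))
```

At this rate, every additional million events adds €500 on top of whatever base price is negotiated.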

Enterprise / Regulated

Premium support with on-prem or hosted deployment for high volume or privacy-sensitive data.

Custom pricing (contact sales)

Alternative hosting options: hybrid, self-hosted, on-prem
Custom data retention
Custom SSO / RBAC
Audit logs
Uptime & Support SLA
ISO27001 reports, InfoSec/legal reviews
Custom Terms, DPA
Forward Deployed Engineer
Billing via AWS, Google, Azure Marketplace


Capabilities

Key Features

LLM Observability and trace inspection
Agent simulations for multi-step agentic systems
Real-time and offline evaluations
Prompt versioning and management
Auto-prompt optimization with DSPy
Dataset management and golden set creation
LLM-as-judge evaluations
PII detection and auto-redaction
Jailbreak and prompt injection detection
Content moderation and custom guardrails
Cost and token tracking
Multi-agent graph visualization
Batch tests and experiments
CI/CD integration for evaluations
User analytics, topic detection, sentiment analysis
Custom dashboards and KPI tracking
Role-based access control
Audit logs
OpenTelemetry-native integration
Self-hosted and on-premises deployment

Integrations

Python SDK
TypeScript/JavaScript SDK
OpenTelemetry
LangChain
LangGraph
DSPy
CrewAI
Agno
Pydantic AI
LiteLLM
AWS Bedrock
OpenAI Agents
Mastra
Langflow
n8n
Google SSO
AzureAD/EntraID SSO
Okta SSO
GitHub SSO
AWS Marketplace
Azure Marketplace

API Available
