# LangWatch

> LangWatch is a developer-first platform for testing, evaluating, and monitoring AI agents and LLM applications, with agent simulations, real-time evals, and LLM observability.

LangWatch is an AI agent engineering platform co-founded by Manouk Draisma and Rogerio Chaves, the latter bringing experience from Booking.com. It provides a unified environment for prototyping, evaluating, deploying, and monitoring LLM-based applications and multi-step agentic systems. The platform is fully open-source and OpenTelemetry-native, and it supports both cloud-hosted and self-managed deployments. LangWatch is built in Amsterdam, the Netherlands, and holds ISO 27001 and SOC2 certifications.

## What It Is

LangWatch is an LLMOps platform that sits at the intersection of observability, evaluation, and agent testing. It targets AI engineering teams who need structured, repeatable ways to validate that prompts, models, and agent pipelines behave correctly before and after shipping to production. The platform covers the full development lifecycle: building and versioning prompts, running batch and real-time evaluations, simulating multi-turn agent conversations, and monitoring production traces for regressions or quality degradation.

## Core Capabilities

- **LLM Observability**: Search and inspect any LLM interaction across environments, debug failures, and support audits with full trace visibility from development through production.
- **Agent Simulations**: Run thousands of synthetic conversations across scenarios, languages, and edge cases to stress-test multi-step agentic systems before release.
- **Real-time Evaluations**: Create and tune custom evals that measure quality specific to a product in real time, including LLM-as-judge, code evals, and session evals.
- **Prompt & Model Management**: Version, compare, and deploy prompt and model changes with full traceability and feature-flag–style rollout controls.
- **Auto-prompt Optimization**: Systematically improve prompts and pipelines using DSPy-based structured experimentation.
- **Dataset Management**: Convert production traces into reusable test cases, golden datasets, and benchmarks for experiments, regressions, and fine-tuning.
- **Guardrails**: Built-in safeguards for jailbreaking/prompt injection, PII detection and auto-redaction, competitor blocklists, content moderation, and custom guardrail rules.
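
As a rough illustration of the "code evals" mentioned above, the sketch below scores model outputs against a small golden dataset in plain Python. It does not use the LangWatch SDK. The names `EvalResult`, `contains_expected`, and `run_eval`, along with the sample rows, are hypothetical and only show the general shape of a deterministic, code-based check.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    passed: bool
    score: float
    details: str


def contains_expected(output: str, expected: str) -> EvalResult:
    """Hypothetical code eval: pass if the expected phrase appears in the output."""
    hit = expected.lower() in output.lower()
    return EvalResult(
        passed=hit,
        score=1.0 if hit else 0.0,
        details="match" if hit else f"missing {expected!r}",
    )


def run_eval(rows: list[dict]) -> float:
    """Return the pass rate over rows shaped like {'output': ..., 'expected': ...}."""
    if not rows:
        return 0.0
    results = [contains_expected(r["output"], r["expected"]) for r in rows]
    return sum(r.passed for r in results) / len(results)


# Illustrative rows only; in practice these might be curated from production traces.
golden_rows = [
    {"output": "Your refund was issued on Monday.", "expected": "refund"},
    {"output": "I cannot help with that.", "expected": "tracking number"},
]
pass_rate = run_eval(golden_rows)
```

A pass rate like this could be compared against a threshold in a batch test or CI job. An LLM-as-judge eval would have the same outer shape, with the substring check replaced by a model call.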

## Integration and Deployment Model

LangWatch is OpenTelemetry-native, so it can ingest traces from any LLM or agent framework that emits OpenTelemetry data, without requiring proprietary instrumentation. Official integrations include Python and TypeScript/JavaScript SDKs, LangChain, LangGraph, DSPy, CrewAI, Agno, Pydantic AI, LiteLLM, AWS Bedrock, OpenAI Agents, Mastra, Langflow, and n8n. The platform is available as a cloud service, or it can be self-hosted on-premises, in a VPC, in an air-gapped environment, or in a hybrid configuration. The homepage states the project has over 5,600 GitHub stars and processes over 900,000 daily evaluations.
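
Because the integration is based on OpenTelemetry, an application that already exports traces over OTLP could in principle be pointed at a collector endpoint using the standard OpenTelemetry environment variables. The sketch below builds those variables with the Python standard library. The variable names come from the OpenTelemetry specification, but the endpoint URL, the bearer-token header, and the helper name `otlp_env` are assumptions for illustration. The actual endpoint and authentication scheme should be taken from the LangWatch documentation.

```python
import os
from urllib.parse import quote


def otlp_env(endpoint: str, api_key: str, service_name: str) -> dict[str, str]:
    """Build standard OpenTelemetry OTLP exporter environment variables."""
    # Percent-encode the header value so the space in "Bearer <key>" is preserved.
    auth = quote(f"Bearer {api_key}", safe="")
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization={auth}",
        "OTEL_SERVICE_NAME": service_name,
    }


# Placeholder values: the URL and the bearer-token scheme are assumptions,
# not confirmed LangWatch settings.
env = otlp_env("https://otel.example.invalid", "YOUR_API_KEY", "my-agent")
os.environ.update(env)
```

Setting these before the OpenTelemetry SDK initializes keeps the instrumentation code itself vendor-neutral, in line with the "no proprietary instrumentation" claim above.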

## Collaboration Across Roles

LangWatch is designed to bridge engineering and non-technical stakeholders. Engineers can run prompts, flows, and evaluations programmatically via the SDK, while product managers and domain experts can define quality scenarios and review results through the UI without writing code. Collaborative workflows support data review, annotation, and pattern analysis across engineering, product, and business teams. The platform also includes user analytics, topic detection, sentiment analysis, and custom dashboards for tracking functional KPIs.

## Enterprise and Security Controls

For regulated or high-volume environments, LangWatch offers alternative hosting options including on-premises and hybrid deployments, custom data retention, enterprise SSO (Okta, AzureAD/EntraID), SSO enforcement, RBAC at organization/project/team levels, audit logs, and support SLAs. The platform is ISO 27001 and SOC2 certified and GDPR-compliant. Data region options include EU, US, CA, and APAC for enterprise customers. Billing via AWS and Azure Marketplace is available for enterprise contracts.

## Features
- LLM Observability and trace inspection
- Agent simulations for multi-step agentic systems
- Real-time and offline evaluations
- Prompt versioning and management
- Auto-prompt optimization with DSPy
- Dataset management and golden set creation
- LLM-as-judge evaluations
- PII detection and auto-redaction
- Jailbreak and prompt injection detection
- Content moderation and custom guardrails
- Cost and token tracking
- Multi-agent graph visualization
- Batch tests and experiments
- CI/CD integration for evaluations
- User analytics, topic detection, sentiment analysis
- Custom dashboards and KPI tracking
- Role-based access control
- Audit logs
- OpenTelemetry-native integration
- Self-hosted and on-premises deployment

## Integrations
Python SDK, TypeScript/JavaScript SDK, OpenTelemetry, LangChain, LangGraph, DSPy, CrewAI, Agno, Pydantic AI, LiteLLM, AWS Bedrock, OpenAI Agents, Mastra, Langflow, n8n, Google SSO, AzureAD/EntraID SSO, Okta SSO, GitHub SSO, AWS Marketplace, Azure Marketplace

## Platforms
Web, API, CLI, Developer SDK

## Pricing
Freemium: free tier available with paid upgrades

## Links
- Website: https://langwatch.ai
- Documentation: https://docs.langwatch.ai/introduction
- Repository: https://github.com/langwatch/langwatch
- EveryDev.ai: https://www.everydev.ai/tools/langwatch