# HoneyHive

> AI observability and evaluation platform to monitor, evaluate, and govern AI agents and applications across any model, framework, or agent runtime.

HoneyHive provides a comprehensive platform for observing, evaluating, and governing AI agents and applications. It enables teams to instrument end-to-end AI applications, including prompts, retrieval, tool calls, MCP servers, and model outputs, so they can identify and fix issues quickly. The platform supports over 100 LLMs and agent frameworks through OpenTelemetry-native instrumentation.

- **Distributed Tracing** allows teams to see inside any agent, framework, or runtime with full visibility into prompts, retrieval steps, tool calls, and model outputs for rapid debugging.
- **Online Evaluation** runs live evaluations with 25+ pre-built evaluators to detect failures across quality, safety, and other dimensions at scale, with support for custom LLM-as-a-judge or code evaluators.
- **Monitoring & Alerts** provides real-time alerts when agents silently fail, with drift detection and custom dashboards to track the metrics that matter most.
- **Experiments** enable teams to validate agents pre-deployment on large test suites, compare versions, and catch regressions in CI/CD before users experience them.
- **Prompt Management** offers a collaborative IDE for managing and versioning prompts, with a playground for experimenting with new prompts and models.
- **Dataset Curation** allows teams to centrally manage test cases with domain experts and curate test suites directly from traces in the UI.
- **Human Review** enables domain experts to grade and correct outputs through annotation queues, supporting a hybrid evaluation approach.
- **Session Replays** let teams replay chat sessions in the Playground for detailed analysis and debugging.
- **CI/CD Integration** runs automated test suites over every commit, with GitHub integration for version management across artifacts.
- **Enterprise Security** includes SOC-2 Type II, GDPR, and HIPAA compliance, with options for multi-tenant SaaS, dedicated cloud, or self-hosting up to fully air-gapped deployments.

To get started, sign up for a free account and integrate your application using the Python or TypeScript SDKs, both with native OpenTelemetry support (a minimal OpenTelemetry sketch appears at the end of this document). The platform provides automatic instrumentation for 50+ popular libraries, including LangChain, LangGraph, AWS Strands, Google ADK, and the OpenAI Agents SDK.

## Features

- Distributed Tracing
- Online Evaluation
- Monitoring & Alerts
- Drift Detection
- Custom Dashboards
- Experiments
- Regression Tracking
- CI/CD Integration
- Prompt Management
- Prompt Versioning
- Playground
- Dataset Curation
- Annotation Queues
- Human Review
- Session Replays
- Graph and Timeline View
- Data Export
- Custom Evaluators
- 25+ Pre-built Evaluators
- OpenTelemetry-native
- RBAC
- SSO
- SAML
- Self-hosting
- PII Scrubbing

## Integrations

OpenTelemetry, LangChain, LangGraph, AWS Strands, Google ADK, OpenAI Agents SDK, GitHub, Slack, Microsoft Teams, AWS, Azure, Google Cloud

## Platforms

Web, API, Developer SDK

## Pricing

Freemium — free tier available with paid upgrades

## Links

- Website: https://www.honeyhive.ai
- Documentation: https://docs.honeyhive.ai/introduction
- EveryDev.ai: https://www.everydev.ai/tools/honeyhive
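
## Example: OpenTelemetry Export (Python)

A minimal sketch of what an OpenTelemetry-native integration can look like in Python. The ingestion endpoint, authorization header, and service name below are illustrative assumptions, not documented HoneyHive values; see the documentation linked above for the actual SDK setup and configuration.

```python
# Minimal sketch: export OpenTelemetry spans to an OTLP-compatible backend.
# The endpoint URL, auth header, and service name are assumptions for
# illustration only; check https://docs.honeyhive.ai for real values.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider(
    resource=Resource.create({"service.name": "my-llm-app"})  # hypothetical service name
)
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://api.honeyhive.ai/v1/traces",            # assumed endpoint
            headers={"Authorization": "Bearer <HONEYHIVE_API_KEY>"},  # assumed auth header
        )
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example-app")

# Wrap any step of an agent pipeline in a span so prompts, tool calls,
# and model outputs can be attached as attributes and traced end to end.
with tracer.start_as_current_span("generate_answer") as span:
    span.set_attribute("llm.prompt", "What is observability?")
    # ... call your model or agent framework here ...
    span.set_attribute("llm.completion", "<model output>")
```

With automatic instrumentation for frameworks such as LangChain or the OpenAI Agents SDK, the manual span above is typically unnecessary; the instrumentation emits spans for prompts, retrieval steps, and tool calls on its own.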