# Patronus AI

> Automated evaluation and monitoring platform that scores outputs, detects failures, and optimizes LLMs and AI agents using evaluation models, experiments, traces, and an API/SDK ecosystem.

Patronus AI provides an end-to-end evaluation and monitoring platform for generative AI systems, designed to detect hallucinations, agent failures, safety issues, and other production errors in LLMs and RAG systems. The platform exposes evaluation models (including Lynx), an API and SDKs, experiments for A/B testing, logging and trace analysis, and curated datasets and benchmarks to measure and improve model performance. Teams can run evaluations locally or in production, visualize comparisons, and automate remediation workflows.

- **Percival** — An intelligent AI agent debugger that automatically detects 20+ failure modes in agentic traces (planning mistakes, incorrect tool use, context misunderstanding) and suggests optimizations with a single click. Percival learns from your annotations to provide domain-specific evaluation, and integrates with LangGraph, Hugging Face smolagents, Pydantic AI, CrewAI, and custom clients.
- **Evaluation API** — Use the Patronus API to run automatic evaluators (hallucination, relevance, safety) against model outputs; start by creating an API key and calling the /v1/evaluate endpoint (a request sketch appears at the end of this page).
- **Patronus Evaluators (Lynx and others)** — Access prebuilt, research-backed evaluators for common failure modes, or define custom evaluators via the SDK to score specific criteria.
- **Experiments & Comparisons** — Run experiments to A/B test prompts, models, and pipeline configurations, and compare results side by side to guide deployments.
- **Logs & Traces** — Capture evaluation runs and traces in production to surface failures, cluster errors, and generate natural-language explanations for issues.
- **Datasets & Benchmarks** — Leverage curated datasets (e.g., FinanceBench, SimpleSafetyTests) to stress-test models and measure performance over time.
- **SDKs & Integrations** — Use the official Python and TypeScript SDKs to integrate evaluation runs into CI, monitoring, and development workflows; the API itself is framework-agnostic (see the CI sketch at the end of this page).
- **Deployment options** — Cloud-hosted and on-premises options are available for enterprise security, SSO, and custom data retention.

To get started, sign up on the web app, obtain an API key, and follow the quickstart in the SDK documentation to log your first eval or run an experiment. Use the provided SDK examples to call evaluators, configure experiments, and stream traces from production.

## Features

- Evaluation API for automated scoring
- Research-backed evaluators (Lynx and others)
- Real-time monitoring and traces
- A/B experiments and comparisons
- Curated datasets and benchmarks (FinanceBench, SimpleSafetyTests)
- Python and TypeScript SDKs
- Cloud and on-prem deployment options
- Evaluation explanations and failure mode detection

## Integrations

AWS, Databricks, MongoDB, OpenAI

## Platforms

WEB, API, DEVELOPER_SDK

## Pricing

Freemium — Free tier available with paid upgrades

## Links

- Website: https://patronus.ai/
- Documentation: https://docs.patronus.ai/docs/api_ref
- Repository: https://github.com/patronus-ai/patronus-api-node
- EveryDev.ai: https://www.everydev.ai/tools/patronus-ai
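
## Example: calling /v1/evaluate

A minimal sketch of scoring a single model output over HTTP, as described in the Evaluation API bullet above. The /v1/evaluate path comes from this page; the base URL, header name, payload field names, and response shape are assumptions for illustration, so confirm them against the API reference (https://docs.patronus.ai/docs/api_ref) before relying on them.

```python
# Sketch: score one model output with a hosted Patronus evaluator.
# Endpoint path is from this page; field names below are assumed.
import os
import requests

PATRONUS_API_KEY = os.environ["PATRONUS_API_KEY"]  # created in the Patronus web app
BASE_URL = "https://api.patronus.ai"                # assumed API host


def evaluate_output(question: str, answer: str, context: list[str]) -> dict:
    """Ask a hosted evaluator (e.g. a hallucination check) to score one output."""
    payload = {
        # Illustrative, not authoritative, field names.
        "evaluators": [{"evaluator": "lynx"}],
        "evaluated_model_input": question,
        "evaluated_model_output": answer,
        "evaluated_model_retrieved_context": context,
    }
    resp = requests.post(
        f"{BASE_URL}/v1/evaluate",
        headers={"X-API-KEY": PATRONUS_API_KEY},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    result = evaluate_output(
        "What is the capital of France?",
        "The capital of France is Lyon.",
        ["Paris is the capital of France."],
    )
    print(result)  # inspect the evaluator's score, verdict, and explanation
```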
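
## Example: gating CI on evaluation results

The SDKs & Integrations bullet mentions wiring evaluation runs into CI. One framework-agnostic way to sketch that is a small pytest suite that reuses the hypothetical evaluate_output() helper from the previous example and fails the build when an evaluator flags an output. The "results" and "pass" keys checked below are assumed response fields, not documented ones.

```python
# Hypothetical CI gate built on the evaluate_output() helper sketched above
# (saved here as patronus_eval.py). Each golden prompt/answer pair is sent
# through a hosted evaluator; a flagged output fails the build.
import pytest

from patronus_eval import evaluate_output  # helper from the previous sketch

GOLDEN_CASES = [
    ("What is the capital of France?", "Paris is the capital of France.",
     ["Paris is the capital of France."]),
    ("Who wrote Hamlet?", "Hamlet was written by William Shakespeare.",
     ["Hamlet is a tragedy by William Shakespeare."]),
]


@pytest.mark.parametrize("question,answer,context", GOLDEN_CASES)
def test_outputs_pass_evaluation(question, answer, context):
    result = evaluate_output(question, answer, context)
    # Assumed shape: one entry per requested evaluator with a boolean verdict.
    results = result.get("results", [])
    assert results, "Expected at least one evaluator result"
    assert all(r.get("pass", False) for r in results), (
        f"Evaluator flagged the output for: {question!r}"
    )
```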