
Patronus AI

Patronus AI provides an end-to-end evaluation and monitoring platform for generative AI systems, designed to detect hallucinations, agent failures, safety issues, and other production errors in LLMs and RAG systems. The platform exposes evaluation models (including Lynx), an API and SDKs, experiments for A/B testing, logging and trace analysis, and curated datasets and benchmarks to measure and improve model performance. Teams can run evaluations locally or in production, visualize comparisons, and automate remediation workflows.

  • Percival — An intelligent AI agent debugger that automatically detects 20+ failure modes in agentic traces (agent planning mistakes, incorrect tool use, context misunderstanding) and suggests optimizations with a single click. Percival learns from your annotations to provide domain-specific evaluation. Integrates with LangGraph, Hugging Face smolagents, Pydantic AI, CrewAI, and custom clients.
  • Evaluation API — Use the Patronus API to run automatic evaluators (hallucination, relevance, safety) against model outputs; start by creating an API key and calling the /v1/evaluate endpoint.
  • Patronus Evaluators (Lynx and others) — Access prebuilt, research-backed evaluators for common failure modes or define custom evaluators via the SDK to score specific criteria.
  • Experiments & Comparisons — Run experiments to A/B test prompts, models, and pipeline configurations and compare results side-by-side to guide deployments.
  • Logs & Traces — Capture evaluation runs and traces in production to surface failures, cluster errors, and generate natural-language explanations for issues.
  • Datasets & Benchmarks — Leverage curated datasets (e.g., FinanceBench, SimpleSafetyTests) to stress-test models and measure performance over time.
  • SDKs & Integrations — Use official Python and TypeScript SDKs to integrate evaluation runs into CI, monitoring, and development workflows; the API is framework-agnostic.
  • Deployment options — Cloud-hosted and on-premises options are available for enterprise security, SSO, and custom data retention.
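As a sketch of what a call to the /v1/evaluate endpoint mentioned above might look like, the snippet below assembles a request body and shows where the API key would go. The field names (`evaluators`, `evaluated_model_input`, `evaluated_model_output`) and the `lynx` evaluator identifier are illustrative assumptions, not a confirmed schema; check the API reference for the exact shape.

```python
import json

API_URL = "https://api.patronus.ai/v1/evaluate"  # endpoint named above

def build_evaluate_request(model_input: str, model_output: str,
                           evaluators: list[str]) -> dict:
    """Assemble a request body for /v1/evaluate.

    Field names here are illustrative assumptions, not the confirmed schema.
    """
    return {
        "evaluators": [{"evaluator": name} for name in evaluators],
        "evaluated_model_input": model_input,
        "evaluated_model_output": model_output,
    }

payload = build_evaluate_request(
    "What is the capital of France?",
    "The capital of France is Paris.",
    evaluators=["lynx"],  # hallucination evaluator mentioned above
)
# A real call would send this with your API key, e.g.:
# requests.post(API_URL, json=payload, headers={"X-API-KEY": "<your key>"})
print(json.dumps(payload, indent=2))
```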

To get started, sign up on the web app, obtain an API key, and follow the quickstart in the SDK documentation to log your first eval or run an experiment. Use the provided SDK examples to call evaluators, configure experiments, and stream traces from production.
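To illustrate the side-by-side experiment comparison described above, here is a minimal local sketch: given per-case evaluator scores for two prompt variants (the scores below are made-up placeholders, not real evaluator output), it aggregates a mean score per variant, mimicking what an experiment comparison surfaces.

```python
from statistics import mean

# Hypothetical evaluator scores (0.0-1.0) for each test case; in practice
# these would come from evaluator runs via the API or SDK.
results = {
    "prompt_v1": [0.6, 0.7, 0.5, 0.8],
    "prompt_v2": [0.8, 0.9, 0.7, 0.9],
}

def compare_variants(results: dict[str, list[float]]) -> dict[str, float]:
    """Aggregate per-case scores into a mean score per variant,
    mimicking the side-by-side comparison an experiment produces."""
    return {variant: round(mean(scores), 3) for variant, scores in results.items()}

summary = compare_variants(results)
best = max(summary, key=summary.get)
print(summary, "->", best)
```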


Developer

Patronus AI builds an automated evaluation and monitoring platform for generative AI systems, focusing on LLMs and agents.

Pricing and Plans

(Freemium)

Developer API (usage)

From $10, usage-based

Pay-as-you-go API pricing for evaluator calls and explanations; billed by usage.

  • $10 / 1k small evaluator API calls
  • $20 / 1k large evaluator API calls
  • $10 / 1k evaluation explanations
  • $10 in free credits to start
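As a rough illustration of the rates above, the helper below estimates a usage bill. It assumes the free credits apply as a simple offset and that there are no minimums or tiers; actual billing may differ.

```python
def estimate_usage_cost(small_calls: int, large_calls: int,
                        explanations: int, free_credits: float = 10.0) -> float:
    """Estimate pay-as-you-go cost from the published rates:
    $10 / 1k small evaluator calls, $20 / 1k large evaluator calls,
    $10 / 1k explanations, minus the $10 starting credit (simple offset
    assumption; real billing rules may differ)."""
    cost = (small_calls / 1000) * 10 \
         + (large_calls / 1000) * 20 \
         + (explanations / 1000) * 10
    return max(0.0, cost - free_credits)

# 5k small calls + 1k large calls + 2k explanations = $50 + $20 + $20,
# minus the $10 credit.
print(estimate_usage_cost(5000, 1000, 2000))
```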

Enterprise

Contact for pricing

Contact sales for enterprise pricing and custom security and deployment options.

  • Unlimited platform features and priority support
  • On-prem / dedicated VPC, custom data retention, SSO
  • Premium API features and higher rate limits

System Requirements

Operating System: Any OS with a modern browser
Memory (RAM): 4 GB+
Processor: Any modern 64-bit CPU
Disk Space: None (web app)

AI Capabilities

Hallucination detection
Evaluation models
Real-time monitoring
Agent failure detection
Dataset generation
Explanation generation