
    Patronus AI

    LLM Evaluations

Automated evaluation and monitoring platform that scores outputs, detects failures, and optimizes LLMs and AI agents using evaluation models, experiments, traces, and an API/SDK ecosystem.

    Visit Website

    At a Glance

    Pricing

    Paid
Developer API (usage-based): from $10 per 1k evaluator calls
    Enterprise: Custom/contact

    Available On

    Web
    API
    SDK

    Resources

Website
Docs
GitHub
llms.txt

    Topics

LLM Evaluations
Automated Testing
Observability Platforms

    Alternatives

Confident AI
DeepEval
Ragas

    Developer

Patronus AI, Inc.

    Updated Feb 2026

    About Patronus AI

    Patronus AI provides an end-to-end evaluation and monitoring platform for generative AI systems, designed to detect hallucinations, agent failures, safety issues, and other production errors in LLMs and RAG systems. The platform exposes evaluation models (including Lynx), an API and SDKs, experiments for A/B testing, logging and trace analysis, and curated datasets and benchmarks to measure and improve model performance. Teams can run evaluations locally or in production, visualize comparisons, and automate remediation workflows.

    • Percival — An intelligent AI agent debugger that automatically detects 20+ failure modes in agentic traces (agent planning mistakes, incorrect tool use, context misunderstanding) and suggests optimizations with a single click. Percival learns from your annotations to provide domain-specific evaluation. Integrates with LangGraph, Hugging Face smolagents, Pydantic AI, CrewAI, and custom clients.
    • Evaluation API — Use the Patronus API to run automatic evaluators (hallucination, relevance, safety) against model outputs; start by creating an API key and calling the /v1/evaluate endpoint (see the request sketch after this list).
    • Patronus Evaluators (Lynx and others) — Access prebuilt, research-backed evaluators for common failure modes or define custom evaluators via the SDK to score specific criteria.
    • Experiments & Comparisons — Run experiments to A/B test prompts, models, and pipeline configurations and compare results side-by-side to guide deployments.
    • Logs & Traces — Capture evaluation runs and traces in production to surface failures, cluster errors, and generate natural-language explanations for issues.
    • Datasets & Benchmarks — Leverage curated datasets (e.g., FinanceBench, SimpleSafetyTests) to stress-test models and measure performance over time.
    • SDKs & Integrations — Use official Python and TypeScript SDKs to integrate evaluation runs into CI, monitoring, and development workflows; the API is framework-agnostic.
    • Deployment options — Cloud-hosted and on-premises options are available for enterprise security, SSO, and custom data retention.
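
    The docs above name a /v1/evaluate endpoint; the sketch below shows one plausible request against it. The base URL, auth header, and payload field names are assumptions for illustration only, so check the official API reference for the exact schema.

    ```python
    import os
    import requests

    # Minimal sketch of calling the evaluation endpoint mentioned above.
    # The base URL, auth header, and JSON field names are assumptions for
    # illustration; consult the official docs for the exact request schema.
    API_KEY = os.environ["PATRONUS_API_KEY"]  # created in the web app

    resp = requests.post(
        "https://api.patronus.ai/v1/evaluate",  # endpoint named above; host assumed
        headers={"X-API-KEY": API_KEY},         # assumed header name
        json={
            # Hypothetical fields: one prebuilt evaluator plus the output to score.
            "evaluators": [{"evaluator": "lynx"}],  # assumed id for the Lynx evaluator
            "evaluated_model_input": "What is the capital of France?",
            "evaluated_model_output": "Paris is the capital of France.",
            "evaluated_model_retrieved_context": ["Paris is the capital of France."],
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())  # expected: per-evaluator scores and explanations
    ```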

    To get started, sign up on the web app, obtain an API key, and follow the quickstart in the SDK documentation to log your first eval or run an experiment. Use the provided SDK examples to call evaluators, configure experiments, and stream traces from production.
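
    Building on the same assumed request shape, a minimal CI regression gate might loop over a small test set and fail the build when any evaluator rejects an output. The field names and response shape below are illustrative guesses, not confirmed by this page.

    ```python
    import os
    import sys
    import requests

    # Sketch of gating a CI run on evaluation results (see the request format
    # above). The evaluator id and response shape are assumptions.
    API_URL = "https://api.patronus.ai/v1/evaluate"  # assumed host
    HEADERS = {"X-API-KEY": os.environ["PATRONUS_API_KEY"]}

    # Tiny hand-written regression set; real runs would use a curated dataset
    # or production traces.
    cases = [
        ("What is 2 + 2?", "4"),
        ("Name the largest planet.", "Jupiter is the largest planet."),
    ]

    failures = 0
    for question, answer in cases:
        resp = requests.post(
            API_URL,
            headers=HEADERS,
            json={
                "evaluators": [{"evaluator": "lynx"}],  # assumed evaluator id
                "evaluated_model_input": question,
                "evaluated_model_output": answer,
            },
            timeout=30,
        )
        resp.raise_for_status()
        # Assumed response shape: a list of per-evaluator results with a pass flag.
        results = resp.json().get("results", [])
        if not all(r.get("evaluation_result", {}).get("pass") for r in results):
            failures += 1

    print(f"{failures}/{len(cases)} cases failed evaluation")
    sys.exit(1 if failures else 0)  # non-zero exit fails the CI job
    ```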


    Pricing

    Developer API (usage)

    Pay-as-you-go API pricing for evaluator calls and explanations; billed by usage.

    From $10 (usage-based)
    • $10 / 1k small evaluator API calls
    • $20 / 1k large evaluator API calls
    • $10 / 1k evaluation explanations and $10 in free credits to start
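
    For a sense of scale, here is a small cost estimator using the listed rates; it treats the $10 starting credit as a one-time offset, which is an assumption about how the credit applies.

    ```python
    # Worked example of the usage-based rates listed above.
    SMALL_PER_1K = 10.0    # $10 per 1k small evaluator API calls
    LARGE_PER_1K = 20.0    # $20 per 1k large evaluator API calls
    EXPLAIN_PER_1K = 10.0  # $10 per 1k evaluation explanations
    FREE_CREDITS = 10.0    # $10 in free credits to start (assumed one-time offset)

    def estimated_cost(small_calls: int, large_calls: int, explanations: int) -> float:
        gross = (small_calls / 1000 * SMALL_PER_1K
                 + large_calls / 1000 * LARGE_PER_1K
                 + explanations / 1000 * EXPLAIN_PER_1K)
        return max(gross - FREE_CREDITS, 0.0)

    # e.g. 5k small calls, 1k large calls, 2k explanations:
    # 5 * $10 + 1 * $20 + 2 * $10 = $90, minus the $10 starting credit = $80.
    print(estimated_cost(5_000, 1_000, 2_000))  # 80.0
    ```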

    Enterprise

    Contact sales for enterprise pricing and custom security and deployment options.

    Custom (contact sales)
    • Unlimited platform features and priority support
    • On-prem / dedicated VPC, custom data retention, SSO
    • Premium API features and higher rate limits
    View official pricing

    Capabilities

    Key Features

    • Evaluation API for automated scoring
    • Research-backed evaluators (Lynx and others)
    • Real-time monitoring and traces
    • A/B experiments and comparisons
    • Curated datasets and benchmarks (FinanceBench, SimpleSafetyTests)
    • Python and TypeScript SDKs
    • Cloud and on-prem deployment options
    • Evaluation explanations and failure mode detection

    Integrations

    AWS
    Databricks
    MongoDB
    OpenAI
    API Available
    View Docs


    Developer

    Patronus AI, Inc.

    Patronus AI builds an automated evaluation and monitoring platform for generative AI systems, focusing on LLMs and agents. The team publishes evaluation models and benchmarks and builds SDKs to integrate evaluation into development and production workflows. They emphasize research-driven evaluators and offer cloud and on-prem options for enterprise security.

    Read more about Patronus AI, Inc.
Website
GitHub
X / Twitter
    1 tool in directory

    Similar Tools

    Confident AI

    End-to-end platform for LLM evaluation and observability that benchmarks, tests, monitors, and traces LLM applications to prevent regressions and optimize performance.

    DeepEval

    DeepEval is an open-source LLM evaluation framework that enables developers to build reliable evaluation pipelines and test any AI system with 50+ research-backed metrics.

    Ragas

    Ragas is an open-source framework for evaluating and testing LLM applications, helping teams measure retrieval-augmented generation (RAG) pipeline quality with automated metrics.


    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    48 tools
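
    As an illustration of the LLM-as-a-judge pattern this topic describes, the sketch below has a second model grade an answer with a pass/fail verdict. The judge prompt and model choice are arbitrary, and the OpenAI client is just one example of a chat-completion API; none of it is specific to any tool listed here.

    ```python
    # Minimal sketch of "LLM-as-a-judge": a second model grades a system's
    # output against stated criteria and returns a binary verdict.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    JUDGE_PROMPT = (
        "You are an evaluator. Given a question and an answer, reply with only "
        "PASS if the answer is correct and relevant, otherwise FAIL.\n\n"
        "Question: {question}\nAnswer: {answer}"
    )

    def judge(question: str, answer: str) -> bool:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # arbitrary choice of judge model
            messages=[{
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }],
        )
        return resp.choices[0].message.content.strip().upper().startswith("PASS")

    print(judge("What is the capital of France?", "Paris."))  # True (usually)
    ```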

    Automated Testing

    AI-powered platforms that automate end-to-end testing processes with intelligent test case generation, execution, and reporting for faster, more reliable software delivery.

    76 tools

    Observability Platforms

    Comprehensive platforms that combine metrics, logs, and traces with AI-powered analytics to provide deep insights into complex distributed systems and application behavior.

    48 tools