Ashr
Ashr is an AI agent evaluation platform that mimics production environments and user behavior to catch agent failures before they reach real users.
At a Glance
Pricing: Schedule a call to get started with Ashr evals for your AI agent.
Listed: Mar 2026
About Ashr
Ashr is an eval platform built specifically for AI agents, enabling teams to run automated tests, catch regressions, and fix failures before they impact production users. It simulates realistic multi-turn conversations — including audio, image, and file inputs — and compares expected vs. actual tool calls and agent responses with detailed scoring. Backed by Y Combinator, Ashr is used by teams at UC Berkeley, Stanford, and several AI startups to ship agents with confidence.
- Scenario-based evals: Define multi-turn test scenarios with realistic user personas, audio profiles, and tool call sequences to stress-test your agent end-to-end.
- Tool call matching: Ashr compares expected and actual tool calls side-by-side, flagging exact matches, partial matches, and mismatches with divergence notes (a toy classifier illustrating the idea follows this list).
- Dataset browser: Browse every test dataset your agent has run, with full timelines, speaker turns, tool calls, and pass/fail scores in one place.
- Prompt version control: Track every prompt version with inline diffs, per-version pass rates, and run history so you always know which edit caused a regression.
- Validation scoring: Multiple scoring methods — embeddings, LLM-judge, and exact-match — give a comprehensive view of agent quality across runs.
- Python SDK (ashr-labs): Install via pip install ashr-labs, initialize AshrLabsClient, and use RunBuilder to incrementally record test results as your agent executes, then deploy them to the dashboard (see the SDK sketch after this list).
- CI/CD integration: Drop the SDK into GitHub Actions workflows to automatically submit eval results on every push or pull request.
- Auto-eval generation: Wrap your agent with the SDK in production capture mode and Ashr auto-generates eval datasets from real usage patterns.
- Request-based dataset generation: Submit a create_request payload describing your agent, domain, and test config to generate comprehensive, domain-specific test datasets (see the payload sketch after this list).
- API key management: Create, list, and revoke API keys directly from the web dashboard or programmatically via the SDK.
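
To make the matching categories concrete, here is a toy classifier, assuming each tool call is a plain dict with name and args fields. This is a sketch of the concept only; the listing does not describe Ashr's actual matching logic.

```python
# Toy illustration of exact / partial / mismatch classification for tool
# calls. Assumes each call is a dict like {"name": str, "args": dict};
# this is NOT Ashr's implementation, just a sketch of the idea.
def classify_tool_call(expected: dict, actual: dict) -> tuple[str, str]:
    if expected["name"] != actual["name"]:
        return "mismatch", f"expected tool {expected['name']!r}, got {actual['name']!r}"
    if expected["args"] == actual["args"]:
        return "exact", "tool name and arguments agree"
    # Simplification: only flags expected keys whose values diverge.
    diverging = sorted(
        k for k in expected["args"] if actual["args"].get(k) != expected["args"][k]
    )
    return "partial", f"same tool, diverging args: {diverging}"

status, note = classify_tool_call(
    {"name": "cancel_order", "args": {"order_id": "ord_1"}},
    {"name": "cancel_order", "args": {"order_id": "ord_2"}},
)
# status == "partial", note == "same tool, diverging args: ['order_id']"
```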
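The SDK bullet names pip install ashr-labs, AshrLabsClient, and RunBuilder; everything else in this sketch is an assumption about what such an API might look like. The ashr_labs module path, the api_key argument, and the add_result and deploy_run methods are guesses, not documented API.

```python
# Minimal sketch of the ashr-labs flow. AshrLabsClient and RunBuilder come
# from the listing; the module path, constructor arguments, and the
# add_result / deploy_run method names below are assumptions.
from ashr_labs import AshrLabsClient, RunBuilder  # pip install ashr-labs

client = AshrLabsClient(api_key="ASHR_API_KEY")  # hypothetical auth parameter

builder = RunBuilder(run_name="support-agent-regression")  # hypothetical arg

# Incrementally record a test result as the agent executes
# (field names are illustrative).
builder.add_result(
    scenario="refund-request",
    expected_tool_call={"name": "issue_refund", "args": {"order_id": "ord_123"}},
    actual_tool_call={"name": "issue_refund", "args": {"order_id": "ord_123"}},
    passed=True,
)

# Deploy the recorded run to the Ashr dashboard (assumed method).
client.deploy_run(builder)
```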
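The listing names the create_request payload and its three broad parts (agent, domain, test config) but gives no schema, so every field name and the submission call in this sketch are assumptions.

```python
# Hypothetical create_request payload for request-based dataset generation.
# Only the payload name and its three broad parts (agent, domain, test
# config) come from the listing; every field below is an assumption.
from ashr_labs import AshrLabsClient  # pip install ashr-labs

client = AshrLabsClient(api_key="ASHR_API_KEY")  # hypothetical auth parameter

create_request = {
    "agent": "support-agent-v2",
    "domain": "e-commerce customer support",
    "test_config": {
        "num_scenarios": 25,
        "modalities": ["text", "audio"],  # scenarios support audio/image/file inputs
        "scoring": ["llm_judge", "exact_match"],
    },
}

# Submit the request to generate a domain-specific test dataset (assumed method).
dataset = client.create_request(create_request)
```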
Pricing
Free Plan Available
Schedule a call to get started with Ashr evals for your AI agent.
- Scenario-based evals
- Tool call matching
- Dataset browser
- Python SDK access
- CI/CD integration
Capabilities
Key Features
- Scenario-based multi-turn agent evals
- Tool call expected vs. actual comparison
- Embeddings, LLM-judge, and exact-match scoring
- Dataset browser with full test timelines
- Prompt version control with inline diffs
- Auto-eval generation from production traffic
- CI/CD integration via GitHub Actions
- Python SDK (ashr-labs) with RunBuilder
- Request-based test dataset generation
- API key management
- Audio, image, and file input support in scenarios
- Run history and pass rate tracking per prompt version
