    Ashr

    LLM Evaluations

    Ashr is an AI agent evaluation platform that mimics production environments and user behavior to catch agent failures before they reach real users.


    At a Glance

    Pricing
    Free

    Schedule a call to get started with Ashr evals for your AI agent.


    Available On

    Windows
    macOS
    Linux
    Web
    API

    Resources

    Website
    Docs
    llms.txt

    Topics

    LLM Evaluations
    Agent Frameworks
    Automated Testing

    Alternatives

    Maxim
    LangChain
    Patronus AI
    Developer
    Ashr
    Berkeley, CA
    Est. 2026
    $500,000 raised

    Listed Mar 2026

    About Ashr

    Ashr is an eval platform built specifically for AI agents, enabling teams to run automated tests, catch regressions, and fix failures before they impact production users. It simulates realistic multi-turn conversations — including audio, image, and file inputs — and compares expected vs. actual tool calls and agent responses with detailed scoring. Backed by Y Combinator, Ashr is used by teams at UC Berkeley, Stanford, and several AI startups to ship agents with confidence.

    • Scenario-based evals: Define multi-turn test scenarios with realistic user personas, audio profiles, and tool call sequences to stress-test your agent end-to-end.
    • Tool call matching: Ashr compares expected and actual tool calls side-by-side, flagging exact matches, partial matches, and mismatches with divergence notes.
    • Dataset browser: Browse every test dataset your agent has run, with full timelines, speaker turns, tool calls, and pass/fail scores in one place.
    • Prompt version control: Track every prompt version with inline diffs, per-version pass rates, and run history so you always know which edit caused a regression.
    • Validation scoring: Multiple scoring methods — embeddings, LLM-judge, and exact-match — give a comprehensive view of agent quality across runs.
    • Python SDK (ashr-labs): Install via pip install ashr-labs, initialize AshrLabsClient, and use RunBuilder to incrementally record test results as your agent executes, then deploy them to the dashboard (a usage sketch follows this list).
    • CI/CD integration: Drop the SDK into GitHub Actions workflows to automatically submit eval results on every push or pull request.
    • Auto-eval generation: Wrap your agent with the SDK in production capture mode and Ashr auto-generates eval datasets from real usage patterns.
    • Request-based dataset generation: Submit a create_request payload describing your agent, domain, and test config to generate comprehensive, domain-specific test datasets (a payload sketch follows this list).
    • API key management: Create, list, and revoke API keys directly from the web dashboard or programmatically via the SDK.
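
    The listing names pip install ashr-labs, AshrLabsClient, and RunBuilder but does not document the full API, so the following is only a minimal usage sketch of how that flow might look. The import path, every method name, and every argument below are assumptions, not Ashr's documented interface.

        # Hypothetical sketch of the ashr-labs flow described above. Only the
        # package name (ashr-labs), AshrLabsClient, and RunBuilder come from
        # Ashr's own description; the methods and arguments are assumptions.
        import os

        from ashr_labs import AshrLabsClient, RunBuilder  # assumed import path

        # Authenticate with an API key created in the dashboard or via the SDK.
        client = AshrLabsClient(api_key=os.environ["ASHR_API_KEY"])

        # Incrementally record results as the agent executes each test scenario.
        run = RunBuilder(name="refund-agent-regression")  # assumed constructor args
        run.add_result(  # assumed method
            scenario="multi-turn-refund-request",
            expected_tool_calls=[{"name": "lookup_order", "args": {"order_id": "A123"}}],
            actual_tool_calls=[{"name": "lookup_order", "args": {"order_id": "A123"}}],
            passed=True,
        )

        # Push the recorded run to the Ashr dashboard.
        client.deploy_run(run)  # assumed method

    Run from a GitHub Actions step, a script along these lines would submit eval results on every push or pull request, as the CI/CD bullet describes.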
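    For request-based dataset generation, the listing only says the create_request payload describes your agent, domain, and test config; the field names and method call below are assumptions used to illustrate the shape such a request might take.

        # Hypothetical create_request payload for dataset generation. Only the
        # create_request name and the agent/domain/test-config idea come from
        # the listing; field names and the method signature are assumptions.
        import os

        from ashr_labs import AshrLabsClient  # assumed import path

        client = AshrLabsClient(api_key=os.environ["ASHR_API_KEY"])

        payload = {
            "agent_description": "Voice support agent that looks up orders and issues refunds",
            "domain": "e-commerce customer support",
            "test_config": {  # assumed options
                "num_scenarios": 25,
                "modalities": ["text", "audio"],
                "personas": ["impatient caller", "non-native speaker"],
            },
        }

        dataset = client.create_request(payload)  # assumed method name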

    Community Discussions

    Be the first to start a conversation about Ashr

    Share your experience with Ashr, ask questions, or help others learn from your insights.

    Pricing

    FREE


    Schedule a call to get started with Ashr evals for your AI agent.

    • Scenario-based evals
    • Tool call matching
    • Dataset browser
    • Python SDK access
    • CI/CD integration

    Capabilities

    Key Features

    • Scenario-based multi-turn agent evals
    • Tool call expected vs. actual comparison
    • Embeddings, LLM-judge, and exact-match scoring (sketched after this list)
    • Dataset browser with full test timelines
    • Prompt version control with inline diffs
    • Auto-eval generation from production traffic
    • CI/CD integration via GitHub Actions
    • Python SDK (ashr-labs) with RunBuilder
    • Request-based test dataset generation
    • API key management
    • Audio, image, and file input support in scenarios
    • Run history and pass rate tracking per prompt version
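
    To make the three scoring methods listed above concrete, here is a generic, self-contained sketch of how exact-match, embedding-similarity, and LLM-as-a-judge scores are typically computed. It is not Ashr's code; the embed() and judge() callables are placeholders for whatever models a real setup uses.

        # Generic illustration of exact-match, embedding-similarity, and
        # LLM-as-a-judge scoring; not taken from Ashr's implementation.
        from math import sqrt

        def exact_match(expected: str, actual: str) -> float:
            # 1.0 only when the normalized strings are identical.
            return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

        def embedding_score(expected: str, actual: str, embed) -> float:
            # embed() is a placeholder callable mapping a string to a vector.
            return cosine(embed(expected), embed(actual))

        def llm_judge_score(expected: str, actual: str, judge) -> float:
            # judge() is a placeholder callable that asks an LLM to grade 0-1.
            prompt = (
                "Expected answer:\n" + expected + "\n\nActual answer:\n" + actual +
                "\n\nScore semantic equivalence from 0 to 1."
            )
            return judge(prompt)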

    Integrations

    GitHub Actions
    Python
    PyPI
    API Available

    Reviews & Ratings

    No ratings yet

    Be the first to rate Ashr and help others make informed decisions.

    Developer

    Ashr Team

    Ashr builds evaluation infrastructure for AI agents, helping teams catch failures before they reach production users. The platform mimics real production environments and user behavior to generate realistic test scenarios. Backed by Y Combinator, Ashr serves AI teams at universities and startups with tools for automated evals, prompt version control, and CI/CD-integrated testing.

    Founded 2026
    Berkeley, CA
    $500,000 raised
    2 employees

    Used by

    UC Berkeley
    Stanford University
    Human Behavior
    Pax Historia
    +1 more
    1 tool in directory

    Similar Tools


    Maxim

    Enterprise-grade AI evaluation and observability platform for testing, monitoring, and improving AI agents and LLM applications.


    LangChain

    LangChain provides LangSmith, an agent engineering platform, and open source frameworks (LangChain, LangGraph, deepagents) to help developers observe, evaluate, and deploy AI agents in production.


    Patronus AI

    Automated evaluation and monitoring platform that scores, detects failures, and optimizes LLMs and AI agents using evaluation models, experiments, traces, and an API/SDK ecosystem.


    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    58 tools

    Agent Frameworks

    Tools and platforms for building and deploying custom AI agents.

    189 tools

    Automated Testing

    AI-powered platforms that automate end-to-end testing processes with intelligent test case generation, execution, and reporting for faster, more reliable software delivery.

    82 tools