    Ashr

    LLM Evaluations

    Ashr is an AI agent evaluation platform that mimics production environments and user behavior to catch agent failures before they reach real users.


    At a Glance

    Pricing
    Free

    Schedule a call to get started with Ashr evals for your AI agent.


    Available On

    Windows
    macOS
    Linux
    Web
    API

    Resources

    Website
    Docs
    llms.txt

    Topics

    LLM Evaluations
    Agent Frameworks
    Automated Testing

    Alternatives

    Maxim
    LangChain
    Patronus AI
    Developer
    Ashr
    Berkeley, CA
    Est. 2026
    $500,000 raised

    Listed Mar 2026

    About Ashr

    Ashr is an eval platform built specifically for AI agents, enabling teams to run automated tests, catch regressions, and fix failures before they impact production users. It simulates realistic multi-turn conversations — including audio, image, and file inputs — and compares expected vs. actual tool calls and agent responses with detailed scoring. Backed by Y Combinator, Ashr is used by teams at UC Berkeley, Stanford, and several AI startups to ship agents with confidence.

    • Scenario-based evals: Define multi-turn test scenarios with realistic user personas, audio profiles, and tool call sequences to stress-test your agent end-to-end.
    • Tool call matching: Ashr compares expected and actual tool calls side-by-side, flagging exact matches, partial matches, and mismatches with divergence notes.
    • Dataset browser: Browse every test dataset your agent has run, with full timelines, speaker turns, tool calls, and pass/fail scores in one place.
    • Prompt version control: Track every prompt version with inline diffs, per-version pass rates, and run history so you always know which edit caused a regression.
    • Validation scoring: Multiple scoring methods — embeddings, LLM-judge, and exact-match — give a comprehensive view of agent quality across runs.
    • Python SDK (ashr-labs): Install via pip install ashr-labs, initialize AshrLabsClient, and use RunBuilder to incrementally record test results as your agent executes, then deploy them to the dashboard (a usage sketch follows this list).
    • CI/CD integration: Drop the SDK into GitHub Actions workflows to automatically submit eval results on every push or pull request.
    • Auto-eval generation: Wrap your agent with the SDK in production capture mode and Ashr auto-generates eval datasets from real usage patterns.
    • Request-based dataset generation: Submit a create_request payload describing your agent, domain, and test config to generate comprehensive, domain-specific test datasets (a payload sketch follows this list).
    • API key management: Create, list, and revoke API keys directly from the web dashboard or programmatically via the SDK.
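
    The listing names pip install ashr-labs, AshrLabsClient, and RunBuilder but does not document the full API, so the following is only a minimal usage sketch of how that flow might look. The import path, every method name, and every argument below are assumptions, not Ashr's documented interface.

        # Hypothetical sketch of the ashr-labs flow described above. Only the
        # package name (ashr-labs), AshrLabsClient, and RunBuilder come from
        # Ashr's own description; the methods and arguments are assumptions.
        import os

        from ashr_labs import AshrLabsClient, RunBuilder  # assumed import path

        # Authenticate with an API key created in the dashboard or via the SDK.
        client = AshrLabsClient(api_key=os.environ["ASHR_API_KEY"])

        # Incrementally record results as the agent executes each test scenario.
        run = RunBuilder(name="refund-agent-regression")  # assumed constructor args
        run.add_result(  # assumed method
            scenario="multi-turn-refund-request",
            expected_tool_calls=[{"name": "lookup_order", "args": {"order_id": "A123"}}],
            actual_tool_calls=[{"name": "lookup_order", "args": {"order_id": "A123"}}],
            passed=True,
        )

        # Push the recorded run to the Ashr dashboard.
        client.deploy_run(run)  # assumed method

    Run from a GitHub Actions step, a script along these lines would submit eval results on every push or pull request, as the CI/CD bullet describes.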
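    For request-based dataset generation, the listing only says the create_request payload describes your agent, domain, and test config; the field names and method call below are assumptions used to illustrate the shape such a request might take.

        # Hypothetical create_request payload for dataset generation. Only the
        # create_request name and the agent/domain/test-config idea come from
        # the listing; field names and the method signature are assumptions.
        import os

        from ashr_labs import AshrLabsClient  # assumed import path

        client = AshrLabsClient(api_key=os.environ["ASHR_API_KEY"])

        payload = {
            "agent_description": "Voice support agent that looks up orders and issues refunds",
            "domain": "e-commerce customer support",
            "test_config": {  # assumed options
                "num_scenarios": 25,
                "modalities": ["text", "audio"],
                "personas": ["impatient caller", "non-native speaker"],
            },
        }

        dataset = client.create_request(payload)  # assumed method name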

    Community Discussions

    Be the first to start a conversation about Ashr

    Share your experience with Ashr, ask questions, or help others learn from your insights.

    Pricing

    FREE


    Schedule a call to get started with Ashr evals for your AI agent.

    • Scenario-based evals
    • Tool call matching
    • Dataset browser
    • Python SDK access
    • CI/CD integration

    Capabilities

    Key Features

    • Scenario-based multi-turn agent evals
    • Tool call expected vs. actual comparison
    • Embeddings, LLM-judge, and exact-match scoring (sketched after this list)
    • Dataset browser with full test timelines
    • Prompt version control with inline diffs
    • Auto-eval generation from production traffic
    • CI/CD integration via GitHub Actions
    • Python SDK (ashr-labs) with RunBuilder
    • Request-based test dataset generation
    • API key management
    • Audio, image, and file input support in scenarios
    • Run history and pass rate tracking per prompt version
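
    To make the three scoring methods listed above concrete, here is a generic, self-contained sketch of how exact-match, embedding-similarity, and LLM-as-a-judge scores are typically computed. It is not Ashr's code; the embed() and judge() callables are placeholders for whatever models a real setup uses.

        # Generic illustration of exact-match, embedding-similarity, and
        # LLM-as-a-judge scoring; not taken from Ashr's implementation.
        from math import sqrt

        def exact_match(expected: str, actual: str) -> float:
            # 1.0 only when the normalized strings are identical.
            return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

        def embedding_score(expected: str, actual: str, embed) -> float:
            # embed() is a placeholder callable mapping a string to a vector.
            return cosine(embed(expected), embed(actual))

        def llm_judge_score(expected: str, actual: str, judge) -> float:
            # judge() is a placeholder callable that asks an LLM to grade 0-1.
            prompt = (
                "Expected answer:\n" + expected + "\n\nActual answer:\n" + actual +
                "\n\nScore semantic equivalence from 0 to 1."
            )
            return judge(prompt)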

    Integrations

    GitHub Actions
    Python
    PyPI
    API Available

    Reviews & Ratings

    No ratings yet

    Be the first to rate Ashr and help others make informed decisions.

    Developer

    Ashr Team

    Ashr builds evaluation infrastructure for AI agents, helping teams catch failures before they reach production users. The platform mimics real production environments and user behavior to generate realistic test scenarios. Backed by Y Combinator, Ashr serves AI teams at universities and startups with tools for automated evals, prompt version control, and CI/CD-integrated testing.

    Founded 2026
    Berkeley, CA
    $500,000 raised
    2 employees

    Used by

    UC Berkeley
    Stanford University
    Human Behavior
    Pax Historia
    +1 more
    1 tool in directory

    Similar Tools


    Maxim

    Enterprise-grade AI evaluation and observability platform for testing, monitoring, and improving AI agents and LLM applications.


    LangChain

    LangChain provides LangSmith, an agent engineering platform, and open source frameworks (LangChain, LangGraph, deepagents) to help developers observe, evaluate, and deploy AI agents in production.


    Patronus AI

    Automated evaluation and monitoring platform that scores, detects failures, and optimizes LLMs and AI agents using evaluation models, experiments, traces, and an API/SDK ecosystem.


    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    58 tools

    Agent Frameworks

    Tools and platforms for building and deploying custom AI agents.

    189 tools

    Automated Testing

    AI-powered platforms that automate end-to-end testing processes with intelligent test case generation, execution, and reporting for faster, more reliable software delivery.

    82 tools