

    AI Tools & Discussions in LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.
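The "LLM-as-a-judge" loop described above can be sketched in a few lines. This is a minimal, offline illustration, not any particular platform's API: the `judge` function here is a keyword-matching stand-in so the example runs without a model; in a real pipeline it would prompt a grading LLM and parse its verdict.

```python
def judge(question: str, answer: str) -> float:
    """Stand-in judge: returns 1.0 if the answer contains the expected
    keyword for a known question, else 0.0. A real LLM-as-a-judge would
    ask a model to grade correctness/relevance and parse its score."""
    expected = {"capital of france": "paris"}  # hypothetical golden data
    for key, keyword in expected.items():
        if key in question.lower():
            return 1.0 if keyword in answer.lower() else 0.0
    return 0.0

def evaluate(dataset):
    """Score each (question, answer) pair and aggregate a pass rate,
    mirroring the automated-testing loop these platforms provide."""
    scores = [judge(q, a) for q, a in dataset]
    return sum(scores) / len(scores)

dataset = [
    ("What is the capital of France?", "The capital is Paris."),
    ("What is the capital of France?", "It is Lyon."),
]
print(evaluate(dataset))  # prints 0.5: one pass, one failure
```

In CI/CD regression testing, the same pass rate would be compared against a threshold or a baseline run, failing the pipeline when quality drops.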

    LLM Evaluations Tools (63)

AgentDoG

AI Agent Safety Guardrail Framework

App Security · Autonomous Systems · LLM Evaluations

Plurai

AI Agent Evaluation Platform

LLM Evaluations · Autonomous Systems · Observability

LamBench

Lambda Calculus AI Benchmark

LLM Evaluations · AI Dev Libraries · Local Inference

Regent

LLM Regression Testing for PRs

LLM Evaluations · Automated Testing · AI Infrastructure

autocontext

Self Improving LLM Agent Harness

Agent Harness · Multi-agent Systems · LLM Evaluations

Kelet (Featured)

AI Agent Reliability Platform

Observability · Agent Frameworks · LLM Evaluations

BridgeBench

AI Coding Model Benchmark Platform

LLM Evaluations · User Research · Performance Metrics

MLflow (Featured)

Open Source AI Lifecycle Platform

LLM Evaluations · Observability · Model Management

Agent Reading Test

AI Agent Doc Reading Benchmark

LLM Evaluations · Agent Frameworks · Documentation

mdarena

Benchmark CLAUDE md Files CLI

LLM Evaluations · AI Coding Asst. · Automated Testing

    Top Tools in LLM Evaluations

    Highest trending score

    Traceloop

    LLM reliability platform that turns evals and monitors into a continuous feedback loop for faster, more reliable AI app releases.

    Artificial Analysis

Independent AI model benchmarking platform providing comprehensive performance analysis across intelligence, speed, cost, and quality metrics.

    Scale AI

    Scale AI provides enterprise-grade data labeling, model evaluation, RLHF, and a GenAI Data Engine with API and SDKs to build, fine-tune, and deploy production AI systems.

    New in LLM Evaluations

• AgentDoG (22h ago)
    • Plurai (22h ago)
    • LamBench (4d ago)

    Featured Tool

    Scale AI


    Last 7 Days

• New Tools: 4
    • Featured: 20
    • Upvotes: 10

    Related Topics

• Automated Testing (85 tools)
    • Bug Detection (33 tools)
    • Test Generation (14 tools)
    • Visual Testing (5 tools)
    • Performance Testing (1 tool)

    LLM Evaluations Discussions

    No discussions yet

    Be the first to start a discussion about LLM Evaluations

    Weekly Newsletter

    One weekly email. New AI dev tools, news, and trends.

    No spam — unsubscribe anytime

    With AI, Everyone is a Dev. EveryDev.ai © 2026