    AI Tools & Discussions in LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.
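
    As a concrete illustration of the LLM-as-a-judge pattern mentioned above, here is a minimal sketch in Python. It is not taken from any tool on this page: it assumes the official openai client with an OPENAI_API_KEY in the environment, and the judge model, rubric, and 1-to-5 scale are illustrative choices.

        # Minimal LLM-as-a-judge sketch: ask one model to grade another
        # model's answer against a rubric and return a structured score.
        import json
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        JUDGE_PROMPT = """You are grading an AI assistant's answer.
        Question: {question}
        Answer: {answer}

        Score the answer from 1 (wrong or irrelevant) to 5 (correct, complete,
        and well grounded). Reply as JSON: {{"score": <int>, "reason": "<str>"}}"""

        def judge(question: str, answer: str) -> dict:
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # any capable judge model works here
                messages=[{"role": "user",
                           "content": JUDGE_PROMPT.format(question=question,
                                                          answer=answer)}],
                response_format={"type": "json_object"},  # force parseable JSON
            )
            return json.loads(response.choices[0].message.content)

        verdict = judge("What is the capital of France?", "Paris.")
        print(verdict)  # e.g. {"score": 5, "reason": "Correct and concise."}

    In practice, evaluation platforms run a judge like this over whole datasets, aggregate the scores, and gate deployments on regressions.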

    LLM Evaluations Tools (47 total; the first 10 are listed below)

    • Kayba: Agent Self Improvement Framework
      Tags: Agent Frameworks, Agent Memory, LLM Evaluations
    • Gambit: Open Source AI Dev Framework
      Tags: Agent Harness, Agent Frameworks, LLM Evaluations
    • harness-kit: AI Agent Benchmarking Library
      Tags: Agent Harness, LLM Evaluations, Agent Frameworks
    • Maxim (Featured): AI Evaluation and Observability Platform
      Tags: LLM Evaluations, Observability, Agent Frameworks
    • Atla AI: LLM Output Evaluation Platform
      Tags: LLM Evaluations, Observability, AI Infrastructure
    • LOFT: LLM Long Context Benchmark
      Tags: LLM Evaluations, RAG, Academic Research
    • Halluminate: RL Environments for Finance AI
      Tags: AI Infrastructure, Autonomous Systems, LLM Evaluations
    • AgentOps: AI Agent Observability Platform
      Tags: Observability, LLM Evaluations, Agent Frameworks
    • promptfoo: LLM Security Testing Platform
      Tags: App Security, LLM Evaluations, Security Testing
    • Ragas: LLM App Evaluation Framework
      Tags: LLM Evaluations, RAG, Observability
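
    Since Ragas closes the list above, a short usage sketch may be useful. It assumes the classic ragas Python API (the 0.1.x line, where evaluate() takes a Hugging Face Dataset) and an OpenAI key in the environment, since the default metrics are themselves LLM-based judges; the sample row is illustrative, and the API may differ in newer versions.

        # Scoring a single RAG interaction with ragas' built-in metrics.
        # Column names follow the classic ragas schema: question, contexts
        # (the retrieved passages), and the generated answer.
        from datasets import Dataset
        from ragas import evaluate
        from ragas.metrics import faithfulness, answer_relevancy

        data = Dataset.from_dict({
            "question": ["Who wrote 'The Pragmatic Programmer'?"],
            "contexts": [["The Pragmatic Programmer was written by Andrew Hunt "
                          "and David Thomas."]],
            "answer": ["Andrew Hunt and David Thomas."],
        })

        # Both metrics call an LLM under the hood (OPENAI_API_KEY by default).
        result = evaluate(data, metrics=[faithfulness, answer_relevancy])
        print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.98}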

    Top Tools in LLM Evaluations

    Ranked by highest trending score

    Artificial Analysis

    Independent AI model benchmarking platform providing comprehensive performance analysis across intelligence, speed, cost, and quality metrics.

    llmfit

    LLMFit is an open-source CLI tool for benchmarking and evaluating the performance of large language models across various tasks.

    AgentOps

    AgentOps is a developer platform for tracing, debugging, and deploying reliable AI agents and LLM apps with observability across 400+ LLMs and frameworks.
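
    For a sense of what that integration looks like, here is a minimal sketch assuming the agentops Python SDK; the placeholder API key is illustrative, and exact session APIs may differ by SDK version.

        # Minimal AgentOps tracing sketch. After init(), calls made through
        # supported LLM clients (e.g. openai) are recorded to the session.
        import agentops

        agentops.init(api_key="YOUR_AGENTOPS_API_KEY")  # placeholder key

        # ... run the agent / LLM calls to be traced here ...

        agentops.end_session("Success")  # close the session with an outcome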

    New in LLM Evaluations

    • Kayba (3d ago)
    • Gambit (4d ago)
    • harness-kit (4d ago)

    Featured Tool

    Maxim

    Enterprise-grade AI evaluation and observability platform for testing, monitoring, and improving AI agents and LLM applications.

    Last 7 Days

    • 8 new tools
    • 16 featured
    • 9 upvotes

    Related Topics

    • Automated Testing (74 tools)
    • Bug Detection (25 tools)
    • Test Generation (6 tools)
    • Performance Testing (2 tools)
    • Visual Testing (2 tools)

    LLM Evaluations Discussions

    No discussions yet.

    Be the first to start a discussion about LLM Evaluations.
