# EvalQA

> EvalQA is an AI evaluation platform that combines certified human evaluators with automated metrics to measure the quality of AI agents, SaaS features, and knowledge work outputs.

EvalQA is an evaluation platform built for teams shipping AI agents, AI-powered applications, and qualitative knowledge work. It combines trained human evaluators with automated metrics in a hybrid engine designed to catch what code-level testing misses: tone, accuracy, relevance, safety, and reasoning quality. The platform is currently in early access and is positioned as a self-serve alternative to enterprise-only evaluation services.

## What It Is

EvalQA describes itself as "the evaluation layer for everything AI." Where traditional QA and automated testing produce binary pass/fail results at the code level, EvalQA applies rubric-based scoring across nuanced dimensions: tone, accuracy, relevance, helpfulness, and safety. The platform targets three primary use cases: AI agent evaluation (multi-step task flows, tool use, reasoning chains), SaaS and app feature evaluation (copilots, recommendations, chatbots), and knowledge work evaluation (marketing copy, analysis, deliverables). According to the product page, a self-serve API, SDK, and webhooks let teams integrate in under an hour.

## How the Hybrid Evaluation Engine Works

EvalQA's core differentiator is its hybrid approach: certified human evaluators and automated metrics run in parallel rather than as alternatives. The platform uses a three-tier evaluator certification program (Trainee, Expert, Specialist) to ensure domain-appropriate skill levels. Custom rubrics let teams define what matters for their specific domain, capturing nuance that automated tools miss. Real-time dashboards surface eval scores, trends, regression alerts, and evaluator agreement rates. The workflow follows a four-step pattern: define rubric → integrate pipeline → evaluate with precision → ship with confidence.
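As a rough illustration of rubric-based scoring blended with an automated metric, the sketch below computes a weighted rubric score and then mixes the mean human score with an automated one. Everything in it is an assumption made for illustration: the names `Criterion`, `rubric_score`, and `hybrid_score`, the 1-5 rating scale, the criterion weights, and the blend weight are hypothetical and do not reflect EvalQA's actual API or scoring method, which the source pages do not document.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Criterion:
    """One rubric dimension (hypothetical structure, not EvalQA's schema)."""
    name: str
    weight: float  # relative importance of this dimension


def rubric_score(criteria, ratings):
    """Weighted mean of per-criterion ratings (assumed 1-5 scale)."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive number")
    return sum(c.weight * ratings[c.name] for c in criteria) / total_weight


def hybrid_score(human_scores, automated_score, human_weight=0.7):
    """Blend the mean human rubric score with an automated metric.

    Both inputs are assumed to be on the same scale; the default
    70/30 split is an arbitrary illustrative choice.
    """
    if not 0.0 <= human_weight <= 1.0:
        raise ValueError("human_weight must be between 0 and 1")
    return human_weight * mean(human_scores) + (1 - human_weight) * automated_score


# Example rubric: accuracy and safety count twice as much as tone.
criteria = [
    Criterion("tone", 1.0),
    Criterion("accuracy", 2.0),
    Criterion("safety", 2.0),
]
one_evaluator = {"tone": 4, "accuracy": 5, "safety": 3}

print(rubric_score(criteria, one_evaluator))          # 4.0
print(hybrid_score([4.0, 3.0], 5.0, human_weight=0.5))  # 4.25
```

Keeping the rubric as data rather than hard-coded logic mirrors the idea of custom rubrics: changing what matters for a domain means editing weights and criteria, not the scoring code.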

## The Eval Army and Certification Curriculum

EvalQA operates a two-sided platform. On the evaluator side, the "Eval Army" is a gamified training and certification program with a five-level mastery curriculum totaling 147 lessons across 30 chapters:

- **L1 – Eval Foundations (EAF):** 27 lessons, 30-question exam
- **L2 – Eval Practitioner (EAP):** 30 lessons, applied scenario exam
- **L3 – Eval Specialist (CAEE):** 30 lessons, 4-hour hands-on lab plus a Deployment Clearance Report
- **L4 – Eval Architect (EAA):** 30 lessons, portfolio plus case study plus peer defense
- **L5 – Eval Commander (EAC):** 30 lessons, portfolio plus oral defense plus industry contribution

Certifications are listed as "coming soon" on the source pages. The curriculum covers evaluation method selection, inter-rater reliability (Cohen's Kappa, ICC), RAG evaluation metrics, safety and adversarial red teaming, psychometric rubric design, and regulatory alignment with the NIST AI RMF and the EU AI Act.
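For readers unfamiliar with the inter-rater reliability measures named above, Cohen's Kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula for two raters and categorical labels; the function name `cohens_kappa` and the sample labels are illustrative and are not taken from the EvalQA curriculum.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every item;
        # kappa is undefined here, so report full agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Two evaluators judging four outputs; they disagree on the second one.
a = ["pass", "pass", "fail", "fail"]
b = ["pass", "fail", "fail", "fail"]
print(cohens_kappa(a, b))  # 0.5
```

In this example the raters agree on 3 of 4 items (p_o = 0.75), chance agreement is 0.5, and kappa is therefore 0.5. Raw agreement alone would overstate reliability because it ignores how often two raters would match by chance.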

## Deployment Model and Integration

EvalQA is designed for self-serve access from day one, in contrast to competitors the site describes as enterprise-only. Integration options include a REST API, SDK, webhooks, and manual file upload for teams that want to start without engineering involvement. The business page states that most teams are production-ready within days via white-glove onboarding. Safety and compliance features include red teaming, content safety evaluation, and SOC 2 readiness, and on-premises deployment is listed as available.

## Current Status

EvalQA is in early access as of the source pages, operating as a TheWorkCompany initiative (© 2026 eval.qa). The platform is actively recruiting both business customers and evaluators, with a 24-hour response commitment for business inquiries. The knowledge base curriculum at eval.qa/learn is live with full lesson content, while the certification exams for all five levels are listed as "coming soon" and the formal exam infrastructure is still being built out.

## Features
- Hybrid human + automated evaluation engine
- AI agent evaluation (multi-step tasks, tool use, reasoning)
- SaaS and app feature evaluation (copilots, recommendations, chatbots)
- Knowledge work and content evaluation
- Custom rubric builder
- Real-time evaluation dashboard with regression alerts
- Self-serve REST API, SDK, and webhooks
- Three-tier certified evaluator program (Trainee, Expert, Specialist)
- Gamified Eval Gym with skill trees and leaderboards
- Five-level AI evaluation certification curriculum (147 lessons)
- Safety and adversarial red teaming evaluation
- NIST AI RMF and EU AI Act compliance alignment
- White-glove onboarding
- On-premises deployment option
- SOC 2-ready infrastructure

## Integrations
REST API, SDK, Webhooks, CI/CD pipelines, DeepEval, RAGAS, LangSmith, Arize, TruLens, MLflow

## Platforms
Web, API

## Pricing
Freemium — Free tier available with paid upgrades

## Links
- Website: https://eval.qa
- Documentation: https://eval.qa/learn
- EveryDev.ai: https://www.everydev.ai/tools/evalqa