EvalQA
EvalQA is an AI evaluation platform that combines certified human evaluators with automated metrics to measure the quality of AI agents, SaaS features, and knowledge work outputs.
At a Glance
Free access to the Eval Gym training curriculum and evaluator community for individuals seeking to become certified evaluators.
Listed Jun 2026
About EvalQA
EvalQA is an evaluation platform built for teams shipping AI agents, AI-powered applications, and qualitative knowledge work. It combines trained human evaluators with automated metrics in a hybrid engine designed to catch what code-level testing misses: tone, accuracy, relevance, safety, and reasoning quality. The platform is currently in early access and is positioned as a self-serve alternative to enterprise-only evaluation services.
What It Is
EvalQA describes itself as "the evaluation layer for everything AI." Where traditional QA and automated testing produce binary pass/fail results at the code level, EvalQA applies rubric-based scoring across nuanced dimensions: tone, accuracy, relevance, helpfulness, and safety. The platform targets three primary use cases: AI agent evaluation (multi-step task flows, tool use, reasoning chains), SaaS and app feature evaluation (copilots, recommendations, chatbots), and knowledge work evaluation (marketing copy, analysis, deliverables). According to the product page, a self-serve API, SDK, and webhooks let teams integrate in under an hour.
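To make rubric-based scoring concrete, here is a minimal Python sketch of a weighted rubric over the five dimensions named above. The listing does not publish EvalQA's scoring formula, so the `Dimension` and `rubric_score` names, the 1-5 scale, and the weights are illustrative assumptions rather than the platform's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    """One rubric dimension scored on an integer scale (assumed 1..scale_max)."""
    name: str
    weight: float
    scale_max: int = 5


# Hypothetical rubric: dimension names come from the listing, weights are made up.
RUBRIC = [
    Dimension("tone", 0.15),
    Dimension("accuracy", 0.30),
    Dimension("relevance", 0.20),
    Dimension("helpfulness", 0.20),
    Dimension("safety", 0.15),
]


def rubric_score(scores: dict[str, int], rubric: list[Dimension] = RUBRIC) -> float:
    """Return a weighted score normalised to [0, 1].

    Raises ValueError if a dimension is missing or out of range, so a
    partial rating is never silently folded into the aggregate.
    """
    total_weight = sum(d.weight for d in rubric)
    weighted = 0.0
    for d in rubric:
        if d.name not in scores:
            raise ValueError(f"missing score for {d.name!r}")
        s = scores[d.name]
        if not 1 <= s <= d.scale_max:
            raise ValueError(f"{d.name!r} score {s} outside 1..{d.scale_max}")
        # Map 1..scale_max onto 0..1 before applying the weight.
        weighted += d.weight * (s - 1) / (d.scale_max - 1)
    return weighted / total_weight
```

Rejecting incomplete or out-of-range ratings is a deliberate design choice: a missing dimension surfaces as an error instead of quietly shifting the aggregate score.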
How the Hybrid Evaluation Engine Works
EvalQA's core differentiator is its hybrid approach: certified human evaluators and automated metrics run in parallel rather than as alternatives. The platform uses a three-tier evaluator certification program (Trainee, Expert, Specialist) to ensure domain-appropriate skill levels. Custom rubrics let teams define what matters for their specific domain, capturing nuance that automated tools miss. Real-time dashboards surface eval scores, trends, regression alerts, and evaluator agreement rates. The workflow follows a four-step pattern: define rubric → integrate pipeline → evaluate with precision → ship with confidence.
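The listing does not say how human and automated scores are merged, so the following is only a sketch of one plausible scheme: blend the mean human rating with an automated metric, report pairwise evaluator agreement, and raise a regression alert when the mean score drops against a baseline. The function names, the 0.7 human weight, the agreement tolerance, and the 0.05 alert threshold are all hypothetical.

```python
from statistics import mean

# ASSUMPTION: all scores are normalised to [0, 1]; weights and thresholds
# below are illustrative defaults, not EvalQA's published values.


def combine(human_scores: list[float], auto_score: float, human_weight: float = 0.7) -> float:
    """Blend the mean human rating for one item with its automated metric score."""
    if not human_scores:
        raise ValueError("need at least one human score")
    return human_weight * mean(human_scores) + (1 - human_weight) * auto_score


def agreement_rate(human_scores: list[float], tolerance: float = 0.1) -> float:
    """Fraction of evaluator pairs whose scores differ by at most `tolerance`."""
    pairs = [(a, b) for i, a in enumerate(human_scores) for b in human_scores[i + 1:]]
    if not pairs:
        return 1.0  # a single rating has nothing to disagree with
    return sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)


def regression_alert(current: list[float], baseline: list[float], max_drop: float = 0.05) -> bool:
    """True when the mean score fell more than `max_drop` below the baseline mean."""
    return mean(baseline) - mean(current) > max_drop
```

Simple percent agreement does not correct for agreement that would occur by chance; chance-corrected statistics such as Cohen's kappa, which the curriculum covers, are the stricter alternative.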
The Eval Army and Certification Curriculum
EvalQA operates a dual-sided platform. On the evaluator side, the "Eval Army" is a gamified training and certification program with a five-level mastery curriculum totaling 147 lessons across 30 chapters:
- L1 – Eval Foundations (EAF): 27 lessons, 30-question exam
- L2 – Eval Practitioner (EAP): 30 lessons, 50-question applied scenario exam
- L3 – Eval Specialist (EAS) / CAEE: 30 lessons, 4-hour hands-on lab plus a Deployment Clearance Report
- L4 – Eval Architect (EAA): 30 lessons, portfolio plus case study plus peer defense
- L5 – Eval Commander (EAC): 30 lessons, portfolio plus oral defense plus industry contribution
At the time of the source pages, the certifications are listed as coming soon. The curriculum covers evaluation method selection, inter-rater reliability (Cohen's kappa, ICC), RAG evaluation metrics, safety and adversarial red teaming, psychometric rubric design, and regulatory alignment with the NIST AI RMF and the EU AI Act.
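Cohen's kappa, one of the inter-rater reliability measures the curriculum lists, corrects the raw agreement between two raters for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a standard-library Python implementation of that textbook formula for categorical labels; it is not taken from EvalQA's materials, and the pass/fail labels in the usage note are made up.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items.

    p_o is the observed agreement; p_e is the agreement expected by chance
    given each rater's own label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Labels used only by rater_b contribute zero, so iterating freq_a suffices.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

For example, `cohens_kappa(["pass", "pass", "fail", "pass", "fail", "pass"], ["pass", "fail", "fail", "pass", "fail", "pass"])` gives observed agreement 5/6 and chance agreement 1/2, so kappa is about 0.667. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic and is a useful cross-check. ICC, the other measure named above, applies to continuous ratings and is not covered by this sketch.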
Deployment Model and Integration
EvalQA is designed for self-serve access from day one, in contrast to competitors the site describes as enterprise-only. Integration options include a REST API, SDK, webhooks, and manual file upload for teams that want to start without engineering involvement. The business page states that most teams are production-ready within days through white-glove onboarding. Safety and compliance features include red teaming, content safety evaluation, and SOC 2 readiness, and on-premises deployment is listed as available.
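The listing names webhooks but does not document their payload or whether they are signed. As a hedged illustration of the receiving side only, the sketch below assumes a hex-encoded HMAC-SHA256 signature of the raw request body and a JSON body; the function names, the signature scheme, and the `event`/`run_id` fields are hypothetical, so EvalQA's own API documentation should take precedence over any of them.

```python
import hashlib
import hmac
import json

# ASSUMPTION: EvalQA's real webhook format is not documented in the listing.
# This sketch assumes a hex HMAC-SHA256 of the raw body keyed by a shared secret.


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Return True if signature_hex matches the HMAC-SHA256 of raw_body."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels.
    return hmac.compare_digest(expected, signature_hex)


def handle_webhook(raw_body: bytes, signature_hex: str, secret: bytes) -> dict:
    """Verify, then decode, a webhook event.

    Hypothetical payload shape: {"event": "evaluation.completed", "run_id": "r1"}
    """
    if not verify_signature(raw_body, signature_hex, secret):
        raise PermissionError("webhook signature mismatch")
    return json.loads(raw_body)
```

Verifying against the raw bytes before parsing matters: re-serialising parsed JSON can change the byte sequence and make a valid signature fail to match.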
Current Status
EvalQA is in early access as of the source pages, operating under the TheWorkCompany initiative (© 2026 eval.qa). The certification exams across all five levels are listed as "coming soon." The platform is actively recruiting both business customers and evaluators, with a 24-hour response commitment for business inquiries. The knowledge base curriculum at eval.qa/learn is live with full lesson content, while the formal exam infrastructure is still being built out.
Pricing
Eval Army (Evaluator)
Free access to the Eval Gym training curriculum and evaluator community for individuals seeking to become certified evaluators.
- Access to 147-lesson curriculum
- Gamified skill trees
- Community and mentors
- Leaderboards
- Career path from Trainee to Specialist
Business Engagement
Custom-scoped evaluation engagements for AI teams, SaaS companies, and enterprises. Includes white-glove onboarding, dedicated evaluators, and custom rubrics.
- Custom rubric design
- Certified human evaluators
- Automated metrics
- Real-time dashboard
- White-glove onboarding
- SDK and API access
- Safety and compliance evaluation
- On-prem available
L1 Eval Foundations Certification (EAF)
Entry-level AI evaluation certification. 30 questions, 45-minute exam. No prerequisites.
- 30-question exam
- 45-minute timed assessment
- No prerequisites
- Covers eval fundamentals, metrics, and methods
L2 Eval Practitioner Certification (EAP)
Applied evaluation methods certification. 50 questions with applied scenarios. Requires L1 EAF.
- 50-question exam
- Applied scenario questions
- Requires L1 EAF prerequisite
- Covers metric design, pipeline design, and IRR
L3 Eval Specialist (EAS) / CAEE Certification
Deep technical evaluation credential. 4-hour hands-on lab plus Deployment Clearance Report. Requires L2 EAP.
- 4-hour hands-on lab
- Deployment Clearance Report (DCR)
- Requires L2 EAP prerequisite
- Covers advanced RAG, safety, and psychometric rigor
L4 Eval Architect Certification (EAA)
Org-wide evaluation strategy credential. Portfolio plus case study plus peer defense. Requires L3 EAS.
- Portfolio submission
- Case study
- Peer defense
- Requires L3 EAS prerequisite
- Covers eval governance and multi-system architecture
L5 Eval Commander Certification (EAC)
Strategic evaluation leadership credential. Portfolio plus oral defense plus industry contribution. Requires L4 EAA.
- Portfolio submission
- Oral defense panel
- Industry contribution project
- Requires L4 EAA prerequisite
- Covers regulatory leadership and frontier evaluation challenges
Capabilities
Key Features
- Hybrid human + automated evaluation engine
- AI agent evaluation (multi-step tasks, tool use, reasoning)
- SaaS and app feature evaluation (copilots, recommendations, chatbots)
- Knowledge work and content evaluation
- Custom rubric builder
- Real-time evaluation dashboard with regression alerts
- Self-serve REST API, SDK, and webhooks
- Three-tier certified evaluator program (Trainee, Expert, Specialist)
- Gamified Eval Gym with skill trees and leaderboards
- Five-level AI evaluation certification curriculum (147 lessons)
- Safety and adversarial red teaming evaluation
- NIST AI RMF and EU AI Act compliance alignment
- White-glove onboarding
- On-premises deployment option
- SOC 2-ready infrastructure
