EvalQA

Name: EvalQA
Availability: OnlineOnly
Author: EvalQA

EvalQA is an AI evaluation platform that combines certified human evaluators with automated metrics to measure the quality of AI agents, SaaS features, and knowledge work outputs.


At a Glance

Pricing

Free tier available

Free access to the Eval Gym training curriculum and evaluator community for individuals seeking to become certified evaluators.

Business Engagement: Custom/contact

L1 Eval Foundations Certification (EAF): $10 one-time

L2 Eval Practitioner Certification (EAP): $50 one-time



Available On

Web

API

EvalQA is the team behind eval.qa, an AI evaluation platform…

Listed Jun 2026

About EvalQA

EvalQA is an evaluation platform built for teams shipping AI agents, AI-powered applications, and qualitative knowledge work. It combines trained human evaluators with automated metrics in a hybrid engine designed to catch what code-level testing misses: tone, accuracy, relevance, safety, and reasoning quality. The platform is currently in early access and is positioned as a self-serve alternative to enterprise-only evaluation services.

What It Is

EvalQA describes itself as "the evaluation layer for everything AI." Where traditional QA and automated testing produce binary pass/fail results at the code level, EvalQA applies rubric-based scoring across nuanced dimensions: tone, accuracy, relevance, helpfulness, and safety. The platform targets three primary use cases: AI agent evaluation (multi-step task flows, tool use, reasoning chains), SaaS and app feature evaluation (copilots, recommendations, chatbots), and knowledge work evaluation (marketing copy, analysis, deliverables). According to the product page, a self-serve API, SDK, and webhooks allow teams to integrate in under an hour.
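To make the idea of rubric-based scoring concrete, here is a minimal Python sketch of a weighted rubric over the five dimensions named above. It is not EvalQA's implementation: the `Criterion` structure, the weights, the 1-to-5 rating scale, and the normalization to a 0-to-1 score are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric dimension with a relative weight (hypothetical structure)."""
    name: str
    weight: float


def rubric_score(criteria, ratings, scale_max=5):
    """Return a weighted score in [0, 1] from ratings on a 1..scale_max scale."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    weighted = 0.0
    for c in criteria:
        rating = ratings[c.name]  # raises KeyError if a criterion was not rated
        if not 1 <= rating <= scale_max:
            raise ValueError(f"{c.name}: rating {rating} is outside 1..{scale_max}")
        # Map 1..scale_max onto 0..1 so the lowest rating contributes nothing.
        weighted += c.weight * (rating - 1) / (scale_max - 1)
    return weighted / total_weight


# Illustrative weights only; a real rubric would be defined per domain.
RUBRIC = [
    Criterion("tone", 1.0),
    Criterion("accuracy", 3.0),
    Criterion("relevance", 2.0),
    Criterion("helpfulness", 2.0),
    Criterion("safety", 2.0),
]

example_ratings = {"tone": 4, "accuracy": 5, "relevance": 3, "helpfulness": 4, "safety": 5}
print(round(rubric_score(RUBRIC, example_ratings), 3))
```

Unlike a binary pass/fail check, a score of this kind lets a weak dimension (here, relevance) pull the total down without hiding strong results elsewhere.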

How the Hybrid Evaluation Engine Works

EvalQA's core differentiator is its hybrid approach: certified human evaluators and automated metrics run in parallel rather than as alternatives. The platform uses a three-tier evaluator certification program (Trainee, Expert, Specialist) to ensure domain-appropriate skill levels. Custom rubrics let teams define what matters for their specific domain, capturing nuance that automated tools miss. Real-time dashboards surface eval scores, trends, regression alerts, and evaluator agreement rates. The workflow follows a four-step pattern: define rubric → integrate pipeline → evaluate with precision → ship with confidence.
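Two of the dashboard signals mentioned above, evaluator agreement rates and regression alerts, can be sketched in a few lines of Python. The source does not say how EvalQA computes either one, so this is only one plausible reading: agreement is taken as simple pairwise percent agreement, and a regression is flagged when the mean score drops by more than a chosen threshold. The function names and the `max_drop` default are hypothetical.

```python
from itertools import combinations
from statistics import mean


def agreement_rate(labels_by_evaluator):
    """Fraction of (evaluator pair, item) comparisons in which the labels match."""
    matches = 0
    comparisons = 0
    for labels_a, labels_b in combinations(labels_by_evaluator.values(), 2):
        if len(labels_a) != len(labels_b):
            raise ValueError("every evaluator must label the same items")
        for a, b in zip(labels_a, labels_b):
            comparisons += 1
            matches += a == b
    if comparisons == 0:
        raise ValueError("need at least two evaluators and one item")
    return matches / comparisons


def is_regression(baseline_scores, current_scores, max_drop=0.05):
    """True when the mean score fell by more than max_drop versus the baseline."""
    return mean(baseline_scores) - mean(current_scores) > max_drop


labels = {
    "evaluator_1": ["pass", "fail", "pass", "pass"],
    "evaluator_2": ["pass", "fail", "fail", "pass"],
    "evaluator_3": ["pass", "pass", "pass", "pass"],
}
print(round(agreement_rate(labels), 3))
print(is_regression([0.82, 0.80, 0.84], [0.75, 0.74, 0.73]))
```

Percent agreement is easy to read but does not correct for agreement that would happen by chance, which is why chance-corrected statistics are often reported alongside it.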

The Eval Army and Certification Curriculum

EvalQA operates a dual-sided platform. On the evaluator side, the "Eval Army" is a gamified training and certification program with a five-level mastery curriculum totaling 147 lessons across 30 chapters:

L1 – Eval Foundations (EAF): 27 lessons, 30-question exam
L2 – Eval Practitioner (EAP): 30 lessons, applied scenario exam
L3 – Eval Specialist / CAEE: 30 lessons, 4-hour hands-on lab plus a Deployment Clearance Report
L4 – Eval Architect (EAA): 30 lessons, portfolio plus case study plus peer defense
L5 – Eval Commander (EAC): 30 lessons, portfolio plus oral defense plus industry contribution

Certifications are listed as coming soon at the time of the source pages. The curriculum covers evaluation method selection, inter-rater reliability (Cohen's Kappa, ICC), RAG evaluation metrics, safety and adversarial red teaming, psychometric rubric design, and regulatory alignment with NIST AI RMF and the EU AI Act.
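As an illustration of the inter-rater reliability topic listed above, the following Python sketch computes Cohen's Kappa for two raters, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The sample labels are invented, and this is not taken from the EvalQA curriculum itself.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Invented example data: ten items, two raters, two labels.
rater_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "fail", "pass", "pass", "pass", "pass", "fail", "pass"]
print(round(cohens_kappa(rater_a, rater_b), 3))
```

Here the raters agree on 8 of 10 items (p_o = 0.8), while their label frequencies alone would produce p_e = 0.52, so the chance-corrected agreement is noticeably lower than the raw 80%.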

Deployment Model and Integration

EvalQA is designed for self-serve access from day one, contrasting with competitors the site describes as enterprise-only. Integration options include a REST API, SDK, webhooks, and manual file upload for teams that want to start without engineering involvement. The business page states most teams are production-ready within days via white-glove onboarding. Safety and compliance features include red teaming, content safety evaluation, and SOC2-readiness, with on-premises deployment listed as available.
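One common way teams use webhook-style integrations is to gate a release on evaluation scores. The Python sketch below shows that pattern under loud assumptions: the source pages do not document EvalQA's API or webhook payloads, so the JSON shape, the field names (`run_id`, `scores`), and the thresholds are all hypothetical.

```python
import json


def failing_dimensions(payload, thresholds):
    """Return {dimension: (score, minimum)} for each dimension below its threshold."""
    scores = payload.get("scores", {})
    failures = {}
    for dimension, minimum in thresholds.items():
        score = scores.get(dimension)
        # A missing score counts as a failure rather than silently passing.
        if score is None or score < minimum:
            failures[dimension] = (score, minimum)
    return failures


# Hypothetical webhook body; the real payload format is not described in the source.
raw_body = '{"run_id": "demo-1", "scores": {"accuracy": 0.91, "safety": 0.97, "tone": 0.68}}'
THRESHOLDS = {"accuracy": 0.85, "safety": 0.95, "tone": 0.75}

failures = failing_dimensions(json.loads(raw_body), THRESHOLDS)
for dimension, (score, minimum) in failures.items():
    print(f"{dimension}: {score} is below the required {minimum}")
```

In a CI job, a non-empty `failures` result would typically translate into a non-zero exit code so that the pipeline stops before deployment.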

Current Status

EvalQA is in early access as of the source pages, operating under the TheWorkCompany initiative (© 2026 eval.qa). The certification exams across all five levels are listed as "coming soon." The platform is actively recruiting both business customers and evaluators, with a 24-hour response commitment for business inquiries. The knowledge base curriculum at eval.qa/learn is live with full lesson content, while the formal exam infrastructure is still being built out.


Pricing

FREE

Eval Army (Evaluator)

Free access to the Eval Gym training curriculum and evaluator community for individuals seeking to become certified evaluators.

Access to 147-lesson curriculum
Gamified skill trees
Community and mentors
Leaderboards
Career path from Trainee to Specialist

Business Engagement

Custom-scoped evaluation engagements for AI teams, SaaS companies, and enterprises. Includes white-glove onboarding, dedicated evaluators, and custom rubrics.

Custom (contact sales)

Custom rubric design
Certified human evaluators
Automated metrics
Real-time dashboard
White-glove onboarding
SDK and API access
Safety and compliance evaluation
On-prem available

L1 Eval Foundations Certification (EAF)

Entry-level AI evaluation certification. 30 questions, 45-minute exam. No prerequisites.

$10 one-time

30-question exam
45-minute timed assessment
No prerequisites
Covers eval fundamentals, metrics, and methods

L2 Eval Practitioner Certification (EAP)

Applied evaluation methods certification. 50 questions with applied scenarios. Requires L1 EAF.

$50 one-time

50-question exam
Applied scenario questions
Requires L1 EAF prerequisite
Covers metric design, pipeline design, and IRR

L3 Eval Specialist / CAEE Certification

Deep technical evaluation credential. 4-hour hands-on lab plus Deployment Clearance Report. Requires L2 EAP.

$100 one-time

4-hour hands-on lab
Deployment Clearance Report (DCR)
Requires L2 EAP prerequisite
Covers advanced RAG, safety, and psychometric rigor

L4 Eval Architect Certification (EAA)

Org-wide evaluation strategy credential. Portfolio plus case study plus peer defense. Requires L3 Eval Specialist (EAS).

$150 one-time

Portfolio submission
Case study
Peer defense
Requires L3 Eval Specialist (EAS) prerequisite
Covers eval governance and multi-system architecture

L5 Eval Commander Certification (EAC)

Strategic evaluation leadership credential. Portfolio plus oral defense plus industry contribution. Requires L4 EAA.

$200 one-time

Portfolio submission
Oral defense panel
Industry contribution project
Requires L4 EAA prerequisite
Covers regulatory leadership and frontier evaluation challenges
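Because each certification requires the one before it, the total cost of reaching a given level is the sum of the listed prices up to that level. The short Python sketch below works through that arithmetic. It assumes every prerequisite exam is paid for separately at its listed price; the source does not say whether bundles, discounts, or waivers exist.

```python
# One-time prices in USD as listed for each certification level.
CERT_PRICES = {"L1": 10, "L2": 50, "L3": 100, "L4": 150, "L5": 200}
LEVELS = ["L1", "L2", "L3", "L4", "L5"]


def cumulative_cost(target_level):
    """Total price of the target certification plus all of its prerequisites."""
    last_index = LEVELS.index(target_level)  # raises ValueError for unknown levels
    return sum(CERT_PRICES[level] for level in LEVELS[: last_index + 1])


for level in LEVELS:
    print(level, cumulative_cost(level))
```

Under that assumption, reaching L3 costs $160 in exam fees and completing all five levels costs $510.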


Capabilities

Key Features

Hybrid human + automated evaluation engine
AI agent evaluation (multi-step tasks, tool use, reasoning)
SaaS and app feature evaluation (copilots, recommendations, chatbots)
Knowledge work and content evaluation
Custom rubric builder
Real-time evaluation dashboard with regression alerts
Self-serve REST API, SDK, and webhooks
Three-tier certified evaluator program (Trainee, Expert, Specialist)
Gamified Eval Gym with skill trees and leaderboards
Five-level AI evaluation certification curriculum (147 lessons)
Safety and adversarial red teaming evaluation
NIST AI RMF and EU AI Act compliance alignment
White-glove onboarding
On-premises deployment option
SOC2-ready infrastructure

Integrations

REST API

SDK

Webhooks

CI/CD pipelines

DeepEval

RAGAS

LangSmith

Arize

TruLens

MLflow

API Available
