With AI, Everyone is a Dev. EveryDev.ai © 2026
    Verifiers

    Agent Harness
    Featured

    An open-source Python library by Prime Intellect for creating environments to train and evaluate LLMs using reinforcement learning.

    At a Glance

    Pricing
    Open Source

    Free and open-source under the MIT License.

    Available On

    Web
    API
    SDK
    CLI

    Resources

    Website, Docs, GitHub, llms.txt

    Topics

    Agent Harness, LLM Evaluations, Human-in-the-Loop Training

    Alternatives

    OpenTraces, harness-kit, SWE-smith
    Developer
    Prime Intellect, San Francisco, CA (est. 2023, $20.5M+ raised)

    Listed May 2026

    About Verifiers

    Verifiers is an open-source Python library developed by Prime Intellect AI for building environments that train and evaluate large language models (LLMs) via reinforcement learning (RL). Originally created by Will Brown, the project is hosted on GitHub under the MIT License and integrates tightly with Prime Intellect's broader platform, including the Environments Hub, the prime-rl training framework, and Hosted Training infrastructure.

    What It Is

    Verifiers provides a structured way to define self-contained RL environments for LLMs. Each environment bundles three core components: a dataset of task inputs, a harness for the model (covering tools, sandboxes, and context management), and a rubric (reward function) to score model performance. These environments can be used for RL training, capability evaluation, synthetic data generation, and agent harness experimentation.
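
As a rough illustration of how the three components fit together, the sketch below bundles a dataset, a harness, and a rubric into one object and runs a stub model through it. Every name here (`ToyEnvironment`, `single_turn_harness`, `exact_match`, `stub_model`) is hypothetical and written with the standard library only; it is not the Verifiers API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types for illustration only; this is not the verifiers API.
Model = Callable[[str], str]


@dataclass
class ToyEnvironment:
    dataset: list[dict]                    # task inputs: a prompt plus a reference answer
    harness: Callable[[Model, str], str]   # how the model is invoked for one task
    rubric: Callable[[str, str], float]    # maps (completion, answer) to a reward

    def evaluate(self, model: Model) -> float:
        """Run every task through the harness and return the mean reward."""
        rewards = []
        for row in self.dataset:
            completion = self.harness(model, row["prompt"])
            rewards.append(self.rubric(completion, row["answer"]))
        return sum(rewards) / len(rewards)


def single_turn_harness(model: Model, prompt: str) -> str:
    # A single-turn harness: wrap the task in an instruction and call the model once.
    return model(f"Answer with a number only.\n{prompt}")


def exact_match(completion: str, answer: str) -> float:
    # Reward 1.0 only when the completion equals the reference answer.
    return 1.0 if completion.strip() == answer else 0.0


def stub_model(prompt: str) -> str:
    # Stand-in for an LLM call: always answers "4".
    return "4"


env = ToyEnvironment(
    dataset=[
        {"prompt": "2 + 2 = ?", "answer": "4"},
        {"prompt": "3 + 3 = ?", "answer": "6"},
    ],
    harness=single_turn_harness,
    rubric=exact_match,
)
print(env.evaluate(stub_model))  # mean reward over the two tasks
```

Because the stub model answers one of the two tasks correctly, the mean reward is 0.5. The same object could in principle serve evaluation (report the mean) or training (feed per-task rewards to an RL loop), which is the reuse the description above points at.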

    Architecture and Core Concepts

    The library supports both a legacy single-turn API and a newer v1 Taskset/Harness API introduced in v0.1.14. Key abstractions include:

    • Taskset: Defines the dataset rows, reward functions, and split configuration for a task.
    • Harness: Manages how the model interacts with tools, sandboxes, and multi-turn context.
    • Rubric: Scores model completions using async reward functions with configurable weights.
    • Environment types: SingleTurnEnv, MultiTurnEnv, RLMEnv, CliAgentEnv, BrowserEnv, and OpenEnv integrations cover a wide range of interaction protocols.

    Environments are self-contained Python modules installable via uv, and the same package is used for both evals and RL training runs.
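
The Rubric idea above, async reward functions combined with configurable weights, can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function names (`correct_answer`, `is_concise`, `score`) and the weights are hypothetical, and the code does not use or reproduce the library's own Rubric class.

```python
import asyncio


async def correct_answer(completion: str, answer: str) -> float:
    # Hypothetical reward: 1.0 if the reference answer appears in the completion.
    return 1.0 if answer in completion else 0.0


async def is_concise(completion: str, answer: str) -> float:
    # Hypothetical reward: 1.0 if the completion is at most 20 characters long.
    return 1.0 if len(completion) <= 20 else 0.0


async def score(completion, answer, funcs, weights):
    """Await all reward functions concurrently and return their weighted sum."""
    values = await asyncio.gather(*(f(completion, answer) for f in funcs))
    return sum(w * v for w, v in zip(weights, values))


reward = asyncio.run(
    score("The answer is 4", "4", [correct_answer, is_concise], [1.0, 0.25])
)
print(reward)
```

Both rewards fire for this completion, so the weighted sum is 1.0 + 0.25 = 1.25. Making the reward functions async lets slow scorers (for example, a call to a judge model) run concurrently instead of one after another.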

    Workflow and Tooling

    The prime CLI drives the development workflow end-to-end:

    • prime lab setup scaffolds a workspace with recommended configs for training, eval, and prompt optimization (GEPA).
    • prime env init generates a new environment template.
    • prime eval run executes local evaluations against any OpenAI-compatible endpoint.
    • prime env push publishes environments to the public Environments Hub.
    • vf-tui provides a terminal UI for reviewing eval results.

    Configuration is TOML-based, with separate config shapes for RL training (model, batch size, rollouts) and environment-specific options (taskset split, harness max turns, reward weights).

    Environments Hub Integration

    Verifiers connects to the Prime Intellect Environments Hub, a shared registry where community-built environments can be published, discovered, and installed. Environments from the hub (e.g., primeintellect/math-python) can be installed directly into a local project and used for both evaluation and training without modification.

    Update: v0.1.14 and Active Development

    The project has shipped releases at a rapid cadence since its creation in January 2025. As of the latest available data:

    • v0.1.15.dev7 is the most recent pre-release (published May 15, 2026).
    • v0.1.14 (released May 7, 2026) introduced the v1 Taskset/Harness API, shared eval/training config shapes, model-family starter configs, OpenAI Responses and renderer-backed clients, per-turn timing, GEPA prompt artifacts, Lean guard markers, and infrastructure hardening.
    • Earlier notable releases include v0.1.12 (RLMEnv improvements, multi-worker env server), v0.1.11 (unified client stack, eval TUI), v0.1.10 (OpenEnv and BrowserEnv integrations), and v0.1.8 (trajectory-based rollout tracking for token-in/token-out training).

    According to GitHub metadata, the repository had 4,123 stars and 551 forks as of the last recorded update.

    Pricing

    Open Source

    Free and open-source under the MIT License.

    • Full source code access
    • MIT License
    • All environment types and APIs
    • CLI tooling
    • Environments Hub integration

    Capabilities

    Key Features

    • RL environment creation for LLM training and evaluation
    • Taskset/Harness v1 API for reusable environment components
    • SingleTurnEnv, MultiTurnEnv, RLMEnv, CliAgentEnv, BrowserEnv, OpenEnv support
    • Rubric-based reward functions with async scoring
    • prime CLI for workspace setup, env init, eval run, and env push
    • Environments Hub integration for publishing and installing environments
    • TOML-based training and eval configuration
    • OpenAI-compatible API endpoint support
    • Per-turn timing and token tracking
    • GEPA prompt optimization support
    • vf-tui terminal UI for eval result review
    • Sandbox lifecycle management
    • Trajectory-based rollout tracking
    • Pass@k and ablation sweep support
    • Integration with prime-rl training framework
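
For readers unfamiliar with the pass@k metric named in the list above, here is the commonly used unbiased estimator: given n sampled completions of which c are correct, pass@k = 1 - C(n - c, k) / C(n, k). This is a sketch of the standard formula, not a confirmed description of how Verifiers computes it; the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct.

    n: total completions sampled for a task
    c: number of those completions that were correct
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

With 3 correct completions out of 10, pass@1 is approximately 0.3 and pass@5 is approximately 0.917.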

    Integrations

    prime-rl
    Prime Intellect Hosted Training
    Prime Intellect Environments Hub
    Prime Inference
    OpenAI API (compatible endpoints)
    OpenCode
    OpenEnv
    BrowserEnv
    Harbor task directories
    uv (Python package manager)

    Developer

    Prime Intellect

    Prime Intellect builds open infrastructure for distributed AI training and evaluation. The team develops the `verifiers` library for LLM RL environments, the `prime-rl` training framework, and a Hosted Training platform with an Environments Hub for sharing and running community-built evaluation environments. Prime Intellect focuses on making large-scale model training accessible through open-source tooling and cloud compute infrastructure.

    Founded 2023
    San Francisco, CA
    $20.5M+ raised
    43 employees

    Used by

    Zapier
    Ramp

    Similar Tools

    OpenTraces

    A CLI tool to parse, sanitize, and commit AI agent session traces to HuggingFace Hub for training, evaluation, and open data sharing.

    harness-kit

    A Python toolkit for building and evaluating AI agent harnesses, enabling structured testing and benchmarking of LLM-based agents.

    SWE-smith

    An open-source toolkit for generating training data and task instances for software engineering agents, enabling fine-tuning of LMs on real GitHub repositories.

    Related Topics

    Agent Harness

    Infrastructure, orchestrators, and task runners that wrap around LLM coding agents — covering session management, context delivery, worktree isolation, architecture enforcement, and issue-to-PR pipelines.

    83 tools

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    82 tools

    Human-in-the-Loop Training

    Platforms that connect organizations with vetted human experts to annotate, label, evaluate, and align AI models, ensuring high-quality training datasets and accurate model evaluation through human judgment.

    27 tools