# evo

> An open-source autoresearch orchestrator that runs parallel coding agent experiments on your repo, scores every patch, and keeps only changes that improve the target metric.

Evo is an open-source autoresearch orchestrator for codebases, built by evo-hq and released under the Apache 2.0 license. It plugs into popular agentic coding frameworks and runs a structured tree search over your repository — discovering what to measure, instrumenting the benchmark, and iterating with parallel subagents until the score stops improving. The hosted platform is currently in waitlisted beta, while the CLI and plugin are freely installable from PyPI and GitHub.

## What It Is

Evo sits in the category of autonomous code-optimization systems. Rather than asking a developer to manually direct an AI agent, evo sets up an automated loop: it discovers metrics, runs experiments in isolated git worktrees, scores each patch, and commits only the changes that pass both the metric threshold and any registered gates. The project describes itself as inspired by Karpathy's autoresearch concept — a pure hill-climb where an LLM runs experiments autonomously to beat its own best score — but adds tree search, parallelism, shared state, and gating on top of that baseline idea.
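The keep-or-discard rule described above can be sketched in a few lines. This is an illustration only, not evo's actual code: `ExperimentResult`, `should_keep`, and the `higher_is_better` flag are hypothetical names, and the sketch assumes the "metric threshold" is simply the parent experiment's score.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    """Hypothetical record of one experiment; not an evo data structure."""
    score: float        # benchmark score measured on the patched code
    gates_passed: bool  # True only if every registered gate exited zero


def should_keep(result: ExperimentResult, parent_score: float,
                higher_is_better: bool = True) -> bool:
    """Keep a patch only if all gates pass AND it strictly beats its parent."""
    if not result.gates_passed:
        # A gate failure discards the patch no matter how good the score is.
        return False
    if higher_is_better:
        return result.score > parent_score
    return result.score < parent_score
```

Checking gates before the score means a patch that games the metric (for example, by skipping work) is rejected even when its score looks like an improvement.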

## How the Optimization Loop Works

The core workflow involves two commands:

- `/evo:discover` — explores the repo, identifies what to measure, instruments the evaluation, and attaches a held-out-slice score-floor gate automatically when building a benchmark from scratch.
- `/evo:optimize` — runs the experiment loop, dispatching parallel subagents each in its own isolated workspace.

Each subagent reads shared state (failure traces, annotations, discarded hypotheses), forms a hypothesis, edits code, and runs the benchmark. The orchestrator then selects which committed branch to extend next using a configurable frontier strategy: argmax, top-k, epsilon-greedy, softmax, or pareto-per-task. Between rounds, scan subagents read trace batches in parallel and surface compound failure patterns, feeding findings back into shared state for the next round.
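To make the frontier strategies concrete, here is a minimal sketch of how four of them could pick the next branch to extend from a map of branch names to scores. It uses the textbook definitions of each strategy and is not taken from evo's source: `select_parent` and its parameters (`k`, `epsilon`, `temperature`) are hypothetical, top-k is assumed to sample uniformly among the k best branches, and pareto-per-task is omitted because it needs per-task scores rather than a single number.

```python
import math
import random
from typing import Dict, Optional


def select_parent(scores: Dict[str, float], strategy: str = "argmax", *,
                  k: int = 3, epsilon: float = 0.1, temperature: float = 1.0,
                  rng: Optional[random.Random] = None) -> str:
    """Pick the branch to extend next (higher score is assumed to be better)."""
    if not scores:
        raise ValueError("no committed branches to choose from")
    rng = rng or random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)  # best branch first

    if strategy == "argmax":
        # Pure exploitation: always extend the current best branch.
        return ranked[0]
    if strategy == "top-k":
        # Sample uniformly among the k best branches.
        return rng.choice(ranked[:k])
    if strategy == "epsilon-greedy":
        # With probability epsilon explore a random branch, else exploit.
        if rng.random() < epsilon:
            return rng.choice(ranked)
        return ranked[0]
    if strategy == "softmax":
        # Sample proportionally to exp(score / temperature); subtracting the
        # best score first keeps the exponentials numerically stable.
        best = scores[ranked[0]]
        weights = [math.exp((scores[b] - best) / temperature) for b in ranked]
        return rng.choices(ranked, weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The strategies differ mainly in how much exploration they allow: argmax never leaves the best branch, while the other three occasionally extend weaker branches that might lead to a better result later.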

## Agent and Infrastructure Compatibility

Evo is designed as a plugin for existing agentic frameworks rather than a standalone agent. It currently supports:

- **Agents:** Claude Code, Codex, Cursor, OpenClaw, Hermes, Opencode, Pi
- **Sandboxes & infra:** Local git worktrees (default), SSH, Modal, E2B, Daytona, AWS EC2, Azure VMs

Installation is handled via `evo install <host>`, which places the plugin into the host's marketplace and stages the hooks evo needs to communicate with in-flight subagents.

## Gates and Safety Checks

A key design element is the **gates** system: pass/fail checks that run on every experiment. Any command that exits zero on pass and non-zero on fail qualifies — a test suite, an invariant script, or a score floor on a held-out benchmark slice. Gates inherit down the experiment tree, so a gate registered at the root runs on every descendant. The README notes that without gates, search will find ways to return a constant, skip work, or trade correctness for speed.
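The exit-code contract and the inheritance rule can be illustrated with a short sketch. The `Node` class and `run_gates` function below are hypothetical and do not reflect evo's internals; the sketch assumes each gate is stored as an argument list suitable for `subprocess.run`.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical experiment-tree node; gate commands are argv lists."""
    name: str
    parent: Optional["Node"] = None
    gates: List[List[str]] = field(default_factory=list)

    def effective_gates(self) -> List[List[str]]:
        # Gates inherit down the tree: ancestors' gates come first,
        # followed by the gates registered on this node itself.
        inherited = self.parent.effective_gates() if self.parent else []
        return inherited + self.gates


def run_gates(node: Node, cwd: Optional[str] = None) -> bool:
    """Return True only if every inherited and local gate exits with code 0."""
    for cmd in node.effective_gates():
        if subprocess.run(cmd, cwd=cwd).returncode != 0:
            return False  # first failing gate rejects the experiment
    return True
```

In this sketch, a gate such as `[sys.executable, "-m", "pytest", "-q"]` registered on the root node would run for every descendant experiment, which matches the inheritance behavior described above.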

## Update: evo 0.5.0

The latest stable release is v0.5.0, published on June 6, 2026. The project has been active since April 2026 and keeps a regular release cadence; recent patch releases (0.4.4 and 0.4.5) addressed issues such as Codex hook trust and plugin cache bugs. The changelog documents migration paths for pre-0.4.4 installs and alpha testing procedures. The hosted platform remains in waitlisted beta, while the open-source CLI is available on PyPI as `evo-hq-cli`.

## Features
- Autoresearch orchestration loop for codebases
- Benchmark discovery via the `/evo:discover` command
- Parallel subagents running in isolated git worktrees
- Tree search instead of a greedy hill climb
- Shared state across agents (failure traces, annotations, discarded hypotheses)
- Configurable frontier strategies: argmax, top-k, epsilon-greedy, softmax, pareto-per-task
- Gates system for pass/fail regression and safety checks
- Cross-cutting scan subagents for compound failure pattern detection
- Local and remote sandbox backends (local git worktrees, SSH, Modal, E2B, Daytona, AWS EC2, Azure VMs)
- Web dashboard for monitoring experiments
- Plugin compatibility with Claude Code, Codex, Cursor, OpenClaw, Hermes, Opencode, Pi
- `evo update` command for CLI and host plugin version management

## Integrations
Claude Code, Codex, Cursor, OpenClaw, Hermes, Opencode, Pi, Modal, E2B, Daytona, AWS EC2, Azure VMs, PyPI

## Platforms
CLI, Web, API

## Pricing
Open Source

## Version
v0.5.0

## Links
- Website: https://evo-hq.com
- Documentation: https://evo-hq.com/docs/
- Repository: https://github.com/evo-hq/evo
- EveryDev.ai: https://www.everydev.ai/tools/evo-autoresearch-orchestrator