evo
An open-source autoresearch orchestrator that runs parallel coding agent experiments on your repo, scores every patch, and keeps only changes that improve the target metric.
At a Glance
Fully open-source CLI and plugin available on PyPI and GitHub under Apache 2.0.
Listed Jun 2026
About evo
Evo is an open-source autoresearch orchestrator for codebases, built by evo-hq and released under the Apache 2.0 license. It plugs into popular agentic coding frameworks and runs a structured tree search over your repository — discovering what to measure, instrumenting the benchmark, and iterating with parallel subagents until the score stops improving. The hosted platform is currently in waitlisted beta, while the CLI and plugin are freely installable from PyPI and GitHub.
What It Is
Evo sits in the category of autonomous code-optimization systems. Rather than asking a developer to manually direct an AI agent, evo sets up an automated loop: it discovers metrics, runs experiments in isolated git worktrees, scores each patch, and commits only the changes that pass both the metric threshold and any registered gates. The project describes itself as inspired by Karpathy's autoresearch concept — a pure hill-climb where an LLM runs experiments autonomously to beat its own best score — but adds tree search, parallelism, shared state, and gating on top of that baseline idea.
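The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration, not evo's actual code: the names Patch, Gate, and should_commit are hypothetical, and the sketch assumes a single scalar score where higher is better.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Patch:
    """A candidate change and its benchmark score (hypothetical type)."""
    name: str
    score: float  # assumption: higher is better


# A gate is modelled here as any pass/fail predicate over a patch.
Gate = Callable[[Patch], bool]


def should_commit(patch: Patch, best_score: float, gates: List[Gate]) -> bool:
    """Keep a patch only if it beats the best score AND every gate passes."""
    if patch.score <= best_score:
        return False
    return all(gate(patch) for gate in gates)
```

Under these assumptions, a patch that raises the score but fails any gate is discarded, which is what separates this loop from a plain hill climb on the metric.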
How the Optimization Loop Works
The core workflow involves two commands:
- /evo:discover explores the repo, identifies what to measure, instruments the evaluation, and automatically attaches a held-out-slice score-floor gate when building a benchmark from scratch.
- /evo:optimize runs the experiment loop, dispatching parallel subagents, each in its own isolated workspace.
Each subagent reads shared state (failure traces, annotations, discarded hypotheses), forms a hypothesis, edits code, and runs the benchmark. The orchestrator then selects which committed branch to extend next using a configurable frontier strategy: argmax, top-k, epsilon-greedy, softmax, or pareto-per-task. Between rounds, scan subagents read trace batches in parallel and surface compound failure patterns, feeding findings back into shared state for the next round.
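To make the frontier strategies concrete, here is a minimal Python sketch of four of them in their textbook form. It is an assumption about how such strategies generally work, not evo's implementation: the function names, the dict of branch scores, and the epsilon and temperature parameters are all hypothetical, and pareto-per-task is left out because it needs per-task scores.

```python
import math
import random
from typing import Dict, List


def argmax(scores: Dict[str, float]) -> str:
    """Always extend the single best-scoring branch."""
    return max(scores, key=scores.get)


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Extend the k best-scoring branches, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def epsilon_greedy(scores: Dict[str, float], epsilon: float,
                   rng: random.Random) -> str:
    """With probability epsilon explore a random branch, else exploit the best."""
    if rng.random() < epsilon:
        return rng.choice(sorted(scores))
    return argmax(scores)


def softmax(scores: Dict[str, float], temperature: float,
            rng: random.Random) -> str:
    """Sample a branch with probability proportional to exp(score / temperature)."""
    names = sorted(scores)
    top = max(scores.values())  # subtract the max for numerical stability
    weights = [math.exp((scores[n] - top) / temperature) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

In this sketch, argmax reduces to a greedy hill climb, while the other strategies keep weaker branches alive so the search can recover from a local optimum.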
Agent and Infrastructure Compatibility
Evo is designed as a plugin for existing agentic frameworks rather than a standalone agent. It currently supports:
- Agents: Claude Code, Codex, Cursor, OpenClaw, Hermes, Opencode, Pi
- Sandboxes & infra: Local git worktrees (default), SSH, Modal, E2B, Daytona, AWS EC2, Azure VMs
Installation is handled via evo install <host>, which places the plugin into the host's marketplace and stages the hooks evo needs to communicate with in-flight subagents.
Gates and Safety Checks
A key design element is the gates system: pass/fail checks that run on every experiment. Any command that exits zero on pass and non-zero on fail qualifies — a test suite, an invariant script, or a score floor on a held-out benchmark slice. Gates inherit down the experiment tree, so a gate registered at the root runs on every descendant. The README notes that without gates, search will find ways to return a constant, skip work, or trade correctness for speed.
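The exit-code contract and the inheritance rule can be illustrated with a short Python sketch. It is not evo's API: the Node class, its fields, and passes_all_gates are hypothetical names, and each gate is assumed to be a command given as an argument list.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One experiment in the tree (hypothetical structure)."""
    name: str
    parent: Optional["Node"] = None
    gates: List[List[str]] = field(default_factory=list)  # each gate is an argv list

    def inherited_gates(self) -> List[List[str]]:
        """Collect gates from the root down to this node, root first."""
        inherited = self.parent.inherited_gates() if self.parent else []
        return inherited + self.gates


def passes_all_gates(node: Node, cwd: str = ".") -> bool:
    """A gate passes when its command exits with status 0."""
    for argv in node.inherited_gates():
        if subprocess.run(argv, cwd=cwd).returncode != 0:
            return False
    return True
```

For example, a gate such as [sys.executable, "-m", "pytest", "-q"] registered on the root node would, in this sketch, run for every descendant experiment as well.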
Update: evo 0.5.0
The latest stable release is v0.5.0, published on June 6, 2026. The project has been active since April 2026 and shows a regular release cadence, with recent patch releases (0.4.4, 0.4.5) addressing issues such as Codex hook trust and plugin cache bugs. The changelog documents migration paths for pre-0.4.4 installs and alpha testing procedures. The hosted platform remains in waitlisted beta as of the latest available information, while the open-source CLI is available on PyPI as evo-hq-cli.
Pricing
Open Source
Fully open-source CLI and plugin available on PyPI and GitHub under Apache 2.0.
- evo CLI (evo-hq-cli on PyPI)
- Plugin for Claude Code, Codex, Cursor, OpenClaw, Hermes, Opencode, Pi
- Local git worktree backend
- SSH backend
- Remote backends via provider extras (Modal, E2B, Daytona, AWS, Azure)
Capabilities
Key Features
- Autoresearch orchestration loop for codebases
- Benchmark discovery via /evo:discover command
- Parallel subagents running in isolated git worktrees
- Tree search rather than a greedy hill climb
- Shared state across agents (failure traces, annotations, discarded hypotheses)
- Configurable frontier strategies: argmax, top-k, epsilon-greedy, softmax, pareto-per-task
- Gates system for pass/fail regression and safety checks
- Cross-cutting scan subagents for compound failure pattern detection
- Local and remote sandbox backends (Modal, E2B, Daytona, AWS, Azure, SSH)
- Web dashboard for monitoring experiments
- Plugin compatibility with Claude Code, Codex, Cursor, OpenClaw, Hermes, Opencode, Pi
- evo update command for CLI and host plugin version management
