WebArena

Name: WebArena
Availability: OnlineOnly
Author: web-arena-x

Agent Harness

A standalone, self-hostable web environment for building and evaluating autonomous web agents on realistic tasks.

At a Glance

Pricing

Open Source

Fully free and open-source under Apache License 2.0. Self-host the full benchmark environment.

Available On

CLI

API

SDK

Developer: web-arena-x (Pittsburgh, PA; est. 2023)

Listed May 2026

About WebArena

WebArena is an open-source benchmark environment for building and evaluating autonomous web agents, published as a research project by the web-arena-x organization. It provides a self-hostable suite of realistic websites — including a shopping site, Reddit clone, GitLab instance, map, and Wikipedia mirror — against which agents can be tested on 812 end-to-end tasks. The project was introduced in a paper presented at NeurIPS 2024 (Oral) and is available on GitHub under the Apache License 2.0.

What It Is

WebArena is a research-grade evaluation harness that simulates a realistic web browsing environment for autonomous agents. Rather than relying on live websites, it packages self-contained Docker-based web applications that agents can interact with through a Playwright-driven browser interface. The environment exposes observations as accessibility trees or HTML and accepts structured actions (clicks, typing, navigation), making it compatible with LLM-based agents that reason about web content.
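
To make the idea of structured actions concrete, here is a minimal sketch of turning a text action such as `click [12]` or `type [7] [hello]` into a structured record. The bracketed syntax, the verb list, the `Action` dataclass, and the `parse_action` function are all illustrative assumptions and are not taken from WebArena's own parser.

```python
import re
from dataclasses import dataclass, field

# Hypothetical action vocabulary, for illustration only.
KNOWN_VERBS = {"click", "type", "goto", "scroll", "stop"}


@dataclass
class Action:
    verb: str
    args: list[str] = field(default_factory=list)


def parse_action(text: str) -> Action:
    """Parse 'verb [arg1] [arg2] ...' into an Action; raise ValueError if malformed."""
    text = text.strip()
    # One word, followed by zero or more bracketed arguments.
    match = re.fullmatch(r"(\w+)((?:\s*\[[^\[\]]*\])*)", text)
    if match is None:
        raise ValueError(f"malformed action: {text!r}")
    verb = match.group(1).lower()
    if verb not in KNOWN_VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    # Extract the content of each [...] group in order.
    args = re.findall(r"\[([^\[\]]*)\]", match.group(2))
    return Action(verb, args)
```

Rejecting malformed strings early, rather than passing them to the browser, keeps an LLM's formatting mistakes from being confused with genuine task failures.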

Architecture and Setup

The environment requires Python 3.10 or later and uses Playwright for browser automation. Each test example is defined by its own JSON config file, and the full benchmark consists of 812 such examples. Researchers start the included Docker image for each website, set environment variables that point to each running service, and then run agents against the local stack. The repository includes:

Docker resources and an Amazon Machine Image (AMI) with all websites pre-installed
Auto-login cookie generation for all bundled websites
A ScriptBrowserEnv class with an OpenAI Gym-style API (reset, step)
Baseline prompt-based agents using Chain-of-Thought and ReAct-style reasoning
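
The reset/step workflow described above can be sketched as a small episode loop. This sketch deliberately uses a stand-in `ToyBrowserEnv` rather than the real `ScriptBrowserEnv`, so it runs without Docker or Playwright. The config fields (`task_id`, `start_url`), the placeholder URL, the reward rule, and the five-value return of `step` are assumptions made for illustration; check the repository for the actual signatures.

```python
import json
import tempfile
from pathlib import Path


class ToyBrowserEnv:
    """Stand-in with a Gym-style reset/step interface (NOT the real ScriptBrowserEnv)."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.steps = 0
        self.task = {}

    def reset(self, options: dict):
        # Load the per-task JSON config: one file per benchmark example.
        self.task = json.loads(Path(options["config_file"]).read_text())
        self.steps = 0
        obs = {"text": f"[1] link 'Home' at {self.task['start_url']}"}
        return obs, {"task_id": self.task["task_id"]}

    def step(self, action: str):
        self.steps += 1
        terminated = action.startswith("stop")  # the agent declares it is done
        truncated = self.steps >= self.max_steps and not terminated
        reward = 1.0 if terminated else 0.0  # toy reward, not a real evaluator
        obs = {"text": f"after {action!r}"}
        return obs, reward, terminated, truncated, {"steps": self.steps}


def run_episode(env, config_file, policy):
    """Run one task until the env reports termination or truncation."""
    obs, info = env.reset(options={"config_file": str(config_file)})
    trajectory = []
    while True:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        trajectory.append(action)
        if terminated or truncated:
            return trajectory, reward


def scripted_policy():
    """Return a policy that clicks twice and then stops."""
    actions = iter(["click [1]", "click [1]", "stop [done]"])
    return lambda obs: next(actions)


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "0.json"
    # Placeholder config; the URL is an assumed local address.
    cfg.write_text(json.dumps({"task_id": 0, "start_url": "http://localhost:7770"}))
    trajectory, reward = run_episode(ToyBrowserEnv(), cfg, scripted_policy())
```

In a real run, the scripted policy would be replaced by an LLM-backed agent and the toy environment by the class the repository provides; the loop itself would likely stay much the same.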

Benchmark Scope and Related Projects

The webarena.dev project page describes WebArena as part of a broader suite of autonomous web agent benchmarks:

WebArena — the original realistic web environment (NeurIPS 2024 Oral)
WebArena-Infinity — continuous and scalable evaluation in evolving environments
VisualWebArena — multimodal agents on visual web tasks (ACL 2024)
TheAgentCompany — LLM agents on consequential real-world tasks in a simulated company (ICML 2025)

The web navigation infrastructure has also been extended by AgentLab (ServiceNow), which adds parallel experiment support via BrowserGym, integration of multiple benchmarks, and a unified leaderboard.

Update: v0.2.0 and December 2024 Enhancements

The latest tagged release is v0.2.0 (October 2023), which stabilized the annotation dataset after a full re-examination and bug-fix pass; the repository notes that no major annotation updates are expected beyond this version. In December 2023, the maintainers released human annotator trajectories for approximately 170 tasks as a reference. In December 2024, they highlighted AgentLab as the recommended framework for running experiments, citing parallel execution, unified leaderboard reporting, and improved edge-case handling. A public leaderboard is maintained via Google Sheets.

Who It Is For

WebArena targets AI researchers and practitioners building or evaluating LLM-based web agents. It is particularly suited for teams working on browser automation, agent reasoning, and multi-step task completion in realistic web contexts. The Gym-style Python API lowers the barrier for integrating new agent architectures, and the modular prompt constructor design makes it straightforward to swap in custom prompting strategies.
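
As a rough illustration of what a swappable prompt constructor could look like, the sketch below separates prompt building from the agent loop. The `PromptConstructor` interface, the two subclasses, the `Agent` class, and the prompt wording are all hypothetical; they are not the classes shipped in the WebArena repository.

```python
from abc import ABC, abstractmethod
from typing import Callable


class PromptConstructor(ABC):
    """Hypothetical interface: turn task state into a prompt string."""

    @abstractmethod
    def construct(self, intent: str, observation: str, history: list[str]) -> str:
        ...


class DirectPrompt(PromptConstructor):
    """Ask for the next action with no explicit reasoning step."""

    def construct(self, intent, observation, history):
        return f"OBJECTIVE: {intent}\nOBSERVATION: {observation}\nNext action:"


class ReasoningPrompt(PromptConstructor):
    """Include past actions and ask the model to reason before acting."""

    def construct(self, intent, observation, history):
        past = "; ".join(history) or "none"
        return (
            f"OBJECTIVE: {intent}\nPREVIOUS ACTIONS: {past}\n"
            f"OBSERVATION: {observation}\n"
            "Let's think step by step, then give the next action:"
        )


class Agent:
    """Agent whose prompting strategy is injected rather than hard-coded."""

    def __init__(self, constructor: PromptConstructor, llm: Callable[[str], str]):
        self.constructor = constructor
        self.llm = llm  # any callable mapping a prompt to an action string
        self.history: list[str] = []

    def next_action(self, intent: str, observation: str) -> str:
        prompt = self.constructor.construct(intent, observation, self.history)
        action = self.llm(prompt)
        self.history.append(action)
        return action
```

With this shape, comparing two prompting strategies means changing a single constructor argument, for example `Agent(ReasoningPrompt(), llm)` instead of `Agent(DirectPrompt(), llm)`, while the rest of the evaluation loop stays untouched.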

Pricing

Apache License 2.0
Full source code on GitHub
812 evaluation tasks
Docker-based self-hosted websites
Python API

Capabilities

Key Features

Self-hostable web environment with Docker-based websites
812 end-to-end evaluation tasks
OpenAI Gym-style Python API (reset/step)
Accessibility tree and HTML observation spaces
Playwright-based browser automation
Auto-login cookie generation for bundled sites
Baseline Chain-of-Thought and ReAct agents
Amazon Machine Image with pre-installed websites
Public leaderboard via Google Sheets
Human annotator trajectory recordings
Zeno integration for result analysis
Modular prompt constructor design

Integrations

OpenAI GPT-3.5 / GPT-4

Playwright

BrowserGym

AgentLab (ServiceNow)

Zeno (zenoml.com)

Docker

Amazon Web Services (AMI)

Topics

Agent Harness, Browser Automation, LLM Evaluations

Alternatives

harness-kit, ctx, peerd
