BrowserGym
An open-source Gym environment for web task automation, enabling researchers to build, test, and benchmark web agents across multiple standardized benchmarks.
At a Glance
About BrowserGym
BrowserGym is an open-source framework developed by ServiceNow Research that provides a standardized Gym-compatible environment for web task automation and web agent research. Built on top of Playwright and the Gymnasium interface, it lets researchers implement agents that interact with real browsers and evaluate them across a growing suite of benchmarks. The project is published under the Apache License 2.0 and is explicitly positioned as a research tool rather than a consumer product.
What It Is
BrowserGym wraps browser interactions into the familiar gym.make / env.step loop from reinforcement learning, making it straightforward to plug in any LLM-based or rule-based agent. Each task exposes observations (DOM, screenshots, accessibility trees) and accepts actions (clicks, typing, navigation), with rewards computed by benchmark-specific evaluators. The framework is designed to be extensible: new benchmarks can be added by subclassing AbstractBrowserTask.
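The reset/step loop described above can be sketched as follows. This is a minimal illustration rather than BrowserGym's own code: StubBrowserEnv, its observation keys, run_episode, and the always_click policy are all hypothetical stand-ins, chosen so that the loop runs without a browser. Only the reset/step return shapes follow the Gymnasium convention.

```python
from typing import Any, Callable, Dict, Tuple

Observation = Dict[str, Any]


class StubBrowserEnv:
    """Stand-in environment (NOT part of BrowserGym).

    It mimics the Gymnasium reset/step signatures so the loop can run
    without a browser. The observation keys are illustrative only.
    """

    def __init__(self, steps_to_finish: int = 3) -> None:
        self.steps_to_finish = steps_to_finish
        self.steps_taken = 0

    def reset(self, seed: Any = None) -> Tuple[Observation, Dict[str, Any]]:
        self.steps_taken = 0
        return {"goal": "click the button", "url": "about:blank"}, {}

    def step(self, action: str) -> Tuple[Observation, float, bool, bool, Dict[str, Any]]:
        self.steps_taken += 1
        # The stub "solves" the task after a fixed number of actions.
        terminated = self.steps_taken >= self.steps_to_finish
        reward = 1.0 if terminated else 0.0
        obs = {"goal": "click the button", "url": "about:blank", "last_action": action}
        return obs, reward, terminated, False, {}

    def close(self) -> None:
        pass


def run_episode(env: Any, policy: Callable[[Observation], str], max_steps: int = 10) -> float:
    """Run one episode and return the summed reward."""
    obs, info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(obs)  # an LLM-based or rule-based agent would go here
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        if terminated or truncated:
            break
    env.close()
    return total


def always_click(obs: Observation) -> str:
    # Hypothetical action string; the real format depends on the action set in use.
    return 'click("submit")'


total_reward = run_episode(StubBrowserEnv(), always_click)
print(total_reward)
```

With BrowserGym installed, the stub would presumably be swapped for an environment created through gym.make with a task ID registered by the installed benchmark package. The loop itself should not need to change.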
Included Benchmarks
BrowserGym ships with integrations for a wide range of web agent benchmarks out of the box:
- MiniWoB – over 100 synthetic web tasks, maintained by the Farama Foundation
- WebArena and WebArenaVerified – realistic tasks on self-hosted web domains
- VisualWebArena – visual variants of WebArena tasks
- WorkArena / WorkArena++ – tasks on the ServiceNow platform
- AssistantBench – time-consuming open-web research tasks
- WebLINX – a static dataset of real-world web interaction traces
- OpenApps – Facebook Research's open application benchmark
- TimeWarp – a temporal web task benchmark
Architecture and Setup Path
Installation is modular via PyPI. The full stack installs with pip install browsergym, while individual benchmark packages (e.g., browsergym-webarena, browsergym-miniwob) can be installed separately to keep dependencies lean. After installation, Playwright's Chromium browser is set up with playwright install chromium. Each benchmark then has its own additional setup steps documented in per-benchmark READMEs.
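Because the installation is split across several packages, it can help to check which ones are present in the current environment. The sketch below uses only the standard library. The helper names select_browsergym and installed_browsergym_packages are hypothetical, and the "browsergym" name prefix is an assumption generalized from the package names mentioned above.

```python
from importlib import metadata
from typing import Iterable, List, Optional


def select_browsergym(names: Iterable[Optional[str]]) -> List[str]:
    """Return the sorted, de-duplicated names that start with 'browsergym'."""
    wanted = {n.lower() for n in names if n and n.lower().startswith("browsergym")}
    return sorted(wanted)


def installed_browsergym_packages() -> List[str]:
    """List installed distributions whose name starts with 'browsergym'."""
    names: List[Optional[str]] = []
    for dist in metadata.distributions():
        try:
            names.append(dist.metadata["Name"])
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
    return select_browsergym(names)


if __name__ == "__main__":
    found = installed_browsergym_packages()
    print(found if found else "no browsergym packages found")
```

An empty result suggests that the pip install step has not been run in the active environment. The check says nothing about whether Playwright's Chromium browser has been installed.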
A companion framework, AgentLab, is maintained alongside BrowserGym and provides higher-level utilities for running agents at scale, collecting traces, and analyzing results across all BrowserGym benchmarks.
Research Lineage and Publication
The framework is described in a peer-reviewed paper, "The BrowserGym Ecosystem for Web Agent Research", published in Transactions on Machine Learning Research (2025) with Expert Certification. The WorkArena benchmark was presented at ICML 2024. Experiment traces from the paper are publicly available on Hugging Face. The project has accumulated over 1,200 GitHub stars and 177 forks as of mid-2025, according to the repository metadata.
Update: v0.14.3
The latest release is v0.14.3, published on January 20, 2026. The repository remains actively maintained, with the last push recorded in March 2026. Recent additions to the benchmark suite include OpenApps and TimeWarp, signaling continued expansion of the supported task environments. The project's CI pipeline enforces code formatting and unit tests on every push.
Pricing
Open Source
Fully free and open-source under the Apache 2.0 license. Install via pip and use all benchmarks at no cost.
- Full framework access
- All benchmark integrations
- Extensible task API
- AgentLab compatibility
- Apache 2.0 license
Capabilities
Key Features
- Gym-compatible browser environment for web agents
- Support for MiniWoB, WebArena, VisualWebArena, WorkArena, AssistantBench, WebLINX, OpenApps, TimeWarp benchmarks
- Modular pip installation per benchmark
- Playwright-based Chromium browser automation
- Extensible AbstractBrowserTask base class for custom benchmarks
- DOM, screenshot, and accessibility tree observations
- Demo agent with OpenAI backend
- Integration with AgentLab for large-scale agent evaluation
- Open-ended interactive chat task mode
- Apache 2.0 open-source license
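The extensible task API listed above can be pictured roughly as follows. This is a sketch under stated assumptions. TaskBase is a local stand-in whose setup/validate/teardown shape is assumed to resemble AbstractBrowserTask, not the real class. ReachUrlTask and FakePage are hypothetical, and the page double exposes only a goto method and a url attribute, in the style of a Playwright page.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class TaskBase(ABC):
    """Local stand-in for AbstractBrowserTask (assumed shape, not the real class)."""

    def __init__(self, seed: int) -> None:
        self.seed = seed

    @abstractmethod
    def setup(self, page: Any) -> Tuple[str, Dict[str, Any]]:
        """Prepare the page and return (goal, info)."""

    @abstractmethod
    def validate(self, page: Any, chat_messages: List[Any]) -> Tuple[float, bool, str, Dict[str, Any]]:
        """Return (reward, done, message, info) for the current page state."""

    @abstractmethod
    def teardown(self) -> None:
        """Release any resources the task created."""


class ReachUrlTask(TaskBase):
    """Hypothetical task: succeed once the browser is on a target URL."""

    def __init__(self, seed: int, start_url: str, target_url: str) -> None:
        super().__init__(seed)
        self.start_url = start_url
        self.target_url = target_url

    def setup(self, page: Any) -> Tuple[str, Dict[str, Any]]:
        page.goto(self.start_url)
        return f"Navigate to {self.target_url}", {}

    def validate(self, page: Any, chat_messages: List[Any]) -> Tuple[float, bool, str, Dict[str, Any]]:
        done = page.url == self.target_url
        return (1.0 if done else 0.0), done, "", {}

    def teardown(self) -> None:
        pass


class FakePage:
    """Minimal page double with only `goto` and `url` (illustration only)."""

    def __init__(self) -> None:
        self.url = "about:blank"

    def goto(self, url: str) -> None:
        self.url = url


page = FakePage()
task = ReachUrlTask(seed=0, start_url="https://example.com/", target_url="https://example.com/done")
goal, _ = task.setup(page)
before = task.validate(page, [])   # still on the start URL
page.goto("https://example.com/done")
after = task.validate(page, [])    # now on the target URL
print(goal, before[:2], after[:2])
```

Keeping the success check inside validate keeps the reward logic with the task rather than the agent, which matches the benchmark-specific evaluators described earlier.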
