# Page Agent

> Page Agent is an open-source browser automation framework by Alibaba that enables AI agents to interact with web pages using natural language instructions.

Unlike tools such as Browser-Use, which control the entire browser from the outside, Page Agent is designed as an embedded component that lives inside your website. You drop it into your app, and your users can talk to the page directly.

Page Agent takes a DOM-first approach instead of relying on visual recognition. It uses what the project calls high-intensity DOM dehydration (stripping the DOM down to its essential structure) and pure text processing to understand page layouts, which the project says makes it faster and more precise than screenshot-based alternatives. It then automates tasks such as clicking, form filling, navigation, and data extraction without custom scripts or selectors.
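
To make the idea of dehydration concrete, here is a minimal sketch of the general technique, not Page Agent's actual implementation. It assumes a hypothetical `SimpleNode` tree standing in for the real DOM (so it runs outside a browser); the `dehydrate` function, the set of interactive tags, and the `[id]<tag>label` output format are illustrative choices. Hidden subtrees are dropped, each interactive element gets a numeric id the model can refer to, and the id-to-node map lets the host page act on the model's choice afterwards.

```typescript
// Hypothetical, simplified stand-in for a DOM element so the sketch runs outside a browser.
interface SimpleNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  hidden?: boolean;
  children?: SimpleNode[];
}

// Tags treated as actionable in this sketch (an assumption, not Page Agent's list).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

interface Dehydrated {
  text: string; // compact, text-only page description for the LLM
  index: Map<number, SimpleNode>; // numeric id -> node, used to act on the model's choice
}

function dehydrate(root: SimpleNode): Dehydrated {
  const lines: string[] = [];
  const index = new Map<number, SimpleNode>();
  let nextId = 0;

  const walk = (node: SimpleNode): void => {
    if (node.hidden) return; // invisible subtrees are dropped entirely
    // Prefer the node's text, then its aria-label, then its placeholder.
    const label = (node.text ?? node.attrs?.["aria-label"] ?? node.attrs?.["placeholder"] ?? "").trim();
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const id = nextId++;
      index.set(id, node);
      lines.push(`[${id}]<${node.tag}>${label}`);
    } else if (label) {
      lines.push(label); // keep meaningful static text, drop empty wrappers
    }
    for (const child of node.children ?? []) walk(child);
  };

  walk(root);
  return { text: lines.join("\n"), index };
}

// Example page: a heading, a hidden panel, an email field, and a submit button.
const page: SimpleNode = {
  tag: "div",
  children: [
    { tag: "h1", text: "Sign in" },
    { tag: "div", hidden: true, children: [{ tag: "button", text: "Debug" }] },
    { tag: "input", attrs: { placeholder: "Email" } },
    { tag: "button", text: "Submit" },
  ],
};

const result = dehydrate(page);
console.log(result.text);
// Sign in
// [0]<input>Email
// [1]<button>Submit
```

Because the model only sees short text lines and numeric ids, an instruction such as "click [1]" can be resolved through the index without the model ever writing a CSS selector.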

The project also offers a Chrome extension that lets you run Page Agent manually on any website and ask it to perform tasks. For example, on this page you could ask Page Agent to describe its own features.

**Current version:** 1.5.2

- **Natural Language Control**: Describe web tasks in plain language and let the agent figure out the steps to complete them on any web page. This also doubles as an accessibility layer, giving visually impaired and elderly users a natural language interface that works with screen readers and voice assistants.
- **DOM-Based Intelligence**: Instead of using vision models to read screenshots, Page Agent analyzes the DOM directly through text processing. This means faster execution and more precise element targeting, especially on complex B2B systems and admin panels.
- **Secure & Controllable**: Supports operation allowlists that restrict what the agent can do, data masking that protects sensitive fields, and custom knowledge injection so the agent follows your app's own rules.
- **Zero Backend / Easy Integration**: Import via CDN or NPM with no backend infrastructure required. Works with your own LLM endpoints, so you control the model and the data flow.
- **Browser Automation**: Automates clicks, form fills, navigation, and data extraction across web pages without requiring custom selectors or scripts.
- **Open Source**: Freely available on GitHub under Alibaba's organization, allowing developers to inspect, extend, and contribute to the codebase.
- **Agent Framework Support**: Designed to integrate with agent orchestration frameworks, making it suitable for building multi-step autonomous web workflows.
- **Developer SDK**: Provides a programmatic API for embedding web automation capabilities into custom AI agent pipelines and applications.
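
The allowlist and data-masking ideas can be sketched as two small checks that sit between the model and the page. This is an illustration under stated assumptions, not Page Agent's configuration API: the `Action` and `Policy` types, `isAllowed`, `maskText`, and both regular expressions are hypothetical. Actions outside the allowlist are rejected before execution, and matching text is masked before page content is sent to the LLM.

```typescript
// Hypothetical action shape an agent might emit (not Page Agent's schema).
type Action =
  | { kind: "click"; target: number }
  | { kind: "type"; target: number; value: string }
  | { kind: "navigate"; url: string };

// Hypothetical host-side policy.
interface Policy {
  allowedKinds: ReadonlySet<Action["kind"]>; // operation allowlist
  maskPatterns: RegExp[]; // text matching these never reaches the model
}

// Reject any action whose kind is not explicitly allowed.
function isAllowed(action: Action, policy: Policy): boolean {
  return policy.allowedKinds.has(action.kind);
}

// Replace sensitive substrings before page text is sent to the LLM.
function maskText(text: string, policy: Policy): string {
  return policy.maskPatterns.reduce((acc, re) => acc.replace(re, "[MASKED]"), text);
}

const policy: Policy = {
  allowedKinds: new Set<Action["kind"]>(["click", "type"]),
  maskPatterns: [
    /\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b/g, // card-number-like digit groups
    /[\w.+-]+@[\w-]+\.[\w.]+/g, // email addresses
  ],
};

console.log(maskText("Card 4111 1111 1111 1111, contact a.b@example.com", policy));
// Card [MASKED], contact [MASKED]
console.log(isAllowed({ kind: "navigate", url: "https://example.com" }, policy)); // false
```

Running both checks on the host side means they hold no matter which LLM endpoint the page is configured to use.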

**Common use cases** include connecting support bots so they can operate directly on a page for users, modernizing legacy apps with a single line of code, building interactive training that demonstrates real workflows, and making complex software accessible through natural language.

## Features
- Natural language browser control
- DOM-based page understanding (text only, no vision model)
- Automated web interaction (clicks, forms, navigation)
- Multi-step task execution
- Data extraction from web pages
- Open-source and extensible
- Agent framework integration
- Programmatic API/SDK
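
Multi-step task execution in agents of this kind generally follows an observe, plan, act loop. The sketch below shows that generic loop with the LLM call replaced by an injected `Planner` function; `runTask`, `Step`, `Planner`, and the scripted planner are hypothetical names chosen for illustration, and nothing here reflects Page Agent's internal code.

```typescript
// Hypothetical step the model can choose at each turn.
type Step =
  | { kind: "click"; target: number }
  | { kind: "done"; answer: string };

// Stand-in for an LLM call: given the task, current page text, and history, pick the next step.
type Planner = (task: string, pageText: string, history: Step[]) => Promise<Step>;

// Generic observe -> plan -> act loop with a hard step limit.
async function runTask(
  task: string,
  observe: () => string,
  act: (step: Step) => void,
  plan: Planner,
  maxSteps = 10,
): Promise<string> {
  const history: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await plan(task, observe(), history);
    history.push(step);
    if (step.kind === "done") return step.answer;
    act(step);
  }
  throw new Error(`Task not finished within ${maxSteps} steps`);
}

// Toy page: a single button that counts clicks.
let clicks = 0;
const observe = (): string => `[0]<button>Next (clicked ${clicks} times)`;
const act = (step: Step): void => {
  if (step.kind === "click") clicks++;
};

// Scripted planner: click twice, then report completion.
const scripted: Planner = async (_task, _pageText, history): Promise<Step> =>
  history.length < 2 ? { kind: "click", target: 0 } : { kind: "done", answer: `clicked ${clicks}` };

const finished = runTask("Click Next twice", observe, act, scripted);
finished.then((answer) => console.log(answer)); // logs "clicked 2"
```

The `maxSteps` cap is a common safeguard so a confused model cannot loop forever; in a real agent the planner would send the dehydrated page text to an LLM endpoint.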

## Integrations
Large language models (via your own endpoints), Browser automation tools

## Platforms
Web, API, Developer SDK

## Pricing
Open Source

## Links
- Website: https://alibaba.github.io/page-agent/
- Documentation: https://alibaba.github.io/page-agent/docs/introduction/overview
- Repository: https://github.com/alibaba/page-agent
- EveryDev.ai: https://www.everydev.ai/tools/page-agent