Agent Desktop
A native desktop automation CLI for AI agents, built in Rust, that controls any application through OS accessibility trees and returns structured JSON output with deterministic element refs.
At a Glance
Fully free and open-source under the Apache License 2.0. Use, modify, and distribute freely.
Listed May 2026
About Agent Desktop
agent-desktop is a native desktop automation CLI built in Rust, designed specifically for AI agents to observe, decide, and act on any desktop application. It provides structured access to applications through OS accessibility trees — no screenshots, no pixel matching, no browser required. The tool outputs machine-readable JSON with deterministic element references, making it ideal for agentic workflows that require reliable, repeatable UI interactions.
- Native Rust CLI: Fast, single binary with no runtime dependencies; install via `npm install -g agent-desktop` or build from source with Cargo.
- 53 commands: Covers observation, interaction, keyboard, mouse, notifications, clipboard, and window management for comprehensive desktop control.
- Progressive skeleton traversal: Achieves 78–96% token reduction on dense apps via shallow overview and targeted drill-down, minimizing LLM context usage.
- Snapshot & refs system: AI-optimized workflow using deterministic element references (`@e1`, `@e2`) that persist until the next snapshot, enabling reliable act-verify loops.
- AX-first interactions: Every action exhausts pure accessibility API strategies before falling back to mouse events, maximizing reliability.
- Structured JSON output: All commands return machine-readable responses with error codes and recovery hints for robust agent error handling.
- C-ABI cdylib (FFI): Load `libagent_desktop_ffi` once from Python, Swift, Go, Ruby, Node, or C instead of forking the CLI per call; prebuilt binaries ship with every release.
- Works with any app: Finder, Safari, System Settings, Xcode, Slack, or anything else that exposes an OS accessibility tree.
- Batch command execution: Run multiple commands in a single call with `--stop-on-error` support for efficient multi-step agent workflows.
- Cross-platform FFI binaries: Prebuilt cdylib artifacts available for macOS arm64/x86_64, Linux x86_64/arm64, and Windows x86_64.
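
To make the snapshot refs and the "shallow overview, then targeted drill-down" idea concrete, here is a minimal toy model in Python. It does not call agent-desktop and is not its implementation: the `Node` type, `take_snapshot`, `skeleton`, and the `collapsed_children` field are all hypothetical names chosen for illustration, and the listing does not say how the real tool numbers elements. The sketch assumes refs are assigned in depth-first pre-order, which is one simple way to make them deterministic for a given tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one accessibility-tree element."""
    role: str
    name: str = ""
    children: list = field(default_factory=list)
    ref: str = ""


def take_snapshot(root):
    """Assign @e1, @e2, ... in depth-first pre-order and return a ref -> node map.

    The same tree always produces the same refs (an assumption of this sketch).
    """
    refs = {}
    stack = [root]
    while stack:
        node = stack.pop()
        node.ref = f"@e{len(refs) + 1}"
        refs[node.ref] = node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return refs


def skeleton(node, max_depth):
    """Describe a subtree down to max_depth; deeper levels are only counted."""
    entry = {"ref": node.ref, "role": node.role, "name": node.name}
    if max_depth > 0:
        entry["children"] = [skeleton(c, max_depth - 1) for c in node.children]
    elif node.children:
        entry["collapsed_children"] = len(node.children)
    return entry


# A small made-up window: a sidebar with three rows and a content pane.
root = Node("window", "Settings", [
    Node("group", "Sidebar", [
        Node("row", "Wi-Fi"), Node("row", "Bluetooth"), Node("row", "Network"),
    ]),
    Node("group", "Content", [Node("button", "Apply")]),
])

refs = take_snapshot(root)
overview = skeleton(root, max_depth=1)             # shallow pass over the window
drill_down = skeleton(refs["@e2"], max_depth=1)    # expand only the sidebar
```

In this toy, the overview names the two panes and reports how many children each one hides, and only the pane the agent cares about is expanded in a second step. That is the general shape of the token saving described above; the actual reduction depends on the application.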
Pricing
Open Source
Fully free and open-source under the Apache License 2.0. Use, modify, and distribute freely.
- All 53 commands
- Native Rust CLI binary
- C-ABI cdylib FFI library
- Progressive skeleton traversal
- Structured JSON output
Capabilities
Key Features
- Native Rust CLI — single binary, no runtime dependencies
- 53 commands covering observation, interaction, keyboard, mouse, notifications, clipboard, and window management
- Progressive skeleton traversal with 78–96% token reduction
- Deterministic element refs (@e1, @e2) via snapshot system
- AX-first interaction strategy before mouse fallback
- Structured JSON output with error codes and recovery hints
- C-ABI cdylib (libagent_desktop_ffi) for in-process FFI from Python, Swift, Go, Ruby, Node, C
- Batch command execution with stop-on-error support
- Accessibility tree traversal — no screenshots or pixel matching
- App and window management (launch, close, resize, move, minimize, maximize)
- Clipboard read/write/clear
- Notification listing and dismissal (macOS)
- Wait commands with element, window, text, and menu conditions
- Prebuilt binaries for macOS arm64/x86_64, Linux x86_64/arm64, Windows x86_64
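
The features above mention error codes and recovery hints in the JSON output. The Python sketch below shows one way an agent loop might use them. Everything specific here is an assumption made for illustration: the response shape (`ok`, `data`, `error.code`, `error.hint`), the `STALE_REF` code, and the `fake_cli` stand-in are invented, since the listing does not document the real schema. The command runner is passed in as a function so the real tool could be substituted once its actual output format is known.

```python
import json


class CommandFailed(Exception):
    """Raised when no recovery is possible; carries the parsed error object."""

    def __init__(self, error):
        super().__init__(f"{error.get('code')}: {error.get('hint')}")
        self.error = error


def act_with_recovery(run_command, command, recover, max_attempts=3):
    """Run a command, letting `recover` rewrite it after each failure.

    run_command(command) returns JSON text; recover(error, command) returns
    a new command to try, or None to give up.
    """
    last_error = {}
    for _ in range(max_attempts):
        response = json.loads(run_command(command))
        if response.get("ok"):
            return response
        last_error = response.get("error", {})
        command = recover(last_error, command)
        if command is None:
            break
    raise CommandFailed(last_error)


def fake_cli(command):
    """Stand-in for the real tool: only the ref @e5 is valid in this toy."""
    if command["ref"] == "@e5":
        return json.dumps({"ok": True, "data": {"clicked": command["ref"]}})
    return json.dumps(
        {"ok": False, "error": {"code": "STALE_REF", "hint": "take a new snapshot"}}
    )


def resnapshot_on_stale(error, command):
    """Hypothetical recovery policy keyed on the (invented) error code."""
    if error.get("code") == "STALE_REF":
        return {**command, "ref": "@e5"}  # pretend a fresh snapshot gave this ref
    return None


result = act_with_recovery(
    fake_cli, {"action": "click", "ref": "@e3"}, resnapshot_on_stale
)
```

Keying the recovery policy on a machine-readable code rather than on message text is the point of the design: the agent can branch on the code, and the hint remains available for logging or for passing back to the model.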
