# Agent Desktop

> A native desktop automation CLI, built in Rust, that lets AI agents control any application through OS accessibility trees, with structured JSON output and deterministic element refs.

**agent-desktop** is a native desktop automation CLI built in Rust, designed for AI agents that need to observe, decide, and act on desktop applications. It reads and drives applications through OS accessibility trees, with no screenshots, no pixel matching, and no browser required. Every command outputs machine-readable JSON with deterministic element references, which suits agentic workflows that need reliable, repeatable UI interactions.

- **Native Rust CLI**: *Fast, single binary with no runtime dependencies — install via `npm install -g agent-desktop` or build from source with Cargo.*
- **53 commands**: *Covers observation, interaction, keyboard, mouse, notifications, clipboard, and window management for comprehensive desktop control.*
- **Progressive skeleton traversal**: *Achieves 78–96% token reduction on dense apps via shallow overview and targeted drill-down, minimizing LLM context usage.*
- **Snapshot & refs system**: *AI-optimized workflow using deterministic element references (`@e1`, `@e2`) that persist until the next snapshot, enabling reliable act-verify loops.*
- **AX-first interactions**: *Every action exhausts pure accessibility API strategies before falling back to mouse events, maximizing reliability.*
- **Structured JSON output**: *All commands return machine-readable responses with error codes and recovery hints for robust agent error handling.*
- **C-ABI cdylib (FFI)**: *Load `libagent_desktop_ffi` once from Python, Swift, Go, Ruby, Node, or C instead of forking the CLI per call — prebuilt binaries ship with every release.*
- **Works with any app**: *Finder, Safari, System Settings, Xcode, Slack — anything with an OS accessibility tree is supported.*
- **Batch command execution**: *Run multiple commands in a single call with `--stop-on-error` support for efficient multi-step agent workflows.*
- **Cross-platform FFI binaries**: *Prebuilt cdylib artifacts available for macOS arm64/x86_64, Linux x86_64/arm64, and Windows x86_64.*
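
The structured JSON output described above lends itself to a thin wrapper on the agent side. The following is a minimal sketch, not the project's documented API: the executable name `agent-desktop` is inferred from the npm package name, and the response keys `error`, `code`, `message`, and `hint` are assumptions chosen for illustration. Check the real response schema before relying on them.

```python
import json
import subprocess
from typing import Any, Callable, Dict, Optional, Sequence


class AgentDesktopError(Exception):
    """Raised when a response reports a failure or cannot be decoded."""

    def __init__(self, code: str, message: str, hint: Optional[str] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.hint = hint  # recovery hint, if the response carried one


def parse_response(raw: str) -> Dict[str, Any]:
    """Decode one JSON response and turn reported errors into exceptions.

    ASSUMPTION: failures carry an "error" object with "code", "message",
    and an optional "hint". These key names are hypothetical.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise AgentDesktopError("INVALID_JSON", str(exc)) from exc
    if not isinstance(payload, dict):
        raise AgentDesktopError("INVALID_JSON", "expected a JSON object")

    error = payload.get("error")
    if error:
        if not isinstance(error, dict):
            raise AgentDesktopError("UNKNOWN", str(error))
        raise AgentDesktopError(
            str(error.get("code", "UNKNOWN")),
            str(error.get("message", "")),
            error.get("hint"),
        )
    return payload


def run_command(
    args: Sequence[str],
    runner: Callable[..., Any] = subprocess.run,
) -> Dict[str, Any]:
    """Run the CLI with caller-supplied arguments and parse its stdout.

    ASSUMPTION: the installed executable is named "agent-desktop".
    The runner is injectable so the wrapper can be tested without the binary.
    """
    completed = runner(
        ["agent-desktop", *args],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_response(completed.stdout)
```

Pass `run_command` whatever subcommand and flags the CLI documents. An agent loop can catch `AgentDesktopError`, branch on `code`, and feed `hint` back to the model as a recovery suggestion.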

## Features
- Native Rust CLI — single binary, no runtime dependencies
- 53 commands covering observation, interaction, keyboard, mouse, notifications, clipboard, and window management
- Progressive skeleton traversal with 78–96% token reduction
- Deterministic element refs (`@e1`, `@e2`) via snapshot system
- AX-first interaction strategy before mouse fallback
- Structured JSON output with error codes and recovery hints
- C-ABI cdylib (`libagent_desktop_ffi`) for in-process FFI from Python, Swift, Go, Ruby, Node, C
- Batch command execution with stop-on-error support
- Accessibility tree traversal — no screenshots or pixel matching
- App and window management (launch, close, resize, move, minimize, maximize)
- Clipboard read/write/clear
- Notification listing and dismissal (macOS)
- Wait commands with element, window, text, and menu conditions
- Prebuilt binaries for macOS arm64/x86_64, Linux x86_64/arm64, Windows x86_64
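
Because element refs persist only until the next snapshot, an agent benefits from tracking which refs are currently valid before acting on them. The sketch below is one possible way to do that on the client side. It assumes only the `@e<number>` ref shape shown above; the `RefTable` class and the idea of scanning snapshot text for refs are illustrative and not part of the tool.

```python
import re
from typing import List

# Matches refs such as @e1 or @e42, but not "user@e1.com" or "@e1x".
REF_PATTERN = re.compile(r"(?<![\w@])@e\d+\b")


def extract_refs(text: str) -> List[str]:
    """Return the unique refs found in text, in first-seen order."""
    seen = {}
    for ref in REF_PATTERN.findall(text):
        seen.setdefault(ref, None)
    return list(seen)


class RefTable:
    """Tracks which refs belong to the most recent snapshot (hypothetical helper)."""

    def __init__(self) -> None:
        self._refs = frozenset()
        self.generation = 0  # incremented on every snapshot

    def load_snapshot(self, snapshot_text: str) -> None:
        """Replace the known refs; refs from earlier snapshots become stale."""
        self._refs = frozenset(extract_refs(snapshot_text))
        self.generation += 1

    def require(self, ref: str) -> str:
        """Return ref if it is current, otherwise raise KeyError."""
        if ref not in self._refs:
            raise KeyError(f"{ref} is not in snapshot generation {self.generation}")
        return ref
```

Calling `require` before each action catches stale refs locally, so the agent can take a fresh snapshot instead of sending a command that is likely to fail.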

## Integrations
Python (via ctypes/FFI), Swift, Go, Ruby, Node.js, C/C++, npm, Cargo, MCP (Model Context Protocol), Slack, Safari, Finder, Xcode, VS Code, Notion
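
For the Python integration, loading the cdylib once with `ctypes` could look roughly like the sketch below. The filenames follow Rust's default naming for a cdylib called `agent_desktop_ffi`, which is an assumption: the release artifacts may be named or packaged differently. No exported functions are called here because this page does not list them.

```python
import ctypes
import platform
from pathlib import Path
from typing import Optional


def ffi_library_filename(system: Optional[str] = None) -> str:
    """Map an OS name (as returned by platform.system()) to a library filename.

    ASSUMPTION: default Rust cdylib naming for a crate named agent_desktop_ffi.
    """
    system = system or platform.system()
    names = {
        "Darwin": "libagent_desktop_ffi.dylib",
        "Linux": "libagent_desktop_ffi.so",
        "Windows": "agent_desktop_ffi.dll",
    }
    try:
        return names[system]
    except KeyError:
        raise OSError(f"unsupported platform: {system}") from None


def load_ffi(directory: str) -> ctypes.CDLL:
    """Load the shared library from the directory holding the prebuilt binary."""
    path = Path(directory) / ffi_library_filename()
    if not path.is_file():
        raise FileNotFoundError(str(path))
    return ctypes.CDLL(str(path))
```

Keep the returned `ctypes.CDLL` handle for the life of the process. That is the point of the in-process route: the library is loaded once rather than the CLI being forked per call.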

## Platforms
Windows, macOS, Linux, API, VS Code extension, Developer SDK, CLI

## Pricing
Open Source

## Version
v0.1.13

## Links
- Website: https://github.com/lahfir/agent-desktop
- Documentation: https://github.com/lahfir/agent-desktop
- Repository: https://github.com/lahfir/agent-desktop
- EveryDev.ai: https://www.everydev.ai/tools/agent-desktop
