# UFO

> UFO³ is an open-source multi-device GUI agent framework by Microsoft that orchestrates intelligent agents across Windows, Linux, and Android using DAG-based task planning.

UFO³ is a Microsoft Research open-source project that has evolved from a single Windows GUI agent into a full multi-device orchestration framework. Released under the MIT License, it coordinates intelligent agents across heterogeneous platforms—Windows, Linux, and Android—using a declarative DAG-based task model. The project has accumulated over 9,000 GitHub stars since its initial release in February 2024.

## What It Is

UFO³ is a GUI automation agent framework that lets users describe complex tasks in natural language and have them executed automatically across one or more devices. It ships as two tightly integrated components: **UFO²**, a stable Desktop AgentOS for single-device Windows automation, and **Galaxy**, a newer multi-device orchestration layer that decomposes requests into executable directed acyclic graphs (DAGs) and dispatches subtasks to capable device agents in parallel. Both components are written in Python and require an LLM API key (OpenAI, Azure OpenAI, Qwen, Gemini, Claude, and others are supported).

## Architecture: UFO² and Galaxy

The project's README describes a two-tier architecture:

- **UFO² (Desktop AgentOS)** — the stable, long-term-support layer. It integrates deeply with Windows via UIA, Win32, and WinCOM APIs, supports hybrid GUI-click plus API-call actions, and uses speculative multi-action batching that the documentation claims reduces LLM calls by 51%. UFO² can run standalone or serve as a Galaxy device agent for Windows.
- **Galaxy (Multi-Device Orchestration)** — the newer active-development layer. A `ConstellationAgent` decomposes user requests into a `TaskConstellation` DAG of `TaskStar` nodes with dependencies. A `TaskOrchestrator` schedules and executes nodes asynchronously, matching tasks to devices by capability. Agents communicate over a WebSocket-based Unified Agent Interaction Protocol (AIP) with fault tolerance and automatic reconnection.
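To make the scheduling idea concrete, here is a minimal asyncio sketch of dependency-driven DAG execution. It is not Galaxy's `TaskOrchestrator`: `Node`, `run_task`, and `orchestrate` are hypothetical names, and `run_task` only sleeps where a real system would dispatch the subtask to a device agent. Each node starts as soon as its last dependency finishes, so independent branches of the graph run concurrently.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DAG node (a toy stand-in, not UFO³'s TaskStar)."""
    name: str
    deps: set[str] = field(default_factory=set)


async def run_task(node: Node, log: list[str]) -> None:
    # Placeholder for dispatching the subtask to a device agent.
    await asyncio.sleep(0.01)
    log.append(node.name)


async def orchestrate(nodes: dict[str, Node]) -> list[str]:
    """Run every node once all of its dependencies have completed.

    Returns node names in completion order. Raises ValueError if some
    nodes can never start (a cycle or a dependency on a missing node).
    """
    log: list[str] = []
    done: set[str] = set()
    running: dict[asyncio.Task, str] = {}
    pending = dict(nodes)

    while pending or running:
        # Launch every pending node whose dependencies are all satisfied.
        ready = [name for name, n in pending.items() if n.deps <= done]
        for name in ready:
            node = pending.pop(name)
            running[asyncio.create_task(run_task(node, log))] = name
        if not running:
            raise ValueError("unsatisfiable dependencies (cycle or missing node)")
        # Wake up as soon as any running node finishes.
        finished, _ = await asyncio.wait(set(running), return_when=asyncio.FIRST_COMPLETED)
        for task in finished:
            task.result()  # re-raise any exception from the subtask
            done.add(running.pop(task))
    return log


if __name__ == "__main__":
    demo = {
        "fetch_logs": Node("fetch_logs"),
        "fetch_metrics": Node("fetch_metrics"),
        "report": Node("report", {"fetch_logs", "fetch_metrics"}),
    }
    print(asyncio.run(orchestrate(demo)))
```

Here the two fetch nodes run in parallel and `report` starts only after both have finished; a real orchestrator would additionally assign each node to a device before dispatching it.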

## Key Capabilities

- **Declarative DAG decomposition** — requests become structured graphs with explicit dependencies, enabling automated scheduling and runtime rewriting
- **Dynamic graph evolution** — the constellation adapts to execution feedback through controlled rewrites rather than rigid pre-planned sequences
- **Heterogeneous async orchestration** — capability-based device matching with safe locking and formally verified concurrency correctness
- **MCP integration** — Model Context Protocol support for tool augmentation in device agents
- **RAG knowledge substrate** — retrieval-augmented generation over documentation, demos, and execution traces for UFO²
- **Visual + UIA hybrid detection** — combines screenshot-based and accessibility-tree-based control detection for robustness
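The "controlled rewrites" idea can be illustrated with a small validation step: a proposed edit is applied to a copy of the graph and accepted only if every dependency still points at an existing node and the result remains acyclic. This is a hypothetical sketch using the standard library's `graphlib` (Python 3.9+); `apply_rewrite` is an illustrative name and does not describe how Galaxy actually validates constellation edits.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]  # node -> set of nodes it depends on


def apply_rewrite(
    graph: Graph,
    add: Optional[Graph] = None,
    remove: Optional[Set[str]] = None,
) -> Graph:
    """Return a rewritten copy of ``graph`` or raise if the edit is unsafe.

    The input graph is never mutated, so a rejected rewrite leaves the
    current plan intact.
    """
    new: Graph = {node: set(deps) for node, deps in graph.items()}
    for node in remove or set():
        new.pop(node, None)
    for node, deps in (add or {}).items():
        new[node] = set(deps)  # adds a node or replaces its dependencies
    # Reject edges that point at nodes which no longer (or never) exist.
    for node, deps in new.items():
        missing = deps - set(new)
        if missing:
            raise ValueError(f"{node!r} depends on missing node(s) {sorted(missing)}")
    # prepare() raises graphlib.CycleError if the rewrite created a cycle.
    TopologicalSorter(new).prepare()
    return new


if __name__ == "__main__":
    plan = {"collect": set(), "summarize": {"collect"}}
    plan = apply_rewrite(plan, add={"email": {"summarize"}})
    try:
        # Making "collect" depend on "email" would close a loop.
        apply_rewrite(plan, add={"collect": {"email"}})
    except CycleError as err:
        print("rejected:", err.args[0])
```

Working on a copy means validation and commit are separate steps, which is one straightforward way to keep a running plan consistent while feedback proposes changes.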

## Setup Path

Both components are installed via `pip install -r requirements.txt` from the GitHub repository. Configuration requires editing YAML files to supply LLM API keys and, for Galaxy, to register device agents in a `devices.yaml` pool. The README provides separate quick-start paths for Galaxy (cross-device) and UFO² (Windows-only), with platform-specific guides for Windows, Linux, and Android device agents.
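Capability-based matching against a registered device pool can be sketched as follows. This page does not specify the `devices.yaml` format, so the entry fields used here (`id`, `os`, `capabilities`) are assumptions for illustration, as are `Device`, `load_pool`, and `match_device`; consult the UFO³ documentation for the real schema. Entries are taken as already-parsed dictionaries (for example, the output of a YAML loader) to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Device:
    """Hypothetical device-pool entry; field names are illustrative."""
    device_id: str
    os: str
    capabilities: frozenset


def load_pool(entries: Iterable[dict]) -> List[Device]:
    """Build Device objects from already-parsed config entries."""
    return [
        Device(e["id"], e["os"], frozenset(e.get("capabilities", [])))
        for e in entries
    ]


def match_device(required: Set[str], pool: List[Device], busy: Set[str]) -> Optional[Device]:
    """Pick an idle device that offers every required capability.

    Among the candidates, prefer the one with the fewest extra
    capabilities so more versatile devices stay available.
    """
    candidates = [
        d for d in pool
        if d.device_id not in busy and required <= d.capabilities
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda d: len(d.capabilities - required))


if __name__ == "__main__":
    pool = load_pool([
        {"id": "win-1", "os": "windows", "capabilities": ["excel", "outlook", "browser"]},
        {"id": "linux-1", "os": "linux", "capabilities": ["shell", "browser"]},
        {"id": "phone-1", "os": "android", "capabilities": ["sms"]},
    ])
    print(match_device({"browser"}, pool, busy=set()))
```

Preferring the device with the fewest unneeded capabilities is just one simple heuristic; Galaxy's actual assignment policy may differ.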

## Update: UFO³ Galaxy and Version 3.0.7

The project's evolution timeline spans three generations: the original UFO GUI agent (February 2024), UFO² Desktop AgentOS (April 2025, now in LTS), and UFO³ Galaxy (November 2025). The latest GitHub release is **version 3.0.7**, published June 12, 2026. UFO² has entered Long-Term Support status with ongoing bug fixes and security updates. Galaxy is marked as under active development and is recommended for experimentation and non-critical workflows. Two research papers accompany the releases: arXiv:2504.14603 for UFO² and arXiv:2511.11332 for UFO³ Galaxy.

## Why It Got Attention

The original UFO release in February 2024 received wide media coverage for applying multimodal LLMs directly to Windows GUI automation. UFO² extended this into an "AgentOS" concept with deeper OS integration. UFO³ Galaxy represents a research-level step toward coordinating fleets of heterogeneous device agents—the README describes it as the first multi-device orchestration framework for GUI agents—positioning it within the broader multi-agent systems research landscape alongside related Microsoft projects like TaskWeaver.

## Features
- Multi-device DAG-based task orchestration
- UFO² Desktop AgentOS for Windows automation
- Galaxy multi-device orchestration framework
- Declarative task decomposition into TaskConstellation DAGs
- Dynamic graph evolution with runtime rewriting
- Asynchronous parallel task execution
- Capability-based device matching and assignment
- Unified Agent Interaction Protocol (AIP) over WebSocket
- Model Context Protocol (MCP) integration
- Hybrid GUI + API action execution
- Speculative multi-action batching
- Visual + UIA hybrid control detection
- RAG knowledge substrate with docs and execution traces
- Support for OpenAI, Azure OpenAI, Qwen, Gemini, Claude
- Windows UIA, Win32, WinCOM native integration
- Android and Linux device agent support
- Real-time status monitoring and visualization
- Fault tolerance and automatic reconnection
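Automatic reconnection of the kind listed above is commonly built on exponential backoff. The sketch below is a generic version of that pattern, not AIP's implementation: `connect_with_retry` is a hypothetical helper, and the transport is abstracted as any `connect` coroutine so that no particular WebSocket library is assumed.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_retry(
    connect: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
) -> T:
    """Await ``connect()`` until it succeeds, backing off exponentially.

    Re-raises the last error once ``max_attempts`` attempts have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return await connect()
        except OSError:  # ConnectionError and its subclasses derive from OSError
            attempt += 1
            if attempt >= max_attempts:
                raise
            # Exponential backoff capped at max_delay, with "full jitter" so
            # many agents reconnecting at once do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))
```

Randomizing the whole delay rather than adding a small offset spreads out reconnection attempts when a server restart drops many device agents simultaneously.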

## Integrations
OpenAI GPT-4o, Azure OpenAI, Qwen, Gemini, Claude, Model Context Protocol (MCP), Windows UIA, Win32, WinCOM, Android ADB, Linux shell

## Platforms
WINDOWS, LINUX, ANDROID, API, CLI

## Pricing
Open Source

## Version
3.0.7

## Links
- Website: https://github.com/microsoft/UFO
- Documentation: https://microsoft.github.io/UFO/
- Repository: https://github.com/microsoft/UFO
- EveryDev.ai: https://www.everydev.ai/tools/ufo-microsoft
