# ART (Agent Reinforcement Trainer)

> An open-source Python framework by OpenPipe for training multi-step LLM agents using GRPO reinforcement learning, enabling agents to learn from experience without labeled datasets.

ART (Agent Reinforcement Trainer) is an open-source Python framework built by OpenPipe for applying reinforcement learning to LLM-based agents. Released under the Apache 2.0 license and available on GitHub, it wraps GRPO (Group Relative Policy Optimization) into an ergonomic harness that integrates with existing Python applications. The project has accumulated over 10,000 GitHub stars since its creation in March 2025.

## What It Is

ART is a reinforcement learning training framework specifically designed for agentic LLMs — models that execute multi-step workflows, use tools, and interact with environments. Rather than requiring labeled training datasets or complex reward engineering, ART lets developers define scenarios and reward signals, then trains models to improve through repeated experience. The framework separates concerns into a lightweight **ART Client** (runs on any Python machine, including a laptop) and an **ART Server** (runs on a GPU, locally or in the cloud), so developers can iterate without managing GPU infrastructure directly.

## How the Training Loop Works

ART's training loop follows a structured inference-then-training cycle:

1. **Inference phase** — The ART client drives an agentic workflow, executing multiple rollouts in parallel. Completion requests route to the ART server, which runs the model's latest LoRA adapter via vLLM. Each system, user, and assistant message is stored in a Trajectory.
2. **Reward assignment** — When a rollout finishes, the developer's code assigns a scalar reward to the Trajectory reflecting agent performance.
3. **Training phase** — Trajectories are grouped and sent to the server. The server trains the model using GRPO, initializing from the latest checkpoint or an empty LoRA on the first iteration, then saves and reloads the updated LoRA into vLLM.
4. **Loop** — Inference resumes and the cycle repeats for a configured number of iterations.

This design means the developer's application code never directly touches the training infrastructure.

## Architecture and Deployment Model

ART supports both local GPU training and serverless cloud training via W&B Training (a managed service). The README describes W&B Training as providing multiplexed inference on a shared production-grade cluster, with the project claiming 40% lower cost and 28% faster training compared to self-managed setups — these are vendor-published figures. Every trained checkpoint is immediately available for inference via W&B Inference. For local use, ART can run on any machine with a compatible GPU, and free Google Colab notebooks are provided for getting started without any local GPU.

## Integrations and Ecosystem

ART ships with integrations for several platforms and frameworks:

- **LangGraph** — Train LangGraph agents directly with RL for multi-step reasoning and tool use
- **MCP (Model Context Protocol)** — MCP•RL feature trains models to master any MCP server's tools automatically
- **W&B (Weights & Biases)** — Observability, experiment tracking, and serverless training backend
- **Langfuse** — Observability and debugging
- **OpenPipe** — Model management and inference
- **Unsloth, vLLM, trl, torchtune** — Underlying training and inference engines

Supported models include most vLLM/HuggingFace-transformers compatible causal language models supported by Unsloth, including Qwen3, Qwen2.5, Llama, and others. The docs note Gemma 3 is not currently supported.

## Notable Capabilities and Features

- **RULER** — Automatic reward generation for RL, removing the need to hand-craft reward functions
- **AutoRL** — Zero-data training for arbitrary tasks using automatic input generation and RULER evaluation
- **SFT Training** — Supervised fine-tuning support alongside RL, including SFT warmup before RL
- **Checkpoint Forking and Deletion** — Manage training checkpoints flexibly
- **Additional Histories** — Inject extra context into training trajectories
- **Tracking Metrics** — Custom metric logging during training

The ART•E blog post describes training a Qwen 2.5 14B email research agent that the OpenPipe team reports outperforms OpenAI's o3 on email retrieval benchmarks — this is a vendor-published claim.

## Update: v0.5.17

The latest release is v0.5.17, published in March 2026. The repository shows active development with recent pushes as of June 2026, 121 open issues, and 907 forks. Recent additions highlighted in the README include LangGraph integration, MCP•RL, AutoRL with zero-data training, and the RULER automatic reward system — indicating a rapid feature expansion trajectory since the initial March 2025 launch.

## Features
- GRPO-based reinforcement learning for LLM agents
- Multi-step agentic training loop with trajectory collection
- RULER automatic reward generation
- AutoRL zero-data training for any task
- SFT (Supervised Fine-Tuning) support
- Checkpoint forking and deletion
- LangGraph integration
- MCP•RL for training models on MCP servers
- W&B serverless training backend
- Local GPU and cloud GPU support
- LoRA adapter training via Unsloth and vLLM
- Parallel rollout execution
- Custom metric tracking
- OpenAI-compatible client interface
- Google Colab notebooks for quick start

## Integrations
Weights & Biases (W&B), LangGraph, Langfuse, OpenPipe, vLLM, Unsloth, trl (HuggingFace), torchtune, MCP (Model Context Protocol), OpenEnv, Google Colab

## Platforms
CLI, API, DEVELOPER_SDK

## Pricing
Open Source

## Version
v0.5.17

## Links
- Website: https://art.openpipe.ai
- Documentation: https://art.openpipe.ai/getting-started/about
- Repository: https://github.com/OpenPipe/ART
- EveryDev.ai: https://www.everydev.ai/tools/art-agent-reinforcement-trainer
