ART (Agent Reinforcement Trainer)

Name: ART (Agent Reinforcement Trainer)
Availability: OnlineOnly
Author: OpenPipe

An open-source Python framework by OpenPipe for training multi-step LLM agents using GRPO reinforcement learning, enabling agents to learn from experience without labeled datasets.

Visit Website

At a Glance

Pricing

Open Source

Fully open-source under Apache 2.0 license, free to use, modify, and distribute.

Engagement

Available On

CLI

API

SDK

OpenPipeSeattle, WAEst. 2023$6.7M raised

Listed Jun 2026

About ART (Agent Reinforcement Trainer)

ART (Agent Reinforcement Trainer) is an open-source Python framework built by OpenPipe for applying reinforcement learning to LLM-based agents. Released under the Apache 2.0 license and available on GitHub, it wraps GRPO (Group Relative Policy Optimization) into an ergonomic harness that integrates with existing Python applications. The project has accumulated over 10,000 GitHub stars since its creation in March 2025.

What It Is

ART is a reinforcement learning training framework specifically designed for agentic LLMs — models that execute multi-step workflows, use tools, and interact with environments. Rather than requiring labeled training datasets or complex reward engineering, ART lets developers define scenarios and reward signals, then trains models to improve through repeated experience. The framework separates concerns into a lightweight ART Client (runs on any Python machine, including a laptop) and an ART Server (runs on a GPU, locally or in the cloud), so developers can iterate without managing GPU infrastructure directly.

How the Training Loop Works

ART's training loop follows a structured inference-then-training cycle:

Inference phase — The ART client drives an agentic workflow, executing multiple rollouts in parallel. Completion requests route to the ART server, which runs the model's latest LoRA adapter via vLLM. Each system, user, and assistant message is stored in a Trajectory.
Reward assignment — When a rollout finishes, the developer's code assigns a scalar reward to the Trajectory reflecting agent performance.
Training phase — Trajectories are grouped and sent to the server. The server trains the model using GRPO, initializing from the latest checkpoint or an empty LoRA on the first iteration, then saves and reloads the updated LoRA into vLLM.
Loop — Inference resumes and the cycle repeats for a configured number of iterations.

This design means the developer's application code never directly touches the training infrastructure.

Architecture and Deployment Model

ART supports both local GPU training and serverless cloud training via W&B Training (a managed service). The README describes W&B Training as providing multiplexed inference on a shared production-grade cluster, with the project claiming 40% lower cost and 28% faster training compared to self-managed setups — these are vendor-published figures. Every trained checkpoint is immediately available for inference via W&B Inference. For local use, ART can run on any machine with a compatible GPU, and free Google Colab notebooks are provided for getting started without any local GPU.

Integrations and Ecosystem

ART ships with integrations for several platforms and frameworks:

LangGraph — Train LangGraph agents directly with RL for multi-step reasoning and tool use
MCP (Model Context Protocol) — MCP•RL feature trains models to master any MCP server's tools automatically
W&B (Weights & Biases) — Observability, experiment tracking, and serverless training backend
Langfuse — Observability and debugging
OpenPipe — Model management and inference
Unsloth, vLLM, trl, torchtune — Underlying training and inference engines

Supported models include most vLLM/HuggingFace-transformers compatible causal language models supported by Unsloth, including Qwen3, Qwen2.5, Llama, and others. The docs note Gemma 3 is not currently supported.

Notable Capabilities and Features

RULER — Automatic reward generation for RL, removing the need to hand-craft reward functions
AutoRL — Zero-data training for arbitrary tasks using automatic input generation and RULER evaluation
SFT Training — Supervised fine-tuning support alongside RL, including SFT warmup before RL
Checkpoint Forking and Deletion — Manage training checkpoints flexibly
Additional Histories — Inject extra context into training trajectories
Tracking Metrics — Custom metric logging during training

The ART•E blog post describes training a Qwen 2.5 14B email research agent that the OpenPipe team reports outperforms OpenAI's o3 on email retrieval benchmarks — this is a vendor-published claim.

Update: v0.5.17

The latest release is v0.5.17, published in March 2026. The repository shows active development with recent pushes as of June 2026, 121 open issues, and 907 forks. Recent additions highlighted in the README include LangGraph integration, MCP•RL, AutoRL with zero-data training, and the RULER automatic reward system — indicating a rapid feature expansion trajectory since the initial March 2025 launch.

Community Discussions

Be the first to start a conversation about ART (Agent Reinforcement Trainer)

Share your experience with ART (Agent Reinforcement Trainer), ask questions, or help others learn from your insights.

Pricing

OPEN SOURCE

Open Source

Fully open-source under Apache 2.0 license, free to use, modify, and distribute.

GRPO reinforcement learning
SFT training
RULER automatic rewards
LangGraph integration
MCP•RL

Capabilities

Key Features

GRPO-based reinforcement learning for LLM agents
Multi-step agentic training loop with trajectory collection
RULER automatic reward generation
AutoRL zero-data training for any task
SFT (Supervised Fine-Tuning) support
Checkpoint forking and deletion
LangGraph integration
MCP•RL for training models on MCP servers
W&B serverless training backend
Local GPU and cloud GPU support
LoRA adapter training via Unsloth and vLLM
Parallel rollout execution
Custom metric tracking
OpenAI-compatible client interface
Google Colab notebooks for quick start

Integrations

Weights & Biases (W&B)

LangGraph

Langfuse

OpenPipe

vLLM

Unsloth

trl (HuggingFace)

torchtune

MCP (Model Context Protocol)

OpenEnv

Google Colab

API Available

View Docs

Back to all tools Suggest an edit