Atropos
An async-first environment microservice framework for reinforcement learning with LLMs, enabling scalable collection and evaluation of LLM trajectories across diverse environments.
At a Glance
About Atropos
Atropos is Nous Research's open-source LLM Reinforcement Learning Gym — an environment microservice framework for async RL with large language models. It provides a flexible, scalable, and standardized platform to accelerate LLM-based RL research across diverse, interactive settings. The framework supports collecting, distributing, and evaluating LLM trajectories through dataset environments, online game environments, RLAIF/RLHF pipelines, multi-turn RL, code execution, and multimodal tasks.
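The async environment-to-trainer loop described above can be sketched as a toy in-memory version: environments score rollouts independently and push them to a central buffer that a trainer pulls batches from. This is purely illustrative and assumes hypothetical names (`TrajectoryAPI`, `run_env`); it is not the actual atroposlib API.

```python
# Toy sketch of the Atropos pattern: independent environments submit scored
# rollouts to a central trajectory store; a trainer pulls batches from it.
# TrajectoryAPI and run_env are illustrative stand-ins, not atroposlib names.
import asyncio
import random


class TrajectoryAPI:
    """Stands in for the central API service that buffers scored rollouts."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, rollout: dict) -> None:
        await self.queue.put(rollout)

    async def get_batch(self, size: int) -> list[dict]:
        # The trainer side: pull a fixed-size batch as rollouts arrive.
        return [await self.queue.get() for _ in range(size)]


async def run_env(api: TrajectoryAPI, env_id: str, n: int) -> None:
    # Each environment runs as its own task and submits scored rollouts.
    for step in range(n):
        rollout = {"env": env_id, "tokens": [step], "score": random.random()}
        await api.submit(rollout)


async def main() -> list[dict]:
    api = TrajectoryAPI()
    # Two environments produce concurrently while the trainer consumes.
    producers = asyncio.gather(*(run_env(api, f"env-{i}", 4) for i in range(2)))
    batch = await api.get_batch(8)
    await producers
    return batch


batch = asyncio.run(main())
print(len(batch))  # 8 scored rollouts pulled from the shared buffer
```

Because producers and the consumer share one event loop and queue, neither side blocks the other, which is the core of the "fully async" claim above.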
- Environment Microservice Architecture — Each environment runs as an independent service, sending trajectory data to a central API that trainers pull batches from, enabling fully async and distributed RL loops.
- Diverse Environment Support — Includes dataset environments (GSM8K, MMLU), interactive games (Blackjack, Taxi), RLAIF/RLHF pipelines, multi-turn tool calling, code execution (MBPP, HumanEval), and multimodal tasks (OCR VQA, CLEVR).
- OpenAI-Compatible API Integration — Works with any OpenAI-compatible inference endpoint including vLLM, SGLang, OpenAI, Together AI, and OpenRouter; no GPU required for local environment development.
- Trainer Integrations — Native integrations with Axolotl (via plugin) and Tinker for LoRA/QLoRA fine-tuning, plus an included example trainer for reference implementations.
- On-Policy Distillation (OPD) Support — Carries distillation arrays through `ScoredDataGroup` and API endpoints, enabling teacher-student distillation workflows with `TeacherDistillationEnv`.
- Offline Data Generation — Use the `atropos-sft-gen` and `atropos-dpo-gen` CLI tools to collect rollouts and convert them into SFT or DPO training datasets with rejection sampling controls.
- Debugging & Visualization Tools — The `process` subcommand runs inference-only rollouts with JSONL output, auto-generated HTML visualizations, and optional Weights & Biases logging; `view-run` launches a Gradio UI for batch inspection.
- Easy Installation — Install via `pip install atroposlib`, or clone the repo and use `pip install -e .[all]` for a full development setup with Python 3.10+.
- Proven Results — Demonstrated 4.6x improvement on parallel tool-calling tasks and 2.5x improvement on financial fundamentals prediction using Atropos-trained models.
- Community Environments — An `environments/community/` directory and contribution guide make it easy to add and share new RL environments with the broader research community.
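The rejection sampling mentioned under Offline Data Generation can be illustrated with a minimal sketch: sample several completions per prompt, score each, and keep only those clearing a threshold. The scorer here is a stand-in; the real `atropos-sft-gen` tool uses environment rewards and exposes its own controls.

```python
# Illustrative rejection sampling for building an SFT dataset from rollouts:
# keep only completions whose score clears a threshold. The score() function
# is a hypothetical stand-in for an environment's reward signal.


def score(completion: str) -> float:
    # Stand-in scorer; a real pipeline scores rollouts with environment rewards.
    return len(completion) / 10.0


def rejection_sample(prompt: str, completions: list[str], threshold: float) -> list[dict]:
    """Keep only completions whose score passes the threshold."""
    kept = []
    for c in completions:
        s = score(c)
        if s >= threshold:
            kept.append({"prompt": prompt, "completion": c, "score": s})
    return kept


samples = rejection_sample(
    "Solve 2+2.",
    ["4", "The answer is 4.", "idk"],
    threshold=0.4,
)
print([s["completion"] for s in samples])
```

Raising the threshold trades dataset size for quality, which is the knob such rejection sampling controls typically expose.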
Pricing
Open Source (MIT)
Atropos is free and open-source under the MIT License: free to use, modify, and distribute.
- Full framework source code
- All built-in environments
- Trainer integrations (Axolotl, Tinker)
- CLI tools (atropos-sft-gen, atropos-dpo-gen)
- Community environments
Capabilities
Key Features
- Async-first environment microservice framework
- Trajectory API for collecting and distributing LLM rollouts
- Dataset environments (GSM8K, MMLU, custom HuggingFace datasets)
- Online game environments (Blackjack, Taxi, text-based games)
- RLAIF and RLHF support
- Multi-turn RL for complex multi-step interactions
- Code execution environments (MBPP, HumanEval)
- Multimodal environments (OCR VQA, CLEVR)
- OpenAI-compatible API endpoint support
- vLLM and SGLang native server integrations
- Axolotl trainer plugin integration
- Tinker LoRA trainer integration
- On-Policy Distillation (OPD) support
- TeacherDistillationEnv for teacher-student distillation
- atropos-sft-gen and atropos-dpo-gen CLI tools
- process subcommand for inference-only rollouts
- JSONL output and HTML visualization
- Weights & Biases logging
- Gradio UI via view-run
- Slurm support for distributed inference
- Pre-commit hooks and contribution guide
- MIT License
