Atropos
An async-first environment microservice framework for reinforcement learning with LLMs, enabling scalable collection and evaluation of LLM trajectories across diverse environments.
At a Glance
About Atropos
Atropos is Nous Research's open-source LLM Reinforcement Learning Gym — an environment microservice framework for async RL with large language models. It provides a flexible, scalable, and standardized platform to accelerate LLM-based RL research across diverse, interactive settings. The framework supports collecting, distributing, and evaluating LLM trajectories through dataset environments, online game environments, RLAIF/RLHF pipelines, multi-turn RL, code execution, and multimodal tasks.
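The async environment-to-trainer loop described above can be sketched as a toy in-memory version: environments score rollouts independently and push them to a central buffer that a trainer pulls batches from. This is purely illustrative and assumes hypothetical names (`TrajectoryAPI`, `run_env`); it is not the actual atroposlib API.

```python
# Toy sketch of the Atropos pattern: independent environments submit scored
# rollouts to a central trajectory store; a trainer pulls batches from it.
# TrajectoryAPI and run_env are illustrative stand-ins, not atroposlib names.
import asyncio
import random


class TrajectoryAPI:
    """Stands in for the central API service that buffers scored rollouts."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, rollout: dict) -> None:
        await self.queue.put(rollout)

    async def get_batch(self, size: int) -> list[dict]:
        # The trainer side: pull a fixed-size batch as rollouts arrive.
        return [await self.queue.get() for _ in range(size)]


async def run_env(api: TrajectoryAPI, env_id: str, n: int) -> None:
    # Each environment runs as its own task and submits scored rollouts.
    for step in range(n):
        rollout = {"env": env_id, "tokens": [step], "score": random.random()}
        await api.submit(rollout)


async def main() -> list[dict]:
    api = TrajectoryAPI()
    # Two environments produce concurrently while the trainer consumes.
    producers = asyncio.gather(*(run_env(api, f"env-{i}", 4) for i in range(2)))
    batch = await api.get_batch(8)
    await producers
    return batch


batch = asyncio.run(main())
print(len(batch))  # 8 scored rollouts pulled from the shared buffer
```

Because producers and the consumer share one event loop and queue, neither side blocks the other, which is the core of the "fully async" claim above.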
- Environment Microservice Architecture — Each environment runs as an independent service, sending trajectory data to a central API that trainers pull batches from, enabling fully async and distributed RL loops.
- Diverse Environment Support — Includes dataset environments (GSM8K, MMLU), interactive games (Blackjack, Taxi), RLAIF/RLHF pipelines, multi-turn tool calling, code execution (MBPP, HumanEval), and multimodal tasks (OCR VQA, CLEVR).
- OpenAI-Compatible API Integration — Works with any OpenAI-compatible inference endpoint including vLLM, SGLang, OpenAI, Together AI, and OpenRouter; no GPU required for local environment development.
- Trainer Integrations — Native integrations with Axolotl (via plugin) and Tinker for LoRA/QLoRA fine-tuning, plus an included example trainer for reference implementations.
- On-Policy Distillation (OPD) Support — Carries distillation arrays through `ScoredDataGroup` and API endpoints, enabling teacher-student distillation workflows with `TeacherDistillationEnv`.
- Offline Data Generation — Use the `atropos-sft-gen` and `atropos-dpo-gen` CLI tools to collect rollouts and convert them into SFT or DPO training datasets with rejection sampling controls.
- Debugging & Visualization Tools — The `process` subcommand runs inference-only rollouts with JSONL output, auto-generated HTML visualizations, and optional Weights & Biases logging; `view-run` launches a Gradio UI for batch inspection.
- Easy Installation — Install via `pip install atroposlib`, or clone the repo and use `pip install -e .[all]` for a full development setup with Python 3.10+.
- Proven Results — Demonstrated 4.6x improvement on parallel tool-calling tasks and 2.5x improvement on financial fundamentals prediction using Atropos-trained models.
- Community Environments — An `environments/community/` directory and contribution guide make it easy to add and share new RL environments with the broader research community.
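The rejection sampling mentioned under Offline Data Generation can be illustrated with a minimal sketch: sample several completions per prompt, score each, and keep only those clearing a threshold. The scorer here is a stand-in; the real `atropos-sft-gen` tool uses environment rewards and exposes its own controls.

```python
# Illustrative rejection sampling for building an SFT dataset from rollouts:
# keep only completions whose score clears a threshold. The score() function
# is a hypothetical stand-in for an environment's reward signal.


def score(completion: str) -> float:
    # Stand-in scorer; a real pipeline scores rollouts with environment rewards.
    return len(completion) / 10.0


def rejection_sample(prompt: str, completions: list[str], threshold: float) -> list[dict]:
    """Keep only completions whose score passes the threshold."""
    kept = []
    for c in completions:
        s = score(c)
        if s >= threshold:
            kept.append({"prompt": prompt, "completion": c, "score": s})
    return kept


samples = rejection_sample(
    "Solve 2+2.",
    ["4", "The answer is 4.", "idk"],
    threshold=0.4,
)
print([s["completion"] for s in samples])
```

Raising the threshold trades dataset size for quality, which is the knob such rejection sampling controls typically expose.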
Pricing
Open Source (MIT)
Atropos is free and open-source under the MIT License: free to use, modify, and distribute.
- Full framework source code
- All built-in environments
- Trainer integrations (Axolotl, Tinker)
- CLI tools (atropos-sft-gen, atropos-dpo-gen)
- Community environments
Capabilities
Key Features
- Async-first environment microservice framework
- Trajectory API for collecting and distributing LLM rollouts
- Dataset environments (GSM8K, MMLU, custom HuggingFace datasets)
- Online game environments (Blackjack, Taxi, text-based games)
- RLAIF and RLHF support
- Multi-turn RL for complex multi-step interactions
- Code execution environments (MBPP, HumanEval)
- Multimodal environments (OCR VQA, CLEVR)
- OpenAI-compatible API endpoint support
- vLLM and SGLang native server integrations
- Axolotl trainer plugin integration
- Tinker LoRA trainer integration
- On-Policy Distillation (OPD) support
- TeacherDistillationEnv for teacher-student distillation
- atropos-sft-gen and atropos-dpo-gen CLI tools
- process subcommand for inference-only rollouts
- JSONL output and HTML visualization
- Weights & Biases logging
- Gradio UI via view-run
- Slurm support for distributed inference
- Pre-commit hooks and contribution guide
- MIT License
