slime
An open-source LLM post-training framework for RL scaling that connects Megatron training with SGLang rollout for high-performance reinforcement learning workflows.
At a Glance
Fully free and open-source under Apache License 2.0
Listed Jul 2026
About slime
slime is an open-source LLM post-training framework for reinforcement learning (RL) scaling, developed by THUDM (Zhipu AI's research group) and released under the Apache 2.0 license. It connects Megatron-LM for training with SGLang for rollout, providing a unified path for training data generation, reward computation, and environment interaction. The project is actively maintained on GitHub and reached v0.3.0 as of May 2026.
What It Is
slime provides two core capabilities: high-performance training via Megatron-LM and flexible data generation through custom interfaces and server-based engines. Its design goal is to keep the two tightly integrated rather than assembling a heavy stack of disconnected trainers, rollout services, and agent frameworks. Every component (Megatron training, SGLang rollout, custom data generation, reward computation, verifier feedback, and environment interaction) flows through the same training/rollout/Data Buffer path.
Architecture and Engine Pass-Through
The framework is built around a three-module architecture:
- Training (Megatron): Handles the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module after each training step.
- Rollout (SGLang + router): Generates new data including rewards and verifier outputs, storing results in the Data Buffer. Custom generate functions can wrap this with multi-turn loops, tool calls, environment/sandbox interaction, and verifier-based reward.
- Data Buffer: A bridge module managing prompt initialization, custom data, and rollout generation methods including agentic workflows.
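The interaction between the three modules can be pictured as a simple loop. The sketch below is a toy illustration only, not slime's code: `Sample`, `DataBuffer`, `rollout`, `train_step`, and `run` are hypothetical names, the "weights" are reduced to a version counter, and the generate and reward functions are supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    prompt: str
    response: str = ""
    reward: float = 0.0


@dataclass
class DataBuffer:
    """Hypothetical stand-in for the Data Buffer: holds prompts and rollout results."""
    prompts: List[str]
    samples: List[Sample] = field(default_factory=list)

    def put(self, batch: List[Sample]) -> None:
        self.samples.extend(batch)

    def get(self) -> List[Sample]:
        # Hand the accumulated samples to the trainer and clear the buffer.
        batch, self.samples = self.samples, []
        return batch


def rollout(buffer: DataBuffer,
            generate: Callable[[str, int], str],
            reward_fn: Callable[[str, str], float],
            weights_version: int) -> None:
    """Rollout side: generate a response per prompt, score it, store it."""
    batch = []
    for prompt in buffer.prompts:
        response = generate(prompt, weights_version)
        batch.append(Sample(prompt, response, reward_fn(prompt, response)))
    buffer.put(batch)


def train_step(buffer: DataBuffer, weights_version: int) -> int:
    """Training side: consume the buffer and return the new weights version.

    A real trainer would compute a gradient update here; this toy version
    only bumps the counter when there was data to train on.
    """
    batch = buffer.get()
    return weights_version + 1 if batch else weights_version


def run(num_steps: int,
        buffer: DataBuffer,
        generate: Callable[[str, int], str],
        reward_fn: Callable[[str, str], float]) -> int:
    version = 0
    for _ in range(num_steps):
        rollout(buffer, generate, reward_fn, version)
        # The version returned here is what the next rollout uses,
        # mirroring the parameter sync after each training step.
        version = train_step(buffer, version)
    return version
```

A custom generate function (for example one that runs a multi-turn tool loop) would slot in as the `generate` argument without changing the loop itself.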
slime passes Megatron arguments through directly and exposes SGLang arguments with a --sglang- prefix, so upstream training and serving optimizations remain accessible without additional abstraction layers.
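As a rough illustration of what prefix-based pass-through means, the sketch below splits a command line into Megatron-bound and SGLang-bound arguments. It is an assumption-laden toy, not slime's actual parser: `split_args` is a hypothetical helper, and the flags in the usage example are chosen only for illustration.

```python
from typing import List, Tuple

SGLANG_PREFIX = "--sglang-"


def split_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Route each flag, together with its values, to Megatron or SGLang.

    Flags starting with --sglang- have the prefix stripped and go to the
    SGLang list; every other flag goes to the Megatron list unchanged.
    """
    megatron: List[str] = []
    sglang: List[str] = []
    target = megatron
    for token in argv:
        if token.startswith("--"):
            if token.startswith(SGLANG_PREFIX):
                target = sglang
                token = "--" + token[len(SGLANG_PREFIX):]
            else:
                target = megatron
        # Non-flag tokens are values and follow the most recent flag.
        target.append(token)
    return megatron, sglang


if __name__ == "__main__":
    megatron_args, sglang_args = split_args([
        "--tensor-model-parallel-size", "4",
        "--sglang-mem-fraction-static", "0.8",
    ])
    print("megatron:", megatron_args)
    print("sglang:  ", sglang_args)
```

The appeal of this design is that new upstream options need no wrapper code: a flag added to either engine is reachable as soon as the engine accepts it.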
Production Validation and Model Support
According to the project documentation, slime is the RL framework behind the GLM model family including GLM-5.2, GLM-5.1, GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5. Beyond the GLM family, slime supports:
- Qwen series: Qwen3.6, Qwen3.5, Qwen3Next, Qwen3MoE, Qwen3, Qwen2.5
- DeepSeek V3 series: DeepSeek V3, V3.1, DeepSeek R1
- Llama 3
The framework has been exercised through large-scale training runs including a 256×H100 configuration for GLM-5.2 (744B-A40B MoE) and 128×H100 for DeepSeek R1.
Advanced Features
slime includes several production-grade capabilities:
- BF16 training with FP8 rollout: Large MoE recipes use Megatron BF16 training state with SGLang FP8 rollout/inference
- PD Disaggregation: Separate prefill/decode resource allocation for multi-turn and agentic workloads
- Delta Weight Sync: Efficient weight updates for training/inference disaggregation
- Speculative Decoding: Supported for rollout acceleration
- On-Policy Distillation: Hindsight-based training signal extraction
- Fault Tolerance and Reproducibility: First-class engineering concerns with documented CI coverage
- AMD hardware support: Platform-specific tutorial available
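The listing does not describe how delta weight sync is implemented. The general idea of a delta update, sending only the parameters that changed since the last sync, can be sketched as follows. This is a generic illustration with hypothetical names (`compute_delta`, `apply_delta`) and plain Python lists standing in for tensors; it should not be read as slime's algorithm.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def compute_delta(old: Weights, new: Weights, tol: float = 0.0) -> Weights:
    """Return only the entries of `new` that differ from `old` by more than tol."""
    delta: Weights = {}
    for name, values in new.items():
        prev = old.get(name)
        changed = (
            prev is None
            or len(prev) != len(values)
            or any(abs(a - b) > tol for a, b in zip(prev, values))
        )
        if changed:
            delta[name] = list(values)
    return delta


def apply_delta(weights: Weights, delta: Weights) -> Weights:
    """Return a copy of `weights` with the changed entries overwritten."""
    updated = {name: list(values) for name, values in weights.items()}
    updated.update({name: list(values) for name, values in delta.items()})
    return updated
```

In this toy form the trainer would call `compute_delta` after a step and the rollout side would call `apply_delta`, so unchanged parameters never cross the wire.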
Ecosystem and Adoption Signal
The project has attracted a growing ecosystem of frameworks built on slime as an RL substrate. Notable projects include Dressage (Alibaba Accio), Miles (RadixArk), vime (vLLM project), Relax (RedAI Infra), OpenClaw-RL, P1 (physics reasoning), RLVE, TritonForge, APRIL, qqr (Alibaba NLP), and ART (AWS Bedrock AgentCore Runtime). The GitHub repository reports 7,174 stars and 1,018 forks as of the data snapshot.
Update: v0.3.0
The latest release is v0.3.0, published May 31, 2026. The project was created in June 2025 and has developed rapidly; the most recent push to the repository was on June 30, 2026. The v0.1.0 release blog post, titled "Redefining High-Performance RL Training Frameworks," and an introductory post on the LMSYS blog describe the framework's design philosophy and how it differs from alternatives such as veRL and OpenRLHF.
Pricing
Open Source
Fully free and open-source under Apache License 2.0
- Full source code access
- Megatron + SGLang integration
- Agentic RL workflows
- Community support via GitHub Issues
- All advanced features included
Capabilities
Key Features
- Megatron-LM training integration
- SGLang-native rollout backend
- Unified training/rollout/Data Buffer path
- Custom data generation interfaces
- Agentic RL workflows (multi-agent, coding agent, search/RAG)
- BF16 training with FP8 rollout
- PD disaggregation for prefill/decode separation
- Delta weight sync
- Speculative decoding
- On-policy distillation
- Fault tolerance and reproducibility
- Observability and trace viewer
- Profiling support
- Low precision training and rollout
- SGLang config YAML for topology control
- External rollout engine support
- Dense and MoE model support
- Fully-async rollout
- CPU unit tests and GPU end-to-end CI
- AMD hardware platform support
