slime

Name: slime
Availability: Online only
Author: THUDM

An open-source LLM post-training framework for RL scaling that connects Megatron training with SGLang rollout for high-performance reinforcement learning workflows.

At a Glance

Pricing

Open Source

Fully free and open-source under Apache License 2.0

Available On

CLI

API

Linux

THUDM (Tsinghua University Data Mining group) builds large-s…

Listed Jul 2026

About slime

slime is an open-source LLM post-training framework for reinforcement learning (RL) scaling, developed by THUDM (the Tsinghua University Data Mining group, closely associated with Zhipu AI) and released under the Apache 2.0 license. It connects Megatron-LM for training with SGLang for rollout, providing a unified path for training data generation, reward computation, and environment interaction. The project is actively maintained on GitHub and reached v0.3.0 in May 2026.

What It Is

slime provides two core capabilities: high-performance training via Megatron-LM and flexible data generation through custom interfaces and server-based engines. Its design goal is to keep these two capabilities tightly integrated without requiring a heavy stack of disconnected trainers, rollout services, and agent frameworks. Megatron training, SGLang rollout, custom data generation, reward computation, verifier feedback, and environment interaction all share the same training/rollout/Data Buffer path.

Architecture and Engine Pass-Through

The framework is built around a three-module architecture:

Training (Megatron): Handles the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module after each training step.
Rollout (SGLang + router): Generates new data including rewards and verifier outputs, storing results in the Data Buffer. Custom generate functions can wrap this with multi-turn loops, tool calls, environment/sandbox interaction, and verifier-based reward.
Data Buffer: A bridge module managing prompt initialization, custom data, and rollout generation methods including agentic workflows.
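
To make the data flow concrete, here is a minimal, single-process sketch of one synchronous pattern: rollout (with a custom multi-turn generate function and a verifier reward) fills the Data Buffer, training consumes it, and parameters are synced back to rollout. Every class and function in it (`DataBuffer`, `RolloutEngine`, `Trainer`, `custom_generate`, `verifier_reward`, `calculator_tool`) is hypothetical and only stands in for Megatron, SGLang, and slime's real interfaces; the "training step" merely bumps a weight-version counter.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float
    weight_version: int  # policy version that generated this sample


@dataclass
class DataBuffer:
    """Hypothetical bridge: rollout writes samples here, training reads them."""
    prompts: list[str]
    samples: list[Sample] = field(default_factory=list)

    def put(self, batch: list[Sample]) -> None:
        self.samples.extend(batch)

    def pop_all(self) -> list[Sample]:
        batch, self.samples = self.samples, []
        return batch


def calculator_tool(expr: str) -> str:
    # Stub "tool" (hypothetical): evaluates prompts of the form "a+b".
    a, b = expr.split("+")
    return str(int(a) + int(b))


def custom_generate(prompt: str) -> str:
    # Toy two-turn rollout: the "model" calls a tool, then answers with its result.
    # A real generate function would query the SGLang engine at each turn.
    tool_output = calculator_tool(prompt)
    turns = [f"user: {prompt}", f"tool: {tool_output}", f"assistant: {tool_output}"]
    return "\n".join(turns)


def verifier_reward(prompt: str, response: str) -> float:
    # Toy verifier: 1.0 if the final answer equals the expected sum, else 0.0.
    expected = sum(int(t) for t in prompt.split("+"))
    answer = response.rsplit(":", 1)[-1].strip()
    return 1.0 if answer == str(expected) else 0.0


class RolloutEngine:
    """Stands in for SGLang: generates samples with the weights it currently holds."""

    def __init__(self) -> None:
        self.weight_version = 0

    def load_weights(self, version: int) -> None:
        self.weight_version = version

    def generate(self, prompts: list[str]) -> list[Sample]:
        out = []
        for p in prompts:
            r = custom_generate(p)
            out.append(Sample(p, r, verifier_reward(p, r), self.weight_version))
        return out


class Trainer:
    """Stands in for Megatron: a real step would compute an RL loss and update weights."""

    def __init__(self) -> None:
        self.version = 0
        self.samples_seen = 0

    def train_step(self, batch: list[Sample]) -> float:
        self.samples_seen += len(batch)
        self.version += 1  # pretend the weights changed
        return sum(s.reward for s in batch) / len(batch)


def run(steps: int, buffer: DataBuffer, engine: RolloutEngine, trainer: Trainer) -> float:
    mean_reward = 0.0
    for _ in range(steps):
        buffer.put(engine.generate(buffer.prompts))  # rollout -> Data Buffer
        mean_reward = trainer.train_step(buffer.pop_all())  # Data Buffer -> training
        engine.load_weights(trainer.version)  # sync parameters to rollout
    return mean_reward


buffer = DataBuffer(prompts=["2+3", "10+7"])
engine, trainer = RolloutEngine(), Trainer()
print(run(3, buffer, engine, trainer), engine.weight_version)  # 1.0 3
```

The loop is synchronous for readability. A fully-async mode would presumably overlap generation with training, so some samples could come from older weights; recording `weight_version` on each sample is one way to track that.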

slime passes Megatron arguments through directly and exposes SGLang arguments with a --sglang- prefix, so upstream training and serving optimizations remain accessible without additional abstraction layers.
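
The prefix rule can be illustrated with a small, hypothetical `split_args` helper: tokens under the `--sglang-` prefix go to SGLang with the prefix stripped, and everything else is passed through unchanged for Megatron. This is an assumed mechanism for illustration, not slime's actual argument parser; `--micro-batch-size` and `--bf16` (Megatron-LM) and `--mem-fraction-static` (SGLang) serve only as example flags.

```python
SGLANG_PREFIX = "--sglang-"


def split_args(argv: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical router: '--sglang-*' flags (and the values after them) go to
    SGLang with the prefix stripped; all other tokens pass through to Megatron."""
    megatron: list[str] = []
    sglang: list[str] = []
    target = megatron
    for tok in argv:
        if tok.startswith("--"):
            if tok.startswith(SGLANG_PREFIX):
                target = sglang
                tok = "--" + tok[len(SGLANG_PREFIX):]  # '--sglang-x' -> '--x'
            else:
                target = megatron
        # Value tokens (no leading '--') follow the most recent flag.
        target.append(tok)
    return megatron, sglang


argv = ["--micro-batch-size", "1", "--sglang-mem-fraction-static", "0.8", "--bf16"]
megatron, sglang = split_args(argv)
print(megatron)  # ['--micro-batch-size', '1', '--bf16']
print(sglang)    # ['--mem-fraction-static', '0.8']
```

The design choice this illustrates is that neither side needs a wrapper schema: any flag the upstream project adds becomes usable immediately, either as-is or under the prefix.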

Production Validation and Model Support

According to the project documentation, slime is the RL framework behind the GLM model family including GLM-5.2, GLM-5.1, GLM-5, GLM-4.7, GLM-4.6, and GLM-4.5. Beyond the GLM family, slime supports:

Qwen series: Qwen3.6, Qwen3.5, Qwen3Next, Qwen3MoE, Qwen3, Qwen2.5
DeepSeek V3 series: DeepSeek V3, V3.1, DeepSeek R1
Llama 3

The framework has been exercised through large-scale training runs including a 256×H100 configuration for GLM-5.2 (744B-A40B MoE) and 128×H100 for DeepSeek R1.
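
For a rough sense of scale, the back-of-envelope estimate below divides the training state of a 744B-parameter model evenly across 256 GPUs. It assumes the common mixed-precision Adam accounting of 16 bytes per parameter and ignores activations, communication buffers, and any rollout-engine memory, so it is an illustration, not a measurement of slime's memory use. An MoE model still has to store all 744B parameters even though only about 40B are active per token.

```python
def per_gpu_gb(total_params: float, bytes_per_param: float, num_gpus: int) -> float:
    """Per-GPU memory in decimal GB, assuming state is sharded evenly across GPUs."""
    return total_params * bytes_per_param / num_gpus / 1e9


TOTAL_PARAMS = 744e9  # total (not active) parameters drive weight memory for MoE
NUM_GPUS = 256

# Assumed bytes per parameter (a common mixed-precision Adam estimate):
# 2 (BF16 weights) + 2 (BF16 grads) + 4 (FP32 master) + 4 + 4 (Adam moments) = 16
weights_only = per_gpu_gb(TOTAL_PARAMS, 2, NUM_GPUS)
full_state = per_gpu_gb(TOTAL_PARAMS, 16, NUM_GPUS)
print(f"{weights_only:.2f} GB weights, {full_state:.1f} GB with optimizer state")
# 5.81 GB weights, 46.5 GB with optimizer state
```

Under these assumptions the unsharded state would be about 11.9 TB, which is why it has to be spread over many 80 GB H100s.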

Advanced Features

slime includes several production-grade capabilities:

BF16 training with FP8 rollout: Large MoE recipes use Megatron BF16 training state with SGLang FP8 rollout/inference
PD Disaggregation: Separate prefill/decode resource allocation for multi-turn and agentic workloads
Delta Weight Sync: Efficient weight updates for training/inference disaggregation
Speculative Decoding: Supported for rollout acceleration
On-Policy Distillation: Hindsight-based training signal extraction
Fault Tolerance and Reproducibility: First-class engineering concerns with documented CI coverage
AMD hardware support: Platform-specific tutorial available
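
Training in BF16 while rolling out in FP8 implies a precision conversion whenever weights move from training to rollout. The pure-Python sketch below simulates one simple scheme: per-tensor scaling into the FP8 E4M3 range (largest finite value 448), then round-to-nearest-even on a 3-bit mantissa. The per-tensor scale is an assumption chosen for clarity; SGLang's FP8 support may use other scaling granularities and runs as GPU kernels, and none of these function names come from slime or SGLang.

```python
import math

E4M3_MAX = 448.0          # largest finite FP8 E4M3 value
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent; subnormals reuse its spacing
E4M3_MANTISSA_BITS = 3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), E4M3_MAX)
    exp = max(math.frexp(mag)[1] - 1, E4M3_MIN_NORMAL_EXP)  # floor(log2(mag)), clamped
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)  # spacing of representable values here
    q = round(mag / step) * step              # Python's round() is half-to-even
    return math.copysign(min(q, E4M3_MAX), x)


def quantize_per_tensor(weights: list[float]) -> tuple[list[float], float]:
    """Scale so that max |w| maps to 448, then round every value to E4M3."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(q: list[float], scale: float) -> list[float]:
    return [v * scale for v in q]


w = [0.02, -0.5, 1.3, 0.0007]  # stand-in for a BF16 weight tensor
q, s = quantize_per_tensor(w)
w_hat = dequantize(q, s)
print(max(abs(a - b) for a, b in zip(w, w_hat)))  # small compared with max |w| = 1.3
```

FP8 stores one byte per value versus two for BF16, halving rollout weight memory and transfer size. The cost is a relative rounding error of up to 1/16 for values in the normal range.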

Ecosystem and Adoption Signal

The project has attracted a growing ecosystem of frameworks built on slime as an RL substrate. Notable projects include Dressage (Alibaba Accio), Miles (RadixArk), vime (vLLM project), Relax (RedAI Infra), OpenClaw-RL, P1 (physics reasoning), RLVE, TritonForge, APRIL, qqr (Alibaba NLP), and ART (AWS Bedrock AgentCore Runtime). The GitHub repository reports 7,174 stars and 1,018 forks as of the data snapshot.

Update: v0.3.0

The latest release is v0.3.0, published May 31, 2026. The project was created in June 2025 and has developed rapidly; the repository was last updated on June 30, 2026. The v0.1.0 release blog post, titled "Redefining High-Performance RL Training Frameworks," and an introductory post on the LMSYS blog describe the framework's design philosophy and how it differs from alternatives such as veRL and OpenRLHF.

Pricing

Open Source

Fully free and open-source under Apache License 2.0

Full source code access
Megatron + SGLang integration
Agentic RL workflows
Community support via GitHub Issues
All advanced features included

Capabilities

Key Features

Megatron-LM training integration
SGLang-native rollout backend
Unified training/rollout/Data Buffer path
Custom data generation interfaces
Agentic RL workflows (multi-agent, coding agent, search/RAG)
BF16 training with FP8 rollout
PD disaggregation for prefill/decode separation
Delta weight sync
Speculative decoding
On-policy distillation
Fault tolerance and reproducibility
Observability and trace viewer
Profiling support
Low-precision training and rollout
SGLang config YAML for topology control
External rollout engine support
Dense and MoE model support
Fully-async rollout
CPU unit tests and GPU end-to-end CI
AMD hardware platform support

Integrations

Megatron-LM

SGLang

DeepSeek R1

Qwen3

GLM model family

Llama 3

Gemma4

vLLM (via vime)

Ray

AWS Bedrock AgentCore Runtime

E2B sandbox

Kubernetes

bwrap

API Available
