Miles

Name: Miles
Availability: OnlineOnly
Author: radixark

Enterprise-grade reinforcement learning framework for large-scale LLM and VLM post-training, featuring high-performance rollout, low-precision training, and production stability.

At a Glance

Pricing: Open Source. Fully open-source under Apache License 2.0. Free to use, modify, and distribute.
Available On: CLI, API, SDK

radixark | Redwood City, CA | Est. 2025 | $100M raised

Listed May 2026

About Miles

Miles is an open-source reinforcement learning framework built for enterprise-scale post-training of large language models (LLMs) and vision-language models (VLMs). It is a fork of the slime project, developed jointly by InfiXAI, Ant Group, the SGLang RL Team, and the Miles community. The project launched in November 2025 and is actively maintained under the Apache License 2.0.

What It Is

Miles sits at the intersection of research-grade RL and production-grade reliability. It integrates SGLang for high-throughput rollout and Megatron-LM for scalable distributed training, targeting the system-level problems that cause instability and inefficiency when reinforcement learning is applied to models of 1TB or more. The framework is designed as a unified entry point for complex RL workloads, including multi-turn interaction, vision-language training, reasoning, coding agents, and multi-agent co-evolution.

Core Technical Architecture

Miles addresses several fundamental problems in large-scale RL training through system-level innovations:

Unified FP8 Pipeline: End-to-end FP8 sampling and training that eliminates quantization-induced discrepancy between rollout and training, preventing RL collapse in large MoE models.
Rollout Routing Replay (R3): Records expert routing decisions during SGLang inference and replays them during Megatron training to ensure bit-wise expert alignment in MoE architectures like Qwen3 and DeepSeek-V3.
INT4 QAT Support: Full-stack INT4 W4A16 Quantization-Aware Training pipeline, inspired by the Kimi K2-Thinking report, enabling 1TB-scale models to fit in the VRAM of a single machine (e.g., an NVIDIA H200 node) and, per the project, doubling rollout efficiency.
Zero-Copy Weight Sync: Optimized weight refit via CUDA IPC zero-copy mapping, async tensor gathering, and bucketed flattening, reducing sync time by 50% compared to standard HTTP/RPC transfers (per project documentation).
Speculative RL Training: Uses an Online SFT Draft Model that updates during RL to prevent policy drift, achieving 25%+ rollout speedup according to the project's own benchmarks.
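
The idea behind Rollout Routing Replay can be sketched in a few lines of NumPy. This is an illustration only, not Miles code: the helper names (gates_for, route_top_k), the top-k softmax gating, and the choice to recompute gate weights from the training-side logits are all assumptions made for the example. It shows the rollout side recording which experts each token was sent to, and the training side reusing those expert ids instead of re-deriving them from slightly different numerics.

```python
import numpy as np

def gates_for(router_logits, expert_idx):
    """Softmax restricted to the selected experts' logits (hypothetical helper)."""
    picked = np.take_along_axis(router_logits, expert_idx, axis=-1)
    picked = picked - picked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(picked)
    return weights / weights.sum(axis=-1, keepdims=True)

def route_top_k(router_logits, k):
    """Return the ids of the k highest-scoring experts per token (hypothetical helper)."""
    return np.argsort(-router_logits, axis=-1, kind="stable")[:, :k]

rng = np.random.default_rng(0)
tokens, experts, k = 6, 8, 2

# Rollout side: the inference engine routes each token and records its choice.
rollout_logits = rng.normal(size=(tokens, experts))
recorded_idx = route_top_k(rollout_logits, k)

# Training side: numerics differ slightly (modelled here as added noise),
# so routing computed from scratch may pick different experts.
train_logits = rollout_logits + rng.normal(scale=0.5, size=(tokens, experts))
fresh_idx = route_top_k(train_logits, k)

# Replay: keep the recorded expert ids, recompute only the gate weights.
replayed_idx = recorded_idx
replayed_gates = gates_for(train_logits, replayed_idx)

differs = (np.sort(fresh_idx, axis=-1) != np.sort(recorded_idx, axis=-1)).any(axis=-1)
print(f"tokens whose fresh routing would differ: {int(differs.sum())} of {tokens}")
print("replayed routing matches rollout:", np.array_equal(replayed_idx, recorded_idx))
```

With replay, every token is processed by the same experts in training as in rollout, which is the alignment property the R3 entry describes; how Miles stores and transports the recorded routing is not covered by this sketch.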

Model Support and Training Scenarios

Miles supports a broad range of state-of-the-art model families:

DeepSeek: R1, V3, V3.2
Qwen: 2, 2.5, 3
Llama: 3, 3.1, 3.3, 4
Gemma: 2, 3, 3N
GLM: 4.5, 4.6, 4.7
MiniMax: M2, M2.1
Others: Mistral, Mixtral, Phi, gpt-oss, and any model supported by SGLang and Megatron

Training scenarios span multi-turn interaction, unified VLM/LLM workflows, reasoning and coding tasks, and multi-agent co-evolutionary frameworks such as MrlX.

Setup Path

Miles recommends using its official Docker image for best performance and compatibility. It can also be installed from source via pip. Training is launched through a unified train.py entry point with command-line arguments for configuring cluster resources, training backends (Megatron/FSDP), SGLang inference optimization, and RL algorithmic hyperparameters. A detailed argument guide and Quick Start documentation are available in the repository's docs/ directory.

Update: Active Development Through Early 2026

The project has seen rapid iteration since its November 2025 launch. Notable recent additions include:

[2026/02] Detailed command-line argument guide for Miles server configuration
[2026/01] INT4 QAT pipeline for single-machine 1TB model training
[2026/01] Unified VLM/LLM multi-turn training support
[2026/01] MrlX multi-agent co-evolutionary framework integration
[2025/12] Rollout Routing Replay (R3) for MoE RL stability
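
A rough back-of-the-envelope calculation shows why INT4 weights matter for fitting a model of this size on one machine. The figures below are assumptions for illustration, not numbers from the Miles documentation: a model with 1 trillion parameters, 8 GPUs per machine, 141 GB of HBM per H200 (NVIDIA's published capacity), and a hypothetical quantization group size of 32 with one 16-bit scale per group. Only weight storage is counted; KV cache, activations, and optimizer state are ignored.

```python
GB = 1e9

def weight_bytes(num_params, bits_per_weight, group_size=None, scale_bits=16):
    """Bytes needed for the weights, plus one scale per group if group_size is set."""
    total_bits = num_params * bits_per_weight
    if group_size is not None:
        total_bits += (num_params / group_size) * scale_bits
    return total_bits / 8

num_params = 1e12        # assumption: 1 trillion parameters
gpus_per_machine = 8     # assumption: one 8-GPU node
hbm_per_gpu = 141 * GB   # H200 HBM capacity

machine_hbm = gpus_per_machine * hbm_per_gpu

formats = [
    ("BF16", weight_bytes(num_params, 16)),
    ("FP8", weight_bytes(num_params, 8)),
    ("INT4 (W4A16)", weight_bytes(num_params, 4, group_size=32)),
]

for label, size in formats:
    headroom = machine_hbm - size
    print(f"{label:>13}: weights {size / GB:7.1f} GB, headroom {headroom / GB:7.1f} GB")
```

Under these assumptions, INT4 weights take about 562.5 GB of the machine's 1,128 GB, leaving roughly half of the memory for KV cache and other runtime state, whereas BF16 weights alone would exceed it.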

The roadmap lists planned support for Diffusion RL, Omni RL, Diffusion LLM RL, and elastic resource scheduling. The repository had 1,378 stars and 220 forks as of late May 2026, per GitHub metadata.

Pricing

Included in the open-source release:

Full Miles framework source code
Unified FP8 training and rollout
INT4 QAT pipeline
SGLang and Megatron-LM integration
Multi-turn LLM and VLM training

Capabilities

Key Features

Unified FP8 end-to-end training and rollout pipeline
INT4 Quantization-Aware Training (QAT) for 1TB+ models
Rollout Routing Replay (R3) for MoE RL stability
Zero-copy weight synchronization via CUDA IPC
Speculative RL training with Online SFT Draft Model
Multi-turn LLM and VLM training support
Multi-agent co-evolutionary RL (MrlX)
Truncated and Masked Importance Sampling (TIS/MIS)
Partial rollout and over-sampling for long-tail RL
Support for DeepSeek, Qwen, Llama, Gemma, GLM, MiniMax, Mistral, Phi
SGLang integration for high-throughput rollout
Megatron-LM integration for scalable distributed training
FSDP training backend support
Docker image for production deployment
Detailed command-line argument configuration
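
The Truncated and Masked Importance Sampling entry above refers to a general technique for correcting the gap between the probabilities the rollout engine assigned to sampled tokens and the probabilities the training engine assigns to the same tokens. The sketch below shows one common formulation in NumPy. It is not taken from the Miles source: the function names, the cap value of 2.0, and the sample probabilities are made up for illustration.

```python
import numpy as np

def importance_ratio(train_logprobs, rollout_logprobs):
    """Per-token ratio p_train / p_rollout, computed from log-probabilities."""
    return np.exp(train_logprobs - rollout_logprobs)

def truncated_is(ratio, cap):
    """TIS: clip large ratios down to the cap so no token dominates the update."""
    return np.minimum(ratio, cap)

def masked_is(ratio, cap):
    """MIS: drop tokens whose ratio exceeds the cap by giving them zero weight."""
    return np.where(ratio <= cap, ratio, 0.0)

# Made-up per-token probabilities of the sampled tokens under each engine.
rollout_logprobs = np.log(np.array([0.50, 0.20, 0.05, 0.30]))
train_logprobs = np.log(np.array([0.45, 0.25, 0.40, 0.30]))

cap = 2.0  # hypothetical threshold
ratio = importance_ratio(train_logprobs, rollout_logprobs)
tis_weights = truncated_is(ratio, cap)
mis_weights = masked_is(ratio, cap)

print("ratio      :", np.round(ratio, 3))
print("TIS weights:", np.round(tis_weights, 3))
print("MIS weights:", np.round(mis_weights, 3))
```

In a training loop these weights would typically multiply the per-token policy-gradient loss and be treated as constants. The third token, whose ratio is 8.0, is capped at 2.0 under TIS and removed entirely under MIS.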

Integrations

SGLang
Megatron-LM
FSDP
FlashAttention-3
DeepGEMM
NVIDIA Transformer Engine
Docker
CUDA IPC
MrlX
slime

API Available
