DeepSpeed
An open-source deep learning optimization library by Microsoft that enables efficient training and inference of large-scale AI models through ZeRO, 3D-Parallelism, and other system innovations.
At a Glance
About DeepSpeed
DeepSpeed is an open-source deep learning optimization library developed by Microsoft that dramatically reduces the computational cost and memory requirements of training and deploying large-scale AI models. It introduces groundbreaking system innovations such as ZeRO (Zero Redundancy Optimizer), 3D-Parallelism, DeepSpeed-MoE, and ZeRO-Infinity that have enabled training of models with hundreds of billions of parameters. DeepSpeed has powered some of the world's largest language models, including Megatron-Turing NLG (530B) and BLOOM (176B), and integrates seamlessly with popular frameworks like Hugging Face Transformers, PyTorch Lightning, and Accelerate.
- ZeRO Optimizer — eliminates memory redundancy across data-parallel processes, enabling training of trillion-parameter models on commodity hardware.
- 3D-Parallelism — combines data, pipeline, and tensor parallelism to scale training across thousands of GPUs efficiently.
- ZeRO-Offload & ZeRO-Infinity — offloads optimizer states, gradients, and parameters to CPU/NVMe storage, breaking the GPU memory wall for extreme-scale training (see the configuration sketch after this list).
- DeepSpeed Inference — provides highly optimized inference kernels and model parallelism for fast, cost-effective deployment of large transformer models.
- DeepSpeed-MoE — advances Mixture-of-Experts training and inference to power next-generation AI at scale.
- Model Compression — includes quantization (ZeroQuant), pruning, and knowledge distillation tools to reduce model size and accelerate inference.
- Autotuning — automatically finds the optimal DeepSpeed configuration for a given model and hardware setup.
- DeepSpeed-Chat — provides easy, fast, and affordable RLHF training for ChatGPT-like models at all scales.
- Data Efficiency — improves model quality and training efficiency via efficient data sampling and routing techniques.
- Sparse Attention — implements custom sparse attention kernels to handle long sequences efficiently.
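As a concrete illustration of the offloading feature above, here is a minimal configuration sketch, assuming DeepSpeed's documented JSON configuration schema for ZeRO stage 3: it enables optimizer-state offload to CPU and parameter offload to NVMe. The batch size and NVMe path are placeholder assumptions, not tuned values.

```python
# Hedged sketch: a ZeRO-3 configuration with CPU/NVMe offload, written as a
# Python dict (deepspeed.initialize also accepts this in place of a JSON file).
# All values below are illustrative placeholders, not recommendations.
ds_offload_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},                   # mixed precision (BF16)
    "zero_optimization": {
        "stage": 3,                              # partition optimizer states, gradients, and parameters
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload: optimizer states to host memory
        "offload_param": {
            "device": "nvme",                    # ZeRO-Infinity: parameters spill to NVMe storage
            "nvme_path": "/local_nvme"           # placeholder path to fast local storage
        }
    }
}
```

Serialized to JSON, the same dictionary is what DeepSpeed's examples commonly pass to a training script on the command line.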
To get started, install DeepSpeed via pip (pip install deepspeed), then wrap your PyTorch model and optimizer with the deepspeed.initialize() API and provide a JSON configuration file specifying the ZeRO stage, optimizer, and precision settings.
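A minimal end-to-end sketch of that workflow is shown below, assuming a toy model and a hypothetical in-code configuration (DeepSpeed also accepts the equivalent JSON file); the model, data, and hyperparameters are placeholders.

```python
import torch
import deepspeed

# Toy stand-ins for a real model and dataset.
model = torch.nn.Linear(1024, 1024)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "fp16": {"enabled": True},                  # mixed precision (FP16)
    "zero_optimization": {"stage": 2},          # ZeRO stage 2: shard optimizer states and gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in an engine that owns the optimizer,
# learning-rate scheduler, distributed setup, and ZeRO partitioning.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    batch = torch.randn(8, 1024, device=model_engine.device, dtype=torch.half)
    loss = model_engine(batch).float().pow(2).mean()  # dummy loss
    model_engine.backward(loss)   # engine-managed backward (loss scaling, gradient sharding)
    model_engine.step()           # optimizer step plus gradient zeroing
```

Such a script is normally launched with the deepspeed command-line launcher (for example, deepspeed train.py), which sets up the multi-GPU environment that deepspeed.initialize expects.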
Pricing
Open Source (Free)
Fully open-source under Apache 2.0 license. All features available at no cost.
- ZeRO Optimizer (Stages 1/2/3)
- ZeRO-Offload and ZeRO-Infinity
- 3D-Parallelism
- DeepSpeed-MoE
- Mixed Precision Training
Capabilities
Key Features
- ZeRO Optimizer (Stages 1, 2, 3)
- ZeRO-Offload and ZeRO-Infinity
- 3D-Parallelism (data, pipeline, tensor)
- DeepSpeed-MoE (Mixture-of-Experts)
- Mixed Precision Training (FP16, BF16)
- Model Compression and Quantization (ZeroQuant)
- DeepSpeed Inference with optimized kernels (see the sketch after this list)
- Autotuning for optimal configuration
- DeepSpeed-Chat for RLHF training
- Sparse Attention kernels
- Pipeline Parallelism
- Curriculum Learning and Data Efficiency
- Flops Profiler
- Communication Logging
- Universal Checkpointing
- Arctic Long Sequence Training (ALST)
- DeepNVMe for NVMe offloading
- Automatic Tensor Parallelism
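For the inference capability noted in this list, a hedged sketch using deepspeed.init_inference is shown below. The placeholder model and the exact keyword arguments (which have shifted across DeepSpeed releases) are assumptions, and kernel injection only takes effect for architectures DeepSpeed recognizes (for example, popular Hugging Face transformers).

```python
import torch
import deepspeed

# Placeholder model; in practice this would be a trained transformer,
# e.g. one loaded with Hugging Face's from_pretrained().
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6
)

# Wrap the model for inference. For supported architectures,
# replace_with_kernel_inject swaps modules for DeepSpeed's fused inference
# kernels; dtype selects half-precision execution.
engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

device = next(engine.module.parameters()).device
with torch.no_grad():
    tokens = torch.randn(128, 1, 512, device=device, dtype=torch.float16)  # (seq, batch, d_model)
    output = engine(tokens)
```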
Integrations
- Hugging Face Transformers
- PyTorch Lightning
- Accelerate