# DeepSpeed

> An open-source deep learning optimization library by Microsoft that enables efficient training and inference of large-scale AI models through ZeRO, 3D-Parallelism, and other system innovations.

DeepSpeed is an open-source deep learning optimization library developed by Microsoft that reduces the computational cost and memory requirements of training and deploying large-scale AI models. It introduces system innovations such as ZeRO (Zero Redundancy Optimizer), 3D-Parallelism, DeepSpeed-MoE, and ZeRO-Infinity that have enabled training of models with hundreds of billions of parameters. DeepSpeed has powered some of the world's largest language models, including Megatron-Turing NLG (530B) and BLOOM (176B), and integrates with popular frameworks such as Hugging Face Transformers, PyTorch Lightning, and Accelerate.

- **ZeRO Optimizer** — *partitions optimizer states, gradients, and parameters across data-parallel processes to eliminate memory redundancy, scaling training toward trillion-parameter models.*
- **3D-Parallelism** — *combines data, pipeline, and tensor parallelism to scale training across thousands of GPUs efficiently.*
- **ZeRO-Offload & ZeRO-Infinity** — *offloads optimizer states, gradients, and parameters to CPU or NVMe storage, breaking the GPU memory wall for extreme-scale training; see the config sketch after this list.*
- **DeepSpeed Inference** — *provides highly optimized inference kernels and model parallelism for fast, cost-effective deployment of large transformer models; a minimal sketch follows this list.*
- **DeepSpeed-MoE** — *advances Mixture-of-Experts training and inference to power next-generation AI at scale.*
- **Model Compression** — *includes quantization (ZeroQuant), pruning, and knowledge distillation tools to reduce model size and accelerate inference.*
- **Autotuning** — *automatically finds the optimal DeepSpeed configuration for a given model and hardware setup.*
- **DeepSpeed-Chat** — *provides easy, fast, and affordable RLHF training for ChatGPT-like models at all scales.*
- **Data Efficiency** — *improves model quality and training efficiency via efficient data sampling and routing techniques.*
- **Sparse Attention** — *implements custom sparse attention kernels to handle long sequences efficiently.*
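The ZeRO-related items above are driven purely by configuration rather than code changes. A minimal sketch of a ZeRO-Infinity-style config, assuming the documented `zero_optimization` schema (the batch size, precision, and NVMe path are illustrative placeholders):

```python
# Sketch only: keys follow DeepSpeed's documented config schema;
# every value here is an illustrative assumption, not a recommendation.
ds_config = {
    "train_batch_size": 32,
    "bf16": {"enabled": True},                   # bfloat16 mixed precision
    "zero_optimization": {
        "stage": 3,                              # partition optimizer states,
                                                 # gradients, and parameters
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload to host RAM
        "offload_param": {
            "device": "nvme",                    # ZeRO-Infinity: params on NVMe
            "nvme_path": "/local_nvme",          # hypothetical mount point
        },
    },
}
```

DeepSpeed Inference is entered through a separate `deepspeed.init_inference()` call. A minimal sketch, assuming a Hugging Face causal LM and the kernel-injection path (the model choice is arbitrary, and recent releases configure tensor parallelism through a `tensor_parallel` argument):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Arbitrary example model; any architecture supported by kernel
# injection is handled the same way.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Swap supported submodules for DeepSpeed's fused inference kernels.
engine = deepspeed.init_inference(
    model,
    dtype=torch.half,                 # serve in FP16
    replace_with_kernel_inject=True,  # enable optimized kernel injection
)
```

The returned engine is called like the original module, so existing generation loops need no further changes.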

To get started, install DeepSpeed via pip (`pip install deepspeed`), wrap your PyTorch model with the `deepspeed.initialize()` API, and provide a JSON configuration file (or an equivalent Python dict) specifying the ZeRO stage, optimizer, and precision settings, as shown below.
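A minimal end-to-end sketch of that flow, using a toy model and synthetic data (both are illustrative assumptions; the `deepspeed.initialize()` call and the engine's `backward()`/`step()` methods are the library's actual API):

```python
import torch
import torch.nn as nn
import deepspeed

# Toy model standing in for a real network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Config as a dict (a JSON file path works the same way).
ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# The returned engine owns the optimizer, loss scaling, and ZeRO partitioning.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

criterion = nn.CrossEntropyLoss()
for step in range(100):
    # Synthetic half-precision batch; replace with a real DataLoader.
    x = torch.randn(16, 512, dtype=torch.half, device=model_engine.device)
    y = torch.randint(0, 10, (16,), device=model_engine.device)
    loss = criterion(model_engine(x), y)
    model_engine.backward(loss)  # engine-managed backward (handles loss scaling)
    model_engine.step()          # optimizer step; gradients are zeroed internally
```

Run the script under the `deepspeed` launcher (for example, `deepspeed train.py`), which sets up the distributed environment even on a single GPU.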

## Features
- ZeRO Optimizer (Stages 1, 2, 3)
- ZeRO-Offload and ZeRO-Infinity
- 3D-Parallelism (data, pipeline, tensor)
- DeepSpeed-MoE (Mixture-of-Experts)
- Mixed Precision Training (FP16, BF16)
- Model Compression and Quantization (ZeroQuant)
- DeepSpeed Inference with optimized kernels
- Autotuning for optimal configuration
- DeepSpeed-Chat for RLHF training
- Sparse Attention kernels
- Pipeline Parallelism
- Curriculum Learning and Data Efficiency
- Flops Profiler
- Communication Logging
- Universal Checkpointing
- Arctic Long Sequence Training (ALST)
- DeepNVMe for NVMe offloading
- Automatic Tensor Parallelism

## Integrations
Hugging Face Transformers, Hugging Face Accelerate, PyTorch Lightning, MosaicML Composer, PyTorch, Azure ML, Megatron-LM, NVIDIA GPUs, AMD GPUs, Intel Gaudi

## Platforms
CLI, API, Developer SDK

## Pricing
Open Source

## Links
- Website: https://www.deepspeed.ai
- Documentation: https://deepspeed.readthedocs.io/en/latest/
- Repository: https://github.com/deepspeedai/DeepSpeed
- EveryDev.ai: https://www.everydev.ai/tools/deepspeed
