Sana
SANA is an open-source, efficiency-oriented framework by NVIDIA Labs for high-resolution image and video generation using Linear Diffusion Transformers, deployable on consumer GPUs with as little as 8GB VRAM.
At a Glance
Fully open-source under Apache 2.0 license. Free to use, modify, and distribute.
Listed Jun 2026
About Sana
SANA is a fully open-source codebase developed by NVIDIA Labs (NVlabs) for high-resolution image and video generation. Released under the Apache 2.0 license, it provides complete training and inference pipelines and is designed to run on consumer-grade hardware via 4-bit quantization. The repository has accumulated over 8,200 GitHub stars since its initial release in October 2024.
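To see why 4-bit quantization matters on consumer hardware, the following back-of-the-envelope sketch estimates the memory needed for the diffusion transformer weights alone at different precisions. The function name `weight_memory_gib` is hypothetical, and the parameter counts are the nominal sizes implied by the model names Sana-0.6B and Sana-1.6B. The estimate ignores the text encoder, autoencoder, activations, and quantization metadata, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB (no activations)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Nominal parameter counts taken from the model names (an assumption).
models = {"Sana-0.6B": 0.6e9, "Sana-1.6B": 1.6e9}

for name, params in models.items():
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{name} at {bits}-bit: {gib:.2f} GiB")
```

Going from 16-bit to 4-bit weights cuts this figure by a factor of four, which is what leaves headroom for the other components inside an 8GB budget.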
What It Is
SANA is a series of efficient diffusion models built around a Linear Diffusion Transformer (DiT) architecture. The core innovation replaces standard softmax attention in DiT with linear attention, whose cost grows linearly rather than quadratically with the number of tokens, which makes high-resolution generation far cheaper. The codebase covers multiple model variants, all sharing a unified framework: SANA (image), SANA-1.5 (scaled training/inference), SANA-Sprint (one/few-step generation), SANA-Video (video generation), SANA-WM (world modeling), and Sol-RL (reinforcement learning post-training).
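The idea behind linear attention can be illustrated with a small NumPy sketch. It assumes a ReLU feature map, which is one common formulation and not necessarily the exact layer used in the SANA codebase. All function names here are hypothetical. The point is that reordering the matrix products avoids ever building the N×N attention matrix.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Vanilla attention: builds an (N, N) matrix, cost O(N^2 * d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def relu_linear_attention(q, k, v, eps=1e-6):
    """Linear attention with a ReLU feature map, cost O(N * d^2)."""
    q, k = np.maximum(q, 0), np.maximum(k, 0)
    kv = k.T @ v               # (d, d_v) summary, independent of N
    k_sum = k.sum(axis=0)      # (d,) normalizer summary
    return (q @ kv) / (q @ k_sum + eps)[:, None]


def relu_quadratic_reference(q, k, v, eps=1e-6):
    """Same math as above, computed the slow way via the (N, N) matrix."""
    q, k = np.maximum(q, 0), np.maximum(k, 0)
    sim = q @ k.T
    return (sim @ v) / (sim.sum(axis=-1, keepdims=True) + eps)


rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 16, 8))  # N=16 tokens, d=8 channels
fast = relu_linear_attention(q, k, v)
slow = relu_quadratic_reference(q, k, v)
print(np.allclose(fast, slow))
```

The linear and quadratic ReLU variants agree numerically because matrix multiplication is associative. The linear form differs from softmax attention in its output, so it is a change of architecture and not just a faster implementation of the same layer.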
Key Architectural Techniques
- Linear Attention: Replaces vanilla attention in DiT for efficiency at high resolutions.
- DC-AE (Deep Compression AutoEncoder): Achieves 32× image compression versus the traditional 8×, reducing latent token count significantly.
- Decoder-only Text Encoder: Uses a modern decoder-only LLM with in-context learning for improved text-image alignment.
- Block Causal Linear Attention & Causal Mix-FFN: Efficient attention and feedforward mechanisms designed for long video generation.
- sCM Distillation: Enables one/few-step generation via continuous-time consistency distillation (used in SANA-Sprint).
- Sol-RL: Combines NVFP4 (low-precision) rollout with BF16 (high-precision) optimization for faster RL training convergence.
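The effect of DC-AE's 32× compression on sequence length can be checked with simple arithmetic. The sketch below counts latent positions for a given image size and spatial downsampling factor. The helper `latent_tokens` and its `patch_size` parameter are hypothetical; a patch size of 1 is assumed, and the actual patchification used by each model may differ.

```python
def latent_tokens(height: int, width: int, downsample: int, patch_size: int = 1) -> int:
    """Number of transformer tokens for an image after autoencoder compression."""
    stride = downsample * patch_size
    if height % stride or width % stride:
        raise ValueError("image size must be divisible by downsample * patch_size")
    return (height // stride) * (width // stride)


for size in (1024, 4096):
    t8 = latent_tokens(size, size, downsample=8)
    t32 = latent_tokens(size, size, downsample=32)
    print(f"{size}x{size}: 8x -> {t8} tokens, 32x -> {t32} tokens ({t8 // t32}x fewer)")
```

Under these assumptions, moving from 8× to 32× compression reduces the token count by a factor of 16 at any resolution, since the reduction applies along both spatial axes.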
Model Variants and Performance
The README benchmarks SANA against FLUX-dev at 1024×1024 resolution. According to the repository's own performance table, Sana-0.6B achieves a 39.5× speedup over FLUX-dev and Sana-1.6B a 23.3× speedup. SANA-Sprint generates a 1024px image in 0.1 seconds on an H100 and 0.3 seconds on an RTX 4090. For video, SANA-Video-2B has a latency of 36 seconds at 720p versus 400 seconds for Wan-2.1-1.3B (roughly 11× lower), per the repository's VBench comparison table. SANA-WM is a 2.6B-parameter controllable world model that supports 1-minute video generation at 720p with 6-DoF camera control.
Deployment and Integration Ecosystem
SANA is designed for flexible deployment across a wide range of environments:
- HuggingFace Diffusers: Full pipeline support via SanaPipeline, SanaPAGPipeline, and compatible schedulers (requires diffusers>=0.32.0).
- ComfyUI: Official node support for SANA, SANA-1.5, and SANA-Sprint workflows.
- SGLang: High-performance serving with an OpenAI-compatible API.
- Replicate API: Available on H100 hardware via Replicate.
- Cosmos-RL: Post-training (SFT/RL) integration for SANA-Image and SANA-Video.
- Quantization: 4-bit (via SVDQuant/Nunchaku) and 8-bit quantization allow inference within 8GB GPU VRAM, including on laptop GPUs.
- LoRA / DreamBooth: Fine-tuning support via diffusers.
- ControlNet: Training, inference, and model weights for controllable generation.
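As an illustration of the Diffusers route listed above, here is a minimal text-to-image sketch using SanaPipeline. The checkpoint ID, the helper names `build_generation_kwargs` and `generate`, and the step and guidance values are assumptions for illustration; check the model hub and the Diffusers documentation for the checkpoint and settings you actually want. Running `generate` requires a CUDA GPU, `torch`, and `diffusers>=0.32.0`.

```python
# Assumed checkpoint ID; verify on the HuggingFace hub before use.
MODEL_ID = "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers"


def build_generation_kwargs(prompt, size=1024, steps=20, guidance_scale=4.5):
    """Collect the call arguments in one place so they are easy to inspect."""
    return {
        "prompt": prompt,
        "height": size,
        "width": size,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(prompt, model_id=MODEL_ID, out_path="sana.png"):
    """Load the pipeline, generate one image, and save it to out_path."""
    # Imported lazily so the helper above works without a GPU stack installed.
    import torch
    from diffusers import SanaPipeline

    pipe = SanaPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    image = pipe(**build_generation_kwargs(prompt)).images[0]
    image.save(out_path)
    return out_path


# Example call (downloads weights on first use):
# generate("a cyberpunk cat holding a neon sign that says 'Sana'")
```

Keeping the argument construction separate from model loading makes it easy to reuse the same settings with a different pipeline class, such as SanaPAGPipeline.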
Update: v2.0.0 — SANA-Video and SANA-WM
The latest release (v2.0.0, published June 9, 2026) bundles SANA-Video and SANA-WM as the headline additions. Recent milestones include SANA-Video 720p with LTX-VAE (March 2026); Sol-RL with NVFP4 rollout recipes for SANA, FLUX.1, and SD3.5-L (April 2026); and SANA-WM with 6-DoF camera control (May 2026). The project has also received academic recognition: SANA was accepted as an ICLR 2025 Oral, SANA-Sprint as an ICCV 2025 Highlight, SANA-1.5 at ICML 2025, and SANA-Video as an ICLR 2026 Oral. Together these point to a short path from research to release.
Pricing
Open Source
- Full training and inference pipelines
- All model weights on HuggingFace
- ComfyUI, Diffusers, SGLang integrations
- 4-bit and 8-bit quantization
- ControlNet, LoRA, DreamBooth support
Capabilities
Key Features
- Text-to-image generation up to 4K resolution
- Text-to-video generation up to 720p
- One/few-step image generation via sCM distillation (SANA-Sprint)
- Linear Diffusion Transformer architecture
- DC-AE 32x image compression
- 4-bit and 8-bit quantization for consumer GPU inference
- ControlNet support for controllable generation
- LoRA and DreamBooth fine-tuning
- Inference-time and training-time compute scaling (SANA-1.5)
- Reinforcement learning post-training via Sol-RL
- World modeling with 6-DoF camera control (SANA-WM)
- Real-time minute-length video generation (LongSANA)
- Multilingual prompt support (English, Chinese, Emoji)
- FSDP and DDP distributed training
- ComfyUI node integration
- SGLang OpenAI-compatible API serving
- HuggingFace Diffusers pipeline support