Sana

Name: Sana
Availability: OnlineOnly
Author: NVlabs (NVIDIA Research)

SANA is an open-source, efficiency-oriented framework by NVIDIA Labs for high-resolution image and video generation using Linear Diffusion Transformers, deployable on consumer GPUs with as little as 8GB VRAM.

At a Glance

Pricing

Open Source

Fully open-source under Apache 2.0 license. Free to use, modify, and distribute.

Available On: CLI, API, SDK

NVlabs (NVIDIA Research): NVlabs is NVIDIA's research division, publishing open-source…

Listed Jun 2026

About Sana

SANA is a fully open-source codebase developed by NVIDIA Labs (NVlabs) for high-resolution image and video generation. Released under the Apache 2.0 license, it provides complete training and inference pipelines and is designed to run on consumer-grade hardware via 4-bit quantization. The repository has accumulated over 8,200 GitHub stars since its initial release in October 2024.

What It Is

SANA is a series of efficient diffusion models built around a Linear Diffusion Transformer (Linear DiT) architecture. The core change replaces the standard softmax attention in DiT with linear attention, whose cost grows linearly rather than quadratically with the number of latent tokens, which makes high-resolution generation far cheaper. The codebase covers multiple variants that share a unified framework: SANA (image), SANA-1.5 (scaled training/inference), SANA-Sprint (one/few-step generation), SANA-Video (video generation), SANA-WM (world modeling), and Sol-RL (reinforcement learning post-training).
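As a rough illustration of why linear attention is cheaper, the sketch below computes the same kernel attention two ways: once by scoring every query against every key (quadratic in the token count), and once by building a small key-value summary that every query reuses (linear in the token count). The ReLU feature map, the function names, and the eps stabilizer are assumptions made for illustration; this page does not specify SANA's exact formulation.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def quadratic_reference(Q, K, V, eps=1e-6):
    """Kernel attention computed pairwise: O(N^2) scores for N tokens."""
    out = []
    for q_row in Q:
        phi_q = [relu(x) for x in q_row]
        # One similarity score per (query, key) pair.
        weights = [sum(a * relu(b) for a, b in zip(phi_q, k_row)) for k_row in K]
        denom = sum(weights) + eps
        out.append([
            sum(w * v_row[b] for w, v_row in zip(weights, V)) / denom
            for b in range(len(V[0]))
        ])
    return out


def linear_attention(Q, K, V, eps=1e-6):
    """Same result, but keys/values are summarized once: O(N) in token count."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum_j phi(k_j) v_j^T, shape d x dv
    ksum = [0.0] * d                     # sum_j phi(k_j), used for normalization
    for k_row, v_row in zip(K, V):
        phi_k = [relu(x) for x in k_row]
        for a in range(d):
            ksum[a] += phi_k[a]
            for b in range(dv):
                kv[a][b] += phi_k[a] * v_row[b]
    out = []
    for q_row in Q:
        phi_q = [relu(x) for x in q_row]
        denom = sum(phi_q[a] * ksum[a] for a in range(d)) + eps
        out.append([
            sum(phi_q[a] * kv[a][b] for a in range(d)) / denom
            for b in range(dv)
        ])
    return out
```

The two functions agree up to floating-point rounding because the sum over keys can be moved inside the product with the query; only the order of operations changes, not the result.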

Key Architectural Techniques

Linear Attention: Replaces vanilla attention in DiT for efficiency at high resolutions.
DC-AE (Deep Compression AutoEncoder): Achieves 32× image compression per spatial side versus the traditional 8×, which cuts the latent token count by 16× at the same patch size.
Decoder-only Text Encoder: Uses a modern decoder-only LLM with in-context learning for improved text-image alignment.
Block Causal Linear Attention & Causal Mix-FFN: Efficient attention and feedforward mechanisms designed for long video generation.
sCM Distillation: Enables one/few-step generation via continuous-time consistency distillation (used in SANA-Sprint).
Sol-RL: Combines NVFP4 (low-precision) rollout with BF16 (high-precision) optimization for faster RL training convergence.
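The effect of the DC-AE compression factor on sequence length can be checked with a little arithmetic. The sketch below is a hypothetical helper, not code from the repository; it assumes the compression factor applies to each spatial side and exposes the patch size as a parameter, since this page does not state which patch size each model uses.

```python
def latent_tokens(height, width, compression, patch_size=1):
    """Number of latent tokens for an image after autoencoder downsampling.

    `compression` is the per-side downsampling factor (8 or 32 in the text);
    `patch_size` is an assumed extra patchify factor applied by the transformer.
    """
    side = compression * patch_size
    if height % side or width % side:
        raise ValueError(f"height and width must be multiples of {side}")
    return (height // side) * (width // side)


tokens_8x = latent_tokens(1024, 1024, compression=8)    # 128 * 128
tokens_32x = latent_tokens(1024, 1024, compression=32)  # 32 * 32

print(tokens_8x)                # 16384
print(tokens_32x)               # 1024
print(tokens_8x // tokens_32x)  # 16
```

At equal patch size, moving from 8× to 32× compression divides the token count by (32 / 8)² = 16, which is what makes very high resolutions tractable for the transformer.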

Model Variants and Performance

The README benchmarks SANA against FLUX-dev at 1024×1024 resolution. According to the repository's own performance table, Sana-0.6B achieves 39.5× speedup over FLUX-dev while Sana-1.6B achieves 23.3×. SANA-Sprint generates a 1024px image in 0.1 seconds on H100 and 0.3 seconds on RTX 4090. For video, SANA-Video-2B achieves a 36-second latency at 720p versus 400 seconds for Wan-2.1-1.3B, per the repository's VBench comparison table. SANA-WM is a 2.6B parameter controllable world model supporting 720p, 1-minute video generation with 6-DoF camera control.
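The latency figures quoted above can be turned into ratios and throughput with simple arithmetic. The sketch below uses only numbers from the preceding paragraph; the helper names are hypothetical, and the throughput values assume one image at a time with no batching.

```python
def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate latency must be positive")
    return baseline_seconds / candidate_seconds


def images_per_second(latency_seconds):
    """Single-stream throughput implied by a per-image latency."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / latency_seconds


# SANA-Video-2B (36 s) versus Wan-2.1-1.3B (400 s) at 720p.
video_speedup = speedup(400, 36)
print(f"{video_speedup:.1f}x")            # 11.1x

# SANA-Sprint at 1024px: 0.1 s on H100, 0.3 s on RTX 4090.
print(f"{images_per_second(0.1):.1f}")    # 10.0
print(f"{images_per_second(0.3):.1f}")    # 3.3
```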

Deployment and Integration Ecosystem

SANA is designed for flexible deployment across a wide range of environments:

HuggingFace Diffusers: Full pipeline support via SanaPipeline, SanaPAGPipeline, and compatible schedulers (requires diffusers>=0.32.0).
ComfyUI: Official node support for SANA, SANA-1.5, and SANA-Sprint workflows.
SGLang: High-performance serving with an OpenAI-compatible API.
Replicate API: Available on H100 hardware via Replicate.
Cosmos-RL: Post-training (SFT/RL) integration for SANA-Image and SANA-Video.
Quantization: 4-bit (via SVDQuant/Nunchaku) and 8-bit quantization allow inference within 8GB GPU VRAM, including on laptop GPUs.
LoRA / DreamBooth: Fine-tuning support via diffusers.
ControlNet: Training, inference, and model weights for controllable generation.
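For the Diffusers route listed above, a minimal text-to-image sketch might look like the following. It assumes diffusers>=0.32.0, PyTorch, and a CUDA GPU are available. The checkpoint ID, the bf16 variant, the step count, and the guidance scale are assumptions chosen for illustration and are not taken from this page; check the model card for the values that apply to the checkpoint you use. The divisible-by-32 check is likewise an assumption that follows from the 32× DC-AE compression.

```python
def check_resolution(height, width, multiple=32):
    """Assumed constraint: sizes should be multiples of the 32x compression factor."""
    if height % multiple or width % multiple:
        raise ValueError(f"height and width must be multiples of {multiple}")
    return height, width


def generate(
    prompt,
    height=1024,
    width=1024,
    # Assumed checkpoint ID; substitute the SANA checkpoint you intend to use.
    model_id="Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    steps=20,
    guidance_scale=4.5,
    out_path="sana.png",
):
    # Imported lazily so the helper above stays usable without a GPU stack.
    import torch
    from diffusers import SanaPipeline

    check_resolution(height, width)
    pipe = SanaPipeline.from_pretrained(
        model_id, variant=variant, torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")
    result = pipe(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
    )
    image = result.images[0]  # PIL image
    image.save(out_path)
    return out_path
```

Calling generate("a cyberpunk cat holding a neon sign") would download the weights on first use and write sana.png to the working directory.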

Update: v2.0.0 — SANA-Video and SANA-WM

The latest release (v2.0.0, published June 9, 2026) bundles SANA-Video and SANA-WM as the headline additions. Recent milestones include SANA-Video 720p with LTX-VAE (March 2026); Sol-RL with NVFP4 rollout recipes for SANA, FLUX.1, and SD3.5-L (April 2026); and SANA-WM with 6-DoF camera control (May 2026). The project's papers have also been accepted at major venues: SANA as an ICLR 2025 Oral, SANA-Sprint as an ICCV 2025 Highlight, SANA-1.5 at ICML 2025, and SANA-Video as an ICLR 2026 Oral. Together, the release cadence and publication record suggest the project is actively maintained.

Pricing

Open Source

Fully open-source under Apache 2.0 license. Free to use, modify, and distribute.

Full training and inference pipelines
All model weights on HuggingFace
ComfyUI, Diffusers, SGLang integrations
4-bit and 8-bit quantization
ControlNet, LoRA, DreamBooth support

Capabilities

Key Features

Text-to-image generation up to 4K resolution
Text-to-video generation up to 720p
One/few-step image generation via sCM distillation (SANA-Sprint)
Linear Diffusion Transformer architecture
DC-AE 32× image compression
4-bit and 8-bit quantization for consumer GPU inference
ControlNet support for controllable generation
LoRA and DreamBooth fine-tuning
Inference-time and training-time compute scaling (SANA-1.5)
Reinforcement learning post-training via Sol-RL
World modeling with 6-DoF camera control (SANA-WM)
Real-time minute-length video generation (LongSANA)
Multilingual support (English, Chinese, and emoji)
FSDP and DDP distributed training
ComfyUI node integration
SGLang OpenAI-compatible API serving
HuggingFace Diffusers pipeline support
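To give a feel for why the 4-bit and 8-bit quantization listed above matters on consumer GPUs, the sketch below estimates the memory taken by the diffusion transformer's weights alone. It is a back-of-the-envelope calculation with a hypothetical helper: it ignores the text encoder, the autoencoder, activations, and quantization overhead such as scales, so real VRAM use will be higher.

```python
def weight_gigabytes(params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("inputs must be positive")
    return params_billion * bits_per_weight / 8


# Parameter counts quoted on this page: 0.6B and 1.6B image models.
for params in (0.6, 1.6):
    for bits in (16, 8, 4):
        gb = weight_gigabytes(params, bits)
        print(f"{params}B params at {bits}-bit: {gb:.2f} GB")
```

Under these assumptions the 1.6B model's weights shrink from about 3.2 GB at 16-bit to about 0.8 GB at 4-bit, leaving more of an 8GB card for the other components.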

Integrations

HuggingFace Diffusers

ComfyUI

SGLang

Replicate

Cosmos-RL

SVDQuant / Nunchaku

SUPIR (4K super-resolution)

LongLive

LTX-VAE

PixArt-alpha

PixArt-sigma

EfficientViT
