# STARFlow

> STARFlow is Apple's open-source transformer autoregressive flow model for high-quality text-to-image and text-to-video generation, combining autoregressive models with normalizing flows.

STARFlow is Apple's official open-source release of a transformer autoregressive flow architecture for image and video generation. The project is hosted on GitHub under the `apple` organization and covers both STARFlow (text-to-image) and STARFlow-V (text-to-video); pretrained checkpoints for both are available on Hugging Face.

## What It Is

STARFlow is a generative AI research framework that combines the expressiveness of autoregressive models with the efficiency of normalizing flows. Rather than relying on diffusion-based approaches, it introduces a "deep-shallow" transformer block architecture that processes latent representations through normalizing flow layers. The result is a family of models that generate images (256×256 in the released checkpoint) and temporally consistent videos (up to 480p) from text prompts.
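For readers who want the underlying formula, the generic textbook form of an affine autoregressive flow is sketched below. It assumes each token $x_t$ is shifted and scaled by quantities $\mu_\theta$ and $\alpha_\theta$ predicted from the earlier tokens $x_{<t}$. The symbols are illustrative and are not taken from the STARFlow code, whose exact parameterization may differ.

$$
z_t = \bigl(x_t - \mu_\theta(x_{<t})\bigr) \odot \exp\bigl(-\alpha_\theta(x_{<t})\bigr),
\qquad
\log p(x) = \log \mathcal{N}(z;\, 0, I) \;-\; \sum_{t=1}^{T} \mathbf{1}^{\top} \alpha_\theta(x_{<t})
$$

Because each $z_t$ depends only on $x_t$ and earlier tokens, the Jacobian is triangular and the likelihood is exact. Computing $z$ from a known $x$ can be done in one parallel pass, whereas sampling has to invert the map, which is sequential unless an iterative scheme is used.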

## Architecture and Model Family

The repository ships two released model variants and lists two follow-on research directions:

- **STARFlow (3B parameters)**: Text-to-image generation at 256×256 resolution. Uses a 6-block deep-shallow architecture, T5-XL text encoder, SD-VAE, and RoPE positional encoding.
- **STARFlow-V (7B parameters)**: Text-to-video generation at up to 640×480 (480p). Supports up to 481 frames (~30 seconds at 16 FPS) with causal temporal attention and WAN2.2-VAE.
- **STARFlow2** and **NTM (Normalizing Trajectory Models)**: Two follow-on research directions with papers published but code listed as "TBD."
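The frame count and duration quoted for STARFlow-V above can be checked with a line of arithmetic. The helper below is a hypothetical convenience function written for this page, not part of the repository.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Duration of a clip when every frame is shown for 1/fps seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# 481 frames at 16 FPS, the maximum quoted for STARFlow-V.
duration = clip_duration_seconds(481, 16)
print(duration)  # 30.0625, i.e. roughly 30 seconds
```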

A key inference optimization is block-wise Jacobi iteration, which accelerates sampling by enabling parallel convergence across token blocks rather than strictly sequential decoding.
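The idea behind block-wise Jacobi iteration can be illustrated with a toy example. The sketch below is not STARFlow's implementation: the `conditioner` function is a made-up stand-in for the transformer, the tokens are plain scalars, and the block size is arbitrary. It only shows the technique of inverting a causal affine flow by updating every position in a block from the previous guess, instead of decoding one token at a time.

```python
import math


def conditioner(prefix):
    """Toy causal conditioner (hypothetical): shift and log-scale from earlier tokens only."""
    if not prefix:
        return 0.0, 0.0
    m = sum(prefix) / len(prefix)
    return 0.5 * math.tanh(m), 0.3 * math.tanh(-m)


def forward(x):
    """Data -> noise. Every position can be computed independently since x is known."""
    z = []
    for i, xi in enumerate(x):
        mu, log_s = conditioner(x[:i])
        z.append((xi - mu) * math.exp(-log_s))
    return z


def inverse_sequential(z):
    """Noise -> data, one token at a time (the baseline that Jacobi iteration replaces)."""
    x = []
    for zi in z:
        mu, log_s = conditioner(x)
        x.append(zi * math.exp(log_s) + mu)
    return x


def inverse_blockwise_jacobi(z, block_size=4, tol=1e-10):
    """Noise -> data. Blocks are processed in order; inside a block, all positions
    are refreshed together from the previous guess until the guess stops changing."""
    x = []            # tokens from blocks that have already converged
    total_iters = 0
    for start in range(0, len(z), block_size):
        z_blk = z[start:start + block_size]
        guess = list(z_blk)                  # initial guess for this block
        for _ in range(len(z_blk)):          # exact after at most len(z_blk) sweeps
            ctx = x + guess
            # In a real model this sweep would be a single parallel network pass.
            new = []
            for j, zj in enumerate(z_blk):
                mu, log_s = conditioner(ctx[:start + j])
                new.append(zj * math.exp(log_s) + mu)
            total_iters += 1
            delta = max(abs(a - b) for a, b in zip(new, guess))
            guess = new
            if delta < tol:
                break
        x.extend(guess)
    return x, total_iters


x_true = [0.3, -1.2, 0.8, 0.1, -0.5, 1.1, 0.0, 0.7, -0.9, 0.4]
z = forward(x_true)
x_seq = inverse_sequential(z)
x_jac, sweeps = inverse_blockwise_jacobi(z, block_size=4)
print(max(abs(a - b) for a, b in zip(x_seq, x_jac)), sweeps)
```

In this toy, each sweep is an ordinary Python loop, so nothing runs faster. The potential gain in a real model comes from each sweep being one parallel pass over the block, with convergence often reached in fewer sweeps than the block has tokens.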

## Research Lineage and Recognition

The STARFlow paper (arXiv:2506.06276) was accepted as a **NeurIPS 2025 Spotlight**, and STARFlow-V (arXiv:2511.20462) received a **CVPR 2026 Highlight** designation, according to the repository's own badges and citations. The project cites four arXiv papers in total, reflecting an active research program at Apple spanning image synthesis, video generation, and unified multimodal generation.

## Setup and Usage Path

The repository targets ML researchers and practitioners comfortable with Python and distributed training. Setup involves:

1. Cloning the repo and setting up the environment, either with `scripts/setup_conda.sh` (conda) or with `pip install -r requirements.txt`
2. Downloading pretrained checkpoints from Hugging Face into a local `ckpts/` directory
3. Running inference via `torchrun` with provided shell scripts for both image and video generation
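The last two steps above can be sketched as a small Python helper that checks the local `ckpts/` directory and assembles a `torchrun` command line. This is only an illustration: the entry-script path is a placeholder, since the repository's actual script names and arguments are not listed here, and the helper functions themselves are hypothetical. The `--standalone` and `--nproc_per_node` options are standard `torchrun` flags for single-node launches.

```python
from pathlib import Path


def list_checkpoints(ckpt_dir="ckpts"):
    """Return the file names found in the checkpoint directory (empty if it is missing)."""
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_file())


def build_torchrun_command(entry_script, num_gpus=1, script_args=()):
    """Assemble a single-node torchrun invocation as an argument list.

    entry_script and script_args are placeholders; consult the repository's
    shell scripts for the real entry point and its options.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun",
        "--standalone",                    # single-node launch, no external rendezvous
        f"--nproc_per_node={num_gpus}",    # one worker process per GPU
        str(entry_script),
        *script_args,
    ]


cmd = build_torchrun_command("path/to/inference_script.py", num_gpus=1)
print(" ".join(cmd))
print(list_checkpoints("ckpts"))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the job; the sketch stops short of that so it can be read and tested without GPUs or downloaded checkpoints.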

Training is supported via FSDP (Fully Sharded Data Parallel) for large-scale distributed runs, with gradient checkpointing available to reduce memory usage. The repository includes separate training scripts for image and video tasks, along with dry-run validation flags.

## Update: Active Development as of May 2026

The repository was created in October 2025 and its most recent push was in May 2026; it had 563 stars and 39 forks as of the latest metadata. The codebase covers STARFlow and STARFlow-V with full training and inference support, while STARFlow2 and NTM remain paper-only releases with code marked as forthcoming. The project is released under a custom Apple license (separate `LICENSE` and `LICENSE_MODEL` files for code and model weights), not a standard OSI-approved license.

## Features
- Text-to-image generation (256×256)
- Text-to-video generation (up to 480p, ~30 seconds)
- Text-image-to-video (TI2V) generation
- Transformer autoregressive flow architecture
- Block-wise Jacobi iteration for fast sampling
- FSDP support for distributed training
- Variable-length video generation
- Classifier-free guidance
- RoPE positional encoding
- Causal temporal attention for video
- Gradient checkpointing for memory efficiency
- Configurable aspect ratios and resolutions

## Integrations
Hugging Face (model checkpoints), T5-XL (text encoder), SD-VAE, WAN2.2-VAE, PyTorch, torchrun, conda, wandb (training logging)

## Platforms
CLI, API

## Pricing
Open Source (custom Apple license; see `LICENSE` and `LICENSE_MODEL`)

## Version
STARFlow-V v3 (starflow-v_7B_t2v_caus_480p_v3)

## Links
- Website: https://github.com/apple/ml-starflow
- Documentation: https://github.com/apple/ml-starflow
- Repository: https://github.com/apple/ml-starflow
- EveryDev.ai: https://www.everydev.ai/tools/starflow
