# VOID: Video Object and Interaction Deletion

> VOID removes objects from videos along with all physical interactions they induce on the scene, such as objects falling when a person is removed.

VOID (Video Object and Interaction Deletion) is an open-source video inpainting model developed by Netflix Research that removes objects from videos while also eliminating the physical interactions those objects cause — not just visual artifacts like shadows, but dynamic effects like objects falling or being displaced. Built on top of CogVideoX and fine-tuned for interaction-aware video inpainting, VOID uses a novel quadmask conditioning system to distinguish between primary objects, affected regions, and background. The model runs in two sequential passes: Pass 1 for base inpainting and Pass 2 for warped-noise temporal refinement, enabling high-quality results on longer video clips.

- **Interaction-aware removal** — *Uses a 4-value quadmask (primary object, overlap, affected region, background) to model and remove physical interactions like falling objects, not just the target object itself.*
- **Two-pass inference pipeline** — *Pass 1 performs base inpainting; Pass 2 applies optical flow-warped latent initialization for improved temporal consistency on longer clips.*
- **VLM-powered mask generation** — *The VLM-MASK-REASONER pipeline uses SAM2 segmentation and Gemini for automated reasoning about interaction-affected regions, generating quadmasks from raw video.*
- **Manual mask refinement GUI** — *An included GUI editor allows frame-by-frame refinement of quadmasks with brush tools, grid toggles, and undo/redo support.*
- **Synthetic training data pipelines** — *Provides two data generation pipelines: HUMOTO (human-object interaction via Blender/motion capture) and Kubric (physics-based object interaction), both producing paired counterfactual videos.*
- **Google Colab notebook** — *A ready-to-run notebook handles setup, model download, and inference on sample videos;
requires a GPU with 40GB+ VRAM (e.g., A100).*
- **HuggingFace model hosting** — *Both VOID Pass 1 and Pass 2 checkpoints are available on HuggingFace for direct download and use.*
- **Apache 2.0 licensed** — *Fully open-source; free to use, modify, and distribute under the Apache License 2.0.*

## Features

- Interaction-aware video object removal
- Two-pass inference pipeline (base + warped-noise refinement)
- Quadmask conditioning (4-value semantic mask)
- VLM-powered automated mask generation (SAM2 + Gemini)
- Manual quadmask refinement GUI
- HUMOTO-based synthetic training data generation (Blender)
- Kubric-based synthetic training data generation
- Google Colab notebook for quick start
- HuggingFace model hosting (Pass 1 and Pass 2 checkpoints)
- Batch inference support
- Optical flow-based temporal consistency (Pass 2)
- DeepSpeed ZeRO stage 2 training support

## Integrations

CogVideoX, SAM2, Gemini (Google AI API), HuggingFace, Google Colab, Blender, Kubric, DeepSpeed, HUMOTO, ffmpeg, VideoX-Fun

## Platforms

CLI, API, Developer SDK

## Pricing

Open Source

## Links

- Website: https://github.com/Netflix/void-model
- Documentation: https://github.com/Netflix/void-model/blob/main/README.md
- Repository: https://github.com/Netflix/void-model
- EveryDev.ai: https://www.everydev.ai/tools/void-model
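## Example: quadmask layout

To make the 4-value quadmask described above concrete, here is a minimal NumPy sketch of how two binary masks (the object to remove and the region it physically affects) can be combined into a single per-pixel semantic mask. The integer codes, label names, and `build_quadmask` helper are illustrative assumptions, not VOID's actual API.

```python
import numpy as np

# Assumed label codes for illustration only; VOID's real encoding may differ.
BACKGROUND = 0  # untouched scene
PRIMARY = 1     # the object to remove
OVERLAP = 2     # pixels where the primary object and affected region coincide
AFFECTED = 3    # regions physically affected by the object (e.g., items it supports)

def build_quadmask(primary_mask: np.ndarray, affected_mask: np.ndarray) -> np.ndarray:
    """Combine two binary masks into one 4-value quadmask for a frame."""
    quadmask = np.full(primary_mask.shape, BACKGROUND, dtype=np.uint8)
    quadmask[affected_mask.astype(bool)] = AFFECTED
    quadmask[primary_mask.astype(bool)] = PRIMARY
    # Overlap takes precedence where both masks are set.
    quadmask[primary_mask.astype(bool) & affected_mask.astype(bool)] = OVERLAP
    return quadmask

# Toy 4x4 frame: the object covers the left half, the affected region the bottom half.
primary = np.zeros((4, 4), dtype=np.uint8)
primary[:, :2] = 1
affected = np.zeros((4, 4), dtype=np.uint8)
affected[2:, :] = 1
print(build_quadmask(primary, affected))
```

In practice VOID's VLM-MASK-REASONER pipeline produces these masks automatically (SAM2 for segmentation, Gemini for reasoning about affected regions), with the GUI editor available for frame-by-frame refinement.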