# VOID: Video Object and Interaction Deletion

> VOID removes objects from videos along with all physical interactions they induce on the scene, such as objects falling when a person is removed.

VOID (Video Object and Interaction Deletion) is an open-source video inpainting model developed by Netflix Research that removes objects from videos while also eliminating the physical interactions those objects cause — not just visual artifacts like shadows, but dynamic effects like objects falling or being displaced. Built on top of CogVideoX and fine-tuned for interaction-aware video inpainting, VOID uses a novel quadmask conditioning system to distinguish between primary objects, affected regions, and background. The model runs in two sequential passes: Pass 1 for base inpainting and Pass 2 for warped-noise temporal refinement, enabling high-quality results on longer video clips.

- **Interaction-aware removal** — *Uses a 4-value quadmask (primary object, overlap, affected region, background) to model and remove physical interactions like falling objects, not just the target object itself.*
- **Two-pass inference pipeline** — *Pass 1 performs base inpainting; Pass 2 applies optical flow-warped latent initialization for improved temporal consistency on longer clips.*
- **VLM-powered mask generation** — *The VLM-MASK-REASONER pipeline uses SAM2 segmentation and Gemini for automated reasoning about interaction-affected regions, generating quadmasks from raw video.*
- **Manual mask refinement GUI** — *An included GUI editor allows frame-by-frame refinement of quadmasks with brush tools, grid toggles, and undo/redo support.*
- **Synthetic training data pipelines** — *Provides two data generation pipelines: HUMOTO (human-object interaction via Blender/motion capture) and Kubric (physics-based object interaction), both producing paired counterfactual videos.*
- **Google Colab notebook** — *A ready-to-run notebook handles setup, model download, and inference on sample videos;
requires a GPU with 40GB+ VRAM (e.g., A100).*
- **HuggingFace model hosting** — *Both VOID Pass 1 and Pass 2 checkpoints are available on HuggingFace for direct download and use.*
- **Apache 2.0 licensed** — *Fully open-source; free to use, modify, and distribute under the Apache License 2.0.*

## Features

- Interaction-aware video object removal
- Two-pass inference pipeline (base + warped-noise refinement)
- Quadmask conditioning (4-value semantic mask)
- VLM-powered automated mask generation (SAM2 + Gemini)
- Manual quadmask refinement GUI
- HUMOTO-based synthetic training data generation (Blender)
- Kubric-based synthetic training data generation
- Google Colab notebook for quick start
- HuggingFace model hosting (Pass 1 and Pass 2 checkpoints)
- Batch inference support
- Optical flow-based temporal consistency (Pass 2)
- DeepSpeed ZeRO stage 2 training support

## Integrations

CogVideoX, SAM2, Gemini (Google AI API), HuggingFace, Google Colab, Blender, Kubric, DeepSpeed, HUMOTO, ffmpeg, VideoX-Fun

## Platforms

CLI, API, Developer SDK

## Pricing

Open Source

## Links

- Website: https://github.com/Netflix/void-model
- Documentation: https://github.com/Netflix/void-model/blob/main/README.md
- Repository: https://github.com/Netflix/void-model
- EveryDev.ai: https://www.everydev.ai/tools/void-model
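## Example: quadmask layout

To make the 4-value quadmask described above concrete, here is a minimal NumPy sketch of how two binary masks (the object to remove and the region it physically affects) can be combined into a single per-pixel semantic mask. The integer codes, label names, and `build_quadmask` helper are illustrative assumptions, not VOID's actual API.

```python
import numpy as np

# Assumed label codes for illustration only; VOID's real encoding may differ.
BACKGROUND = 0  # untouched scene
PRIMARY = 1     # the object to remove
OVERLAP = 2     # pixels where the primary object and affected region coincide
AFFECTED = 3    # regions physically affected by the object (e.g., items it supports)

def build_quadmask(primary_mask: np.ndarray, affected_mask: np.ndarray) -> np.ndarray:
    """Combine two binary masks into one 4-value quadmask for a frame."""
    quadmask = np.full(primary_mask.shape, BACKGROUND, dtype=np.uint8)
    quadmask[affected_mask.astype(bool)] = AFFECTED
    quadmask[primary_mask.astype(bool)] = PRIMARY
    # Overlap takes precedence where both masks are set.
    quadmask[primary_mask.astype(bool) & affected_mask.astype(bool)] = OVERLAP
    return quadmask

# Toy 4x4 frame: the object covers the left half, the affected region the bottom half.
primary = np.zeros((4, 4), dtype=np.uint8)
primary[:, :2] = 1
affected = np.zeros((4, 4), dtype=np.uint8)
affected[2:, :] = 1
print(build_quadmask(primary, affected))
```

In practice VOID's VLM-MASK-REASONER pipeline produces these masks automatically (SAM2 for segmentation, Gemini for reasoning about affected regions), with the GUI editor available for frame-by-frame refinement.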