video-use

Name: video-use
Availability: OnlineOnly
Author: Browser Use

An open-source tool that lets you edit videos with AI coding agents like Claude Code by reading transcripts and generating final.mp4 outputs.

At a Glance

Pricing

Open Source

Fully free and open source under the MIT License. Self-host on your own machine or VPS.

Available On

macOS

API

CLI

Browser Use · San Francisco, CA · Est. 2024 · $17,000,000 raised

Listed Jul 2026

About video-use

video-use is a 100% open-source Python project from Browser Use that lets you edit videos by chatting with AI coding agents such as Claude Code, Codex, or Hermes. Drop raw footage into a folder, describe what you want, and the agent produces a finished final.mp4 — handling cuts, color grading, subtitles, and animation overlays without any traditional video editing UI.

What It Is

video-use is an agent skill — a structured set of scripts and prompts that gives a shell-capable LLM everything it needs to perform professional video editing tasks. Rather than feeding the model raw video frames, it converts footage into a compact text representation (word-level transcripts, speaker diarization, audio events) and generates on-demand visual composites only when needed. The result is a pipeline that reasons about video edits at word-boundary precision while consuming a fraction of the tokens a naive frame-dumping approach would require.
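As a rough illustration of what a compact text representation can look like, the sketch below groups word-level timestamps into one line per phrase. The `Word` fields, the `pack_words` helper, the `max_gap` threshold, and the line format are all hypothetical; the actual layout video-use writes is not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: str  # diarization label (hypothetical naming)


def _format_phrase(phrase: list[Word]) -> str:
    """Render one phrase as '[start-end] speaker: words'."""
    text = " ".join(w.text for w in phrase)
    return f"[{phrase[0].start:.2f}-{phrase[-1].end:.2f}] {phrase[0].speaker}: {text}"


def pack_words(words: list[Word], max_gap: float = 0.6) -> str:
    """Group consecutive words into phrase lines.

    A new line starts on a speaker change or when the pause between
    two words exceeds max_gap seconds (an assumed threshold).
    """
    lines: list[str] = []
    current: list[Word] = []
    for word in words:
        if current and (
            word.speaker != current[-1].speaker
            or word.start - current[-1].end > max_gap
        ):
            lines.append(_format_phrase(current))
            current = []
        current.append(word)
    if current:
        lines.append(_format_phrase(current))
    return "\n".join(lines)


words = [
    Word("So", 0.00, 0.18, "S0"),
    Word("today", 0.20, 0.55, "S0"),
    Word("um", 1.40, 1.62, "S0"),
    Word("welcome", 1.70, 2.10, "S0"),
]
packed = pack_words(words)
print(packed)
# [0.00-0.55] S0: So today
# [1.40-2.10] S0: um welcome
```

Each line keeps the start and end time of its phrase, so an agent reading only the text can still propose cuts at word boundaries.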

How the Pipeline Works

The editing pipeline follows a strict sequence: Transcribe → Pack → LLM Reasons → EDL (edit decision list) → Render → Self-Eval. Key stages include:

Layer 1 (Audio transcript): One ElevenLabs Scribe call per source file produces word-level timestamps, speaker diarization, and audio events. All takes are packed into a single ~12KB takes_packed.md file that serves as the LLM's primary reading surface.
Layer 2 (Visual composite, on demand): A timeline_view tool generates a filmstrip + waveform + word-label PNG for any time range, called only at decision points such as ambiguous pauses or cut-point sanity checks.
Self-eval loop: After rendering, the agent runs timeline_view on the output at every cut boundary to catch visual jumps, audio pops, or hidden subtitles — retrying up to three times before surfacing a preview.
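The self-eval loop described above can be sketched as a bounded retry. This is a minimal sketch, not the project's code: `render_with_self_eval`, `fake_render`, and `fake_find_issues` are hypothetical names, and it assumes "up to three times" means one initial render plus at most three re-renders.

```python
from typing import Callable

MAX_RETRIES = 3  # from the description: retry up to three times


def render_with_self_eval(
    render: Callable[[int], str],
    find_issues: Callable[[str], list[str]],
    max_retries: int = MAX_RETRIES,
) -> tuple[str, list[str], int]:
    """Render, inspect the output, and re-render until clean or out of retries.

    Returns (output_path, remaining_issues, retries_used).
    """
    output = render(0)
    issues = find_issues(output)
    retries = 0
    while issues and retries < max_retries:
        retries += 1
        output = render(retries)
        issues = find_issues(output)
    return output, issues, retries


# Stand-in callables so the sketch runs without ffmpeg or an agent.
def fake_render(attempt: int) -> str:
    return f"final_attempt{attempt}.mp4"


def fake_find_issues(path: str) -> list[str]:
    # Pretend the first two renders still have an audio pop.
    if path in ("final_attempt0.mp4", "final_attempt1.mp4"):
        return ["audio pop at cut 3"]
    return []


result = render_with_self_eval(fake_render, fake_find_issues)
print(result)  # ('final_attempt2.mp4', [], 2)
```

If issues remain after the last retry, the function still returns them, so the caller can decide whether to surface the preview with a warning.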

The README notes the contrast: "Naive approach: 30,000 frames × 1,500 tokens = 45M tokens of noise. Video Use: 12KB text + a handful of PNGs."
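The README's arithmetic can be checked directly. In the sketch below, the frame count and tokens-per-frame figures come from the quote; the 4-bytes-per-token figure is an assumed rule of thumb for English text, not something the README states, and the handful of PNGs is left out of the estimate.

```python
# Figures quoted from the README.
frames = 30_000
tokens_per_frame = 1_500
naive_tokens = frames * tokens_per_frame

# Assumption: ~12KB of packed text at roughly 4 bytes per token.
packed_bytes = 12 * 1024
BYTES_PER_TOKEN = 4
packed_tokens = packed_bytes // BYTES_PER_TOKEN

ratio = naive_tokens / packed_tokens

print(f"naive:  {naive_tokens:,} tokens")   # naive:  45,000,000 tokens
print(f"packed: {packed_tokens:,} tokens")  # packed: 3,072 tokens
print(f"ratio:  about {ratio:,.0f}x")       # ratio:  about 14,648x
```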

What It Produces

video-use handles a wide range of editing tasks automatically:

Cuts filler words (umm, uh, false starts) and dead space between takes
Auto color grades every segment (warm cinematic, neutral punch, or custom ffmpeg chains)
Applies 30ms audio fades at every cut to prevent audio pops
Burns subtitles in a configurable style (2-word UPPERCASE chunks by default)
Generates animation overlays via HyperFrames, Remotion, Manim, or PIL — spawned as parallel sub-agents
Persists session memory in project.md so future sessions resume context
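Three of the tasks above are simple enough to sketch in a few lines: dropping filler words, fading audio at each cut, and chunking subtitles. The helper names, the filler set, and the `max_gap` threshold below are hypothetical. The filter string uses ffmpeg's `afade` filter with its `t`, `st`, and `d` options; how video-use actually builds its filter chains is not shown on this page.

```python
FILLERS = {"um", "umm", "uh"}
FADE_SECONDS = 0.030  # 30 ms, as listed above


def keep_ranges(words, max_gap=0.5):
    """Return (start, end) ranges to keep, given (text, start, end) words.

    Filler words are dropped, and a dropped filler always forces a new
    range so the cut actually removes it.
    """
    ranges = []
    force_break = False
    for text, start, end in words:
        if text.lower().strip(".,!?") in FILLERS:
            force_break = True
            continue
        if ranges and not force_break and start - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], end)  # extend the current range
        else:
            ranges.append((start, end))
        force_break = False
    return ranges


def fade_filter(duration, fade=FADE_SECONDS):
    """Build an ffmpeg afade chain: fade in at 0, fade out at the end."""
    out_start = max(duration - fade, 0.0)
    return (
        f"afade=t=in:st=0:d={fade:.3f},"
        f"afade=t=out:st={out_start:.3f}:d={fade:.3f}"
    )


def subtitle_chunks(texts, size=2):
    """Join words into UPPERCASE chunks of `size` words."""
    return [" ".join(texts[i:i + size]).upper() for i in range(0, len(texts), size)]


words = [
    ("So", 0.00, 0.18), ("today", 0.20, 0.55), ("um", 1.40, 1.62),
    ("welcome", 1.70, 2.10), ("everyone", 2.15, 2.80),
]
ranges = keep_ranges(words)
print(ranges)             # [(0.0, 0.55), (1.7, 2.8)]
print(fade_filter(0.55))  # afade=t=in:st=0:d=0.030,afade=t=out:st=0.520:d=0.030
print(subtitle_chunks(["So", "today", "welcome", "everyone"]))
# ['SO TODAY', 'WELCOME EVERYONE']
```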

Setup and Deployment

Installation is designed to be agent-driven: paste a single setup prompt into Claude Code or another agent, and it clones the repository, installs dependencies, registers the skill, and prompts once for an ElevenLabs API key. Manual installation is also documented and requires Python (via uv or pip), ffmpeg, and optionally yt-dlp for downloading online sources. The skill directory integrates with Claude Code's ~/.claude/skills/ path or the equivalent for other agents. For always-on editing from a VPS or Telegram, the README points to Browser Use Box as a hosting option.
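For a manual install, a quick preflight check can confirm that the external tools named above are on the PATH. This is a minimal sketch using only the standard library; `check_tools` and `missing_required` are hypothetical helpers, not part of video-use.

```python
import shutil
from typing import Optional

REQUIRED = ("ffmpeg",)  # listed as required for manual installation
OPTIONAL = ("yt-dlp",)  # only needed for downloading online sources


def check_tools(required=REQUIRED, optional=OPTIONAL) -> dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in (*required, *optional)}


def missing_required(found, required=REQUIRED) -> list[str]:
    """List required tools that were not found."""
    return [name for name in required if found.get(name) is None]


found = check_tools()
for name, path in found.items():
    print(f"{name}: {path or 'not found on PATH'}")

missing = missing_required(found)
if missing:
    print("Install before continuing:", ", ".join(missing))
```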

Design Principles

The project is built around five explicit principles: text-first representation with on-demand visuals; audio as the primary editing surface with visuals following; a confirm-before-execute workflow with self-evaluation; zero assumptions about content type; and 12 hard production-correctness rules with artistic freedom elsewhere. These principles distinguish it from GUI-based editors and from naive LLM video approaches that attempt to process raw frames.

Current Status

The repository was created in April 2026 and last pushed in July 2026. The project is MIT-licensed and hosted under the browser-use GitHub organization, which also maintains the Browser Use web automation framework. The README references Browser Use Cloud as a hosted environment where video-use can be tried directly.

Pricing

MIT License
Full source code access
All editing features included
Self-hosted deployment

Capabilities

Key Features

Cuts filler words and dead space between takes
Auto color grading (warm cinematic, neutral punch, or custom ffmpeg chain)
30ms audio fades at every cut
Burns subtitles in configurable style
Animation overlays via HyperFrames, Remotion, Manim, or PIL
Self-evaluating render loop (up to 3 retries)
Session memory persistence in project.md
Word-level transcript via ElevenLabs Scribe
Speaker diarization and audio event detection
On-demand visual composite (filmstrip + waveform + word labels)
Works with Claude Code, Codex, Hermes, and other shell-capable agents
yt-dlp support for downloading online sources

Integrations

Claude Code

OpenAI Codex

Hermes

ElevenLabs Scribe

ffmpeg

yt-dlp

HyperFrames

Remotion

Manim

PIL

Browser Use Cloud

Browser Use Box

API Available
