personal_project
live
Agentic Video Editor
Describe a video, get an edited one back — script, voice, cuts and all.
// what problem this solves
Creating a short video — even a simple explainer — requires juggling a script, a voice, stock footage, timing, cuts, and rendering. Each step is a context switch. By the time you've stitched it together, the original idea has lost its energy.
// what I built
A pipeline where you describe what you want and get back an edited video. The agent writes the script, sends it to ElevenLabs for voiceover, aligns the audio timestamps, pulls free stock footage via MCP, and feeds everything into Remotion for rendering. For generative sequences, Veo3 produces the visuals. Human involvement is optional.
// how it works
The key was timestamp alignment — syncing the voiceover audio to visual cuts without a human editor. ElevenLabs returns word-level timestamps, which the pipeline uses to determine cut points. Remotion then renders the timeline programmatically. MCP connects to stock libraries so the agent can search and pull assets autonomously. Claude Code orchestrates the whole thing, and OpenClaw handles the task queue and retries.
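The alignment step can be sketched as follows: group words into sentences and emit one cut per sentence, converted to Remotion frames at the composition's fps. The `WordTiming` shape here is a simplification, not the actual ElevenLabs response format:

```typescript
interface WordTiming {
  word: string;
  start: number; // seconds into the voiceover
  end: number;
}

interface Cut {
  fromFrame: number;        // Remotion frame where this clip starts
  durationInFrames: number; // clip length in frames
}

// A sentence ends at terminal punctuation (or at the last word);
// each sentence becomes one visual cut.
function cutsFromTimestamps(words: WordTiming[], fps = 30): Cut[] {
  const cuts: Cut[] = [];
  let sentenceStart = words.length ? words[0].start : 0;
  words.forEach((w, i) => {
    const endsSentence = /[.!?]$/.test(w.word) || i === words.length - 1;
    if (endsSentence) {
      const fromFrame = Math.round(sentenceStart * fps);
      cuts.push({
        fromFrame,
        durationInFrames: Math.round(w.end * fps) - fromFrame,
      });
      if (i + 1 < words.length) sentenceStart = words[i + 1].start;
    }
  });
  return cuts;
}
```

Each `Cut` then maps directly onto a Remotion `<Sequence from={...} durationInFrames={...}>`, so the visual timeline is derived entirely from the audio.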
// result
- Script to rendered video with minimal human input
- Word-level audio-to-visual sync via ElevenLabs timestamps
- Autonomous stock footage sourcing via MCP
- Generative video sequences via Veo3
- Full pipeline: prompt → script → voice → edit → render
// the stack