personal_project
live
Agentic Video Editor
Describe a video, get an edited one back — script, voice, cuts and all.
// what problem this solves
Creating a short video — even a simple explainer — requires juggling a script, a voice, stock footage, timing, cuts, and rendering. Each step is a context switch. By the time you've stitched it together, the original idea has lost its energy.
// what I built
A pipeline where you describe what you want and get back an edited video. The agent writes the script, sends it to ElevenLabs for voiceover, aligns the audio timestamps, pulls free stock footage via MCP, and feeds everything into Remotion for rendering. For generative sequences, Veo3 produces the visuals. Human involvement is optional.
// how it works
The key was timestamp alignment — syncing the voiceover audio to visual cuts without a human editor. ElevenLabs returns word-level timestamps, which the pipeline uses to determine cut points. Remotion then renders the timeline programmatically. MCP connects to stock libraries so the agent can search and pull assets autonomously. Claude Code orchestrates the whole thing, and OpenClaw handles the task queue and retries.
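The alignment step can be sketched as follows: group words into sentences and emit one cut per sentence, converted to Remotion frames at the composition's fps. The `WordTiming` shape here is a simplification, not the actual ElevenLabs response format:

```typescript
interface WordTiming {
  word: string;
  start: number; // seconds into the voiceover
  end: number;
}

interface Cut {
  fromFrame: number;        // Remotion frame where this clip starts
  durationInFrames: number; // clip length in frames
}

// A sentence ends at terminal punctuation (or at the last word);
// each sentence becomes one visual cut.
function cutsFromTimestamps(words: WordTiming[], fps = 30): Cut[] {
  const cuts: Cut[] = [];
  let sentenceStart = words.length ? words[0].start : 0;
  words.forEach((w, i) => {
    const endsSentence = /[.!?]$/.test(w.word) || i === words.length - 1;
    if (endsSentence) {
      const fromFrame = Math.round(sentenceStart * fps);
      cuts.push({
        fromFrame,
        durationInFrames: Math.round(w.end * fps) - fromFrame,
      });
      if (i + 1 < words.length) sentenceStart = words[i + 1].start;
    }
  });
  return cuts;
}
```

Each `Cut` then maps directly onto a Remotion `<Sequence from={...} durationInFrames={...}>`, so the visual timeline is derived entirely from the audio.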
// result
- Script to rendered video with minimal human input
- Word-level audio-to-visual sync via ElevenLabs timestamps
- Autonomous stock footage sourcing via MCP
- Generative video sequences via Veo3
- Full pipeline: prompt → script → voice → edit → render
// the stack