12 — Long-Form Video Pipeline
The Long-Form Pipeline is designed for creating multi-segment videos from a single long audio recording — for example, turning a 30-minute podcast into a series of talking-head clips, or generating a full-length video course from a recorded script.
Access this feature under Long-Form in the left panel or by clicking + Long-Form in the toolbar.
Requirements: a Replicate API key and configured cloud storage (for reference image uploads).
Step 1: Upload Audio
- Click Upload Audio and select your source audio file (WAV, MP3, M4A).
- A waveform preview appears with playback controls.
- Review the audio to confirm it’s the correct file before proceeding.
- Click Next.
Step 2: Segment
Break the audio into chunks that will each become a separate video clip.
Auto-Segmentation
Click Auto-Segment to let the app analyze the audio and propose logical split points.
Configuration options:
| Option | Description |
|---|---|
| Target duration | Desired length per segment (seconds) |
| Segment variance tolerance | How much segments can deviate from the target duration |
| Prefer silence detection | When enabled, splits are placed at quiet moments rather than at fixed intervals |
| Silence threshold (dB) | Audio level below which a moment is classified as “silence” |
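The silence-preference heuristic above can be sketched in Python. This is an illustration of the idea only, not the app's actual algorithm: given a per-frame loudness series in dB, each split lands on the quietest frame inside the tolerance window around the target duration, falling back to the window center when nothing there is below the silence threshold.

```python
# Illustrative sketch of silence-preferring segmentation (not the app's
# real implementation). levels_db is one loudness reading per frame.

def find_splits(levels_db, frame_sec, target_sec, tolerance_sec, silence_db=-40.0):
    """Return split indices (frame numbers) for a loudness series."""
    splits = []
    start = 0
    n = len(levels_db)
    while True:
        center = start + int(target_sec / frame_sec)
        lo = max(start + 1, center - int(tolerance_sec / frame_sec))
        hi = min(n - 1, center + int(tolerance_sec / frame_sec))
        if lo >= n - 1:
            break  # remaining audio becomes the final segment
        # Prefer the quietest frame in the tolerance window; fall back to
        # the window center if nothing there qualifies as silence.
        window = range(lo, hi + 1)
        quietest = min(window, key=lambda i: levels_db[i])
        cut = quietest if levels_db[quietest] <= silence_db else min(center, n - 1)
        splits.append(cut)
        start = cut
    return splits
```

With a 15-second target, a 3-second tolerance, and a single quiet dip near each window, the splits snap to the dips; windows with no silence fall back to evenly spaced cuts.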
Manual Adjustment
After auto-segmentation, you can:
- Drag segment boundaries on the waveform to move split points
- Add a split point by clicking on the waveform where you want a new break
- Remove a split point by clicking the × on any existing marker
- Preview any segment by clicking the play button on that segment’s entry
When your segments look correct, click Next.
Step 3: Generate
This step sends all segments to an AI video generation model simultaneously.
Reference Image
Upload a portrait image of the person who will appear in the talking-head videos. This same image is used for all segments in the batch.
- Follow the same best practices as in Talking Head: clear, forward-facing, well-lit, plain background.
- The image is uploaded to your configured cloud storage so the AI model can access it.
Model Selection
Choose from models optimized for long-form batch generation:
| Model | Notes |
|---|---|
| MultiTalk | Reliable audio-driven lip sync |
| OmniHuman | High realism, body crop options |
| Seedance | Fast generation |
| Runway Gen 4.5 | Top quality |
| Sora 2 | High quality |
| Veo 3.1 | Excellent motion |
| Kling V3 / Kling V3 Omni | Strong realism |
| Hailuo 2.3 | Cost-efficient |
| (and more) | 13+ models total |
Prompt
Write a scene description that applies to all segments. For example:
“Professional presenter in a modern office, natural lighting, looking directly at camera”
Optional Settings
| Setting | Description |
|---|---|
| Frame continuity | Uses the last frame of each generated clip as the first frame of the next — produces seamless transitions between segments |
| Turbo mode | Requests faster generation (may reduce quality on some models) |
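Frame continuity can be pictured as a simple chaining loop. In this sketch, `generate_clip` and `last_frame` are hypothetical stand-ins for the model call and a frame extractor; the real pipeline's internals are not documented here.

```python
# Sketch of frame continuity: the last frame of each generated clip seeds
# the next request. generate_clip and last_frame are hypothetical stand-ins.

def generate_with_continuity(segments, reference_image, generate_clip, last_frame):
    """Generate clips in order, seeding each from the previous clip's last frame."""
    clips = []
    first_frame = reference_image  # segment 1 starts from the portrait
    for seg in segments:
        clip = generate_clip(seg, first_frame)
        clips.append(clip)
        first_frame = last_frame(clip)  # seed for the next segment
    return clips
```

Note that chaining makes each segment depend on the previous one, which is why continuity trades away some of the parallelism of plain batch generation.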
Starting Generation
Click Generate All. All segments are submitted to the job queue simultaneously. A progress indicator shows the generation status per segment (Submitted → Running → Succeeded / Failed).
You can monitor progress on this screen or navigate to Job Queue to see the full list.
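The fan-out pattern behind Generate All can be sketched as below. This is a simplified illustration: `submit_segment` is a stand-in for the real model API call (for example, via the Replicate client), and the intermediate Submitted/Running states are collapsed into the two terminal outcomes.

```python
# Simplified sketch of batch submission with per-segment status tracking.
# submit_segment is a hypothetical stand-in for the actual model API call.

from concurrent.futures import ThreadPoolExecutor

def generate_all(segments, submit_segment):
    """Submit every segment concurrently; return {segment_index: status}."""
    statuses = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(submit_segment, seg): i
                   for i, seg in enumerate(segments)}
        for fut, i in futures.items():
            try:
                fut.result()  # blocks until this segment finishes
                statuses[i] = "Succeeded"
            except Exception:
                statuses[i] = "Failed"
    return statuses
```

Tracking statuses per segment index is what makes the later per-segment regeneration in the Merge step possible.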
Step 4: Merge
When all (or most) segments have completed:
- The completed video clips are listed with preview thumbnails.
- Review each clip — if any failed, you can regenerate that segment individually.
- Click Merge.
- The app concatenates all clips in order, syncing the original audio track automatically.
- Transitions are applied between clips (using the same transition settings as the main editor).
- The merged video is exported as an MP4 to your configured export folder.
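The concatenate-and-resync step can be sketched with ffmpeg's concat demuxer. This assumes ffmpeg is the underlying tool, which this guide does not specify, and it omits the transition effects the app applies; paths here are examples only.

```python
# Sketch of a clip merge via ffmpeg's concat demuxer, muxing the original
# audio back over the joined video. Assumes ffmpeg; omits transitions.

def build_merge_command(clips, audio_path, out_path, list_path="concat.txt"):
    """Write the concat list file and return the ffmpeg argument vector."""
    with open(list_path, "w") as f:
        for clip in clips:
            f.write(f"file '{clip}'\n")
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", list_path,  # joined video clips
        "-i", audio_path,                               # original source audio
        "-map", "0:v", "-map", "1:a",                   # video from clips, audio from source
        "-c:v", "copy", "-shortest",
        out_path,
    ]
```

Mapping the audio from the original recording rather than from the clips is what keeps lip sync anchored to the source track even if individual clips drifted slightly.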
Tips for Long-Form Projects
- Keep segments under 30 seconds. Most models perform better at shorter durations; if a section runs long (for example, a 5-minute passage), the pipeline will automatically chunk it further as needed.
- Use consistent lighting in your reference image. Since all segments use the same portrait, consistency in the reference photo produces a more cohesive final video.
- Expect some failed segments. AI generation at scale can have occasional failures. The merge step lets you regenerate only the failed ones before merging.
- Frame continuity works best with talking-head models. For image/video generation models, it may look disjointed since the “last frame” of one clip becomes the “first frame” of a very different generated clip.
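The automatic re-chunking rule from the first tip amounts to splitting an over-long duration into roughly equal pieces under the cap. This sketch mirrors the described behavior; it is not the app's actual code.

```python
# Sketch of the re-chunking rule: split a long segment's duration into
# equal pieces no longer than the cap. Illustration only.

import math

def rechunk(duration_sec, cap_sec=30.0):
    """Split one segment's duration into equal chunks of at most cap_sec."""
    pieces = math.ceil(duration_sec / cap_sec)
    return [duration_sec / pieces] * pieces
```

Under this rule, a 5-minute (300-second) section becomes ten 30-second chunks, while a 95-second section becomes four chunks of 23.75 seconds each.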
Previous: AI Audio & Music | Next: Job Queue →