10 — Talking Head / Lip-Sync Video
Talking Head generation takes an audio clip and a reference portrait image and produces a video of the person appearing to speak. It’s useful for creating avatar-driven explainer videos, spokesperson content, or AI presenter clips without recording on camera.
Access this feature under AI Generate → Talking Head.
A Replicate API key and cloud storage (for the reference image upload) are required.
Available Models
| Model | Best For | Max Duration |
|---|---|---|
| MultiTalk | High-quality audio-driven lip sync | Varies |
| OmniHuman (ByteDance) | Realistic full-body or portrait; 3 crop modes | Varies |
| Wan 2.2 S2V | Cost-efficient; good for short clips | Varies |
| Fabric 1.0 (VEED) | Up to 60 seconds; resolution control | 60 seconds |
Preparing Your Reference Image
The reference image is the face or body that will be animated, so its quality directly affects the quality of the generated video.
Best practices:
- Use a clear, high-resolution photo (minimum 512×512 px)
- The face should be well-lit, looking forward or slightly angled
- Avoid sunglasses, heavy shadows, or blurry photos
- A plain or simple background helps the model focus on the face
- Portrait framing (head and shoulders) works best for most models
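If you prepare reference images in bulk, the 512×512 px minimum can be checked before upload. The following is a minimal sketch, not part of the application: it handles PNG files only (it reads the width and height from the PNG header using the Python standard library), and the function names `png_size` and `meets_minimum` are hypothetical helpers chosen for illustration.

```python
import struct

MIN_SIDE = 512  # minimum side length recommended in this guide, in pixels
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then the IHDR data, which starts with width and height (big-endian).
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def meets_minimum(width: int, height: int, min_side: int = MIN_SIDE) -> bool:
    """True if both sides are at least min_side pixels."""
    return width >= min_side and height >= min_side
```

To use it, read the first bytes of the file (for example `open(path, "rb").read(24)`) and pass the returned width and height to `meets_minimum`. JPEG and other formats need a different parser or an imaging library.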
OmniHuman-Specific Options
OmniHuman supports three crop modes that determine how much of the body is shown:
| Mode | Description |
|---|---|
| Portrait | Head and shoulders only |
| Half body | Torso visible |
| Full body | Full person visible |
Select the mode that matches your reference image framing.
Fabric 1.0 Options
| Setting | Description |
|---|---|
| Resolution | Choose standard or high-resolution output |
| Max duration | Up to 60 seconds per clip |
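Because Fabric 1.0 caps clips at 60 seconds, it can help to check an audio file's length before submitting a job. The sketch below is an illustration under stated assumptions rather than anything the application exposes: it handles WAV files only (via Python's standard `wave` module), the `MAX_DURATION_SECONDS` dictionary and both function names are hypothetical, and models listed as "Varies" in the table above are treated as having no known limit.

```python
import wave

# Limits from the Available Models table; None means the guide lists "Varies".
MAX_DURATION_SECONDS = {
    "MultiTalk": None,
    "OmniHuman": None,
    "Wan 2.2 S2V": None,
    "Fabric 1.0": 60.0,
}


def wav_duration_seconds(source) -> float:
    """Duration in seconds of a WAV file given as a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_model(duration: float, model: str) -> bool:
    """True if the clip is within the model's documented limit, if any."""
    limit = MAX_DURATION_SECONDS[model]
    return limit is None or duration <= limit
```

For example, `fits_model(wav_duration_seconds("voiceover.wav"), "Fabric 1.0")` reports whether a hypothetical `voiceover.wav` fits within the 60-second cap. MP3 files would need a third-party library to measure.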
Generating a Talking Head Video
1. In the Talking Head tab, select your model.
2. Upload your reference image. The image is uploaded to your configured cloud storage (S3 or R2) to generate an accessible URL for the AI model. Make sure cloud storage is configured in Settings.
3. Select or upload your audio. You can:
   - Upload an audio file (MP3, WAV)
   - Use a synthesized audio clip from Voice Studio
   - Select an audio clip already on your timeline
4. Configure any model-specific settings (body crop, resolution, duration).
5. Click Generate.

The job is added to the Job Queue. Talking head generation typically takes 1–4 minutes.
Using the Result
When the job completes:
- Click Download to save the MP4 to your computer
- Click Add to Timeline to insert the clip as a video segment
The resulting video is a lip-synced clip that can be trimmed, composited with other segments, and captioned like any other video segment.