07 — Animated Captions
HF AI Video Studio supports word-by-word animated captions — the same style used in viral short-form content. Each word highlights as it is spoken, driven by the timestamps generated during Voice Studio transcription.
Open the Captions panel from the left panel, or click + Captions in the toolbar.
How Captions Work
Captions are generated from the word-level transcript produced in Voice Studio → Transcribe. The transcript is broken into 3–6 word phrases, and each phrase is timed to match the audio.
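The grouping step can be pictured with a short Python sketch. It is not the application's actual algorithm: the `Word` and `Phrase` types, the `chunk_words` function, and the rule of closing a phrase at sentence-ending punctuation are all assumptions made for illustration. Only the 3–6 word range comes from this page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


@dataclass
class Phrase:
    words: list

    @property
    def start(self) -> float:
        # A phrase begins when its first word begins.
        return self.words[0].start

    @property
    def end(self) -> float:
        # A phrase ends when its last word ends.
        return self.words[-1].end


def chunk_words(words, min_len=3, max_len=6):
    """Group timed words into phrases of min_len..max_len words.

    Hypothetical rule: close a phrase once it reaches max_len words, or
    earlier at sentence-ending punctuation if it already has min_len words.
    The final phrase may be shorter than min_len.
    """
    phrases = []
    current = []
    for word in words:
        current.append(word)
        ends_sentence = word.text.endswith((".", "!", "?"))
        if len(current) >= max_len or (ends_sentence and len(current) >= min_len):
            phrases.append(Phrase(current))
            current = []
    if current:
        phrases.append(Phrase(current))
    return phrases
```

Because each phrase takes its start from its first word and its end from its last word, the phrase timing follows the transcript without any separate timing pass.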
During playback, each word within a phrase lights up as it is spoken. At export, captions are burned directly into the video frames using FFmpeg's drawtext filter. They are part of the picture, not a separate subtitle track, so viewers cannot turn them off.
Note: You must complete at least the Transcribe step in Voice Studio before captions are available.
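To give a sense of what burning captions in with drawtext involves, the sketch below builds a filter string that shows each phrase only during its time window. The filter graph the application actually generates is not documented here. The function names, default sizes, and the simplified text sanitizing are assumptions, and the sketch displays whole phrases without per-word highlighting. The drawtext options used (`text`, `fontsize`, `fontcolor`, `borderw`, `bordercolor`, `x`, `y`, `enable`) are standard FFmpeg options.

```python
def sanitize_text(text):
    """Make text safe for a single-quoted drawtext value (simplified).

    Assumption: instead of full FFmpeg escaping, backslashes and percent
    signs are dropped, apostrophes become typographic apostrophes, and
    colons are backslash-escaped.
    """
    text = text.replace("\\", "").replace("%", "")
    text = text.replace("'", "\u2019")
    return text.replace(":", "\\:")


def phrase_filter(text, start, end, font_size=64, stroke_width=3, bottom_margin=80):
    """Return one drawtext filter that shows `text` between start and end seconds."""
    options = [
        f"text='{sanitize_text(text)}'",
        f"fontsize={font_size}",
        "fontcolor=white",
        f"borderw={stroke_width}",          # outline thickness in pixels
        "bordercolor=black",
        "x=(w-text_w)/2",                   # center horizontally
        f"y=h-text_h-{bottom_margin}",      # sit near the bottom of the frame
        f"enable='between(t,{start:.2f},{end:.2f})'",
    ]
    return "drawtext=" + ":".join(options)


def caption_filtergraph(phrases):
    """Chain one drawtext filter per (text, start, end) phrase."""
    return ",".join(phrase_filter(text, start, end) for text, start, end in phrases)


if __name__ == "__main__":
    graph = caption_filtergraph([("Hello world", 0.0, 1.2), ("this is a caption", 1.2, 2.8)])
    print(graph)
```

A string like this could be passed to FFmpeg as `ffmpeg -i input.mp4 -vf "<graph>" -c:a copy output.mp4`, assuming an FFmpeg build with drawtext enabled and a usable default font.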
Animation Styles
| Style | Description |
|---|---|
| Pop | Each word appears with a scaling pop effect as it is spoken |
| Karaoke | Each word fades in as it is spoken, while the other words are slightly dimmed |
| Typewriter | Words appear letter-by-letter as they are spoken |
| None | Static text; no animation |
Select a style in the Captions panel. The preview updates in real time.
Text Styling
| Setting | Options |
|---|---|
| Font family | System fonts available on your machine |
| Font size | 24–120px |
| Font weight | Normal, Bold, Extra Bold |
| Text transform | Normal case or UPPERCASE |
| Max words per line | 3–8 words per line before wrapping |
| Line spacing | Controls vertical gap between lines |
| Word spacing | Controls horizontal gap between words |
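As a rough illustration of how Text transform and Max words per line interact, the sketch below splits a phrase into display lines. The `wrap_caption` function and its parameter names are hypothetical. Only the 3–8 range and the UPPERCASE option come from the table above.

```python
def wrap_caption(words, max_words_per_line=4, uppercase=False):
    """Split a phrase into display lines of at most max_words_per_line words."""
    if not 3 <= max_words_per_line <= 8:
        raise ValueError("max_words_per_line must be between 3 and 8")
    if uppercase:
        words = [w.upper() for w in words]
    return [
        " ".join(words[i:i + max_words_per_line])
        for i in range(0, len(words), max_words_per_line)
    ]
```

For example, a five-word phrase with a limit of 3 would be drawn as one line of three words followed by one line of two.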
Colors & Effects
| Setting | Description |
|---|---|
| Inactive word color | Color of words that have not yet been spoken (e.g., light gray) |
| Active word color | Color of the word currently being spoken (e.g., bright yellow or white) |
| Active word background | Optional highlight box behind the active word |
| Stroke color | Outline color drawn around all text |
| Stroke width | Outline thickness from 0 (none) to 6px |
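The difference between the active and inactive colors comes down to finding which word is being spoken at the current playback time. The sketch below shows one way to do that lookup with a binary search. The function names and color values are hypothetical. It also assumes that words already spoken return to the inactive color, which this page does not specify.

```python
from bisect import bisect_right


def active_word_index(word_starts, word_ends, t):
    """Return the index of the word being spoken at time t, or None.

    word_starts must be sorted in ascending order (seconds).
    """
    i = bisect_right(word_starts, t) - 1  # last word that started at or before t
    if i >= 0 and t < word_ends[i]:
        return i
    return None


def word_colors(word_starts, word_ends, t, inactive="#D0D0D0", active="#FFE600"):
    """Return one color per word for the frame at time t."""
    current = active_word_index(word_starts, word_ends, t)
    return [active if i == current else inactive for i in range(len(word_starts))]
```

During a pause between two words, no word is active and every word in the phrase takes the inactive color.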
Caption Position
Choose where captions appear on the video frame:
- Bottom (default) — captions sit near the bottom of the frame
- Center — captions are centered vertically
- Top — captions appear near the top of the frame
Timing Sync (Caption Offset)
When captions are slightly early or late relative to the spoken audio, use the Caption Offset control to shift all caption timings forward or backward.
| Control | Effect |
|---|---|
| Offset slider | Drag to adjust offset from -30s to +30s in 0.05s increments |
| -1s / -0.1s buttons | Nudge captions earlier |
| +0.1s / +1s buttons | Nudge captions later |
| Reset | Return offset to 0 |
The offset affects both the live preview and the exported video.
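Conceptually, the offset is a single value added to every caption timestamp. The sketch below is an illustration under stated assumptions, not the application's code: `normalize_offset` and `apply_offset` are hypothetical names, and clamping shifted times at 0 is an assumption. The range of -30s to +30s and the 0.05s step come from the table above.

```python
OFFSET_MIN = -30.0   # seconds
OFFSET_MAX = 30.0    # seconds
OFFSET_STEP = 0.05   # slider increment in seconds


def normalize_offset(offset):
    """Clamp an offset to the slider range and snap it to the nearest step."""
    clamped = max(OFFSET_MIN, min(OFFSET_MAX, offset))
    return round(round(clamped / OFFSET_STEP) * OFFSET_STEP, 2)


def apply_offset(timings, offset):
    """Shift every (start, end) pair by the normalized offset.

    Assumption: times that would fall before the start of the video are
    clamped to 0.
    """
    shift = normalize_offset(offset)
    return [
        (max(0.0, round(start + shift, 3)), max(0.0, round(end + shift, 3)))
        for start, end in timings
    ]
```

A positive offset moves captions later and a negative offset moves them earlier, matching the nudge buttons described above.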
Adjusting Individual Phrase Timing
On the Captions track in the timeline, each caption phrase appears as a block. You can:
- Drag the left edge to move the phrase’s start point earlier or later
- Drag the right edge to extend or shorten when the phrase ends
This is useful for fine-tuning phrases where the transcription timing was slightly off.
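The two drag operations can be modeled as updates to one phrase's start or end time. In the sketch below, the function names and the `MIN_DURATION` guard are hypothetical. This page does not say whether the application enforces a minimum phrase length.

```python
MIN_DURATION = 0.1  # hypothetical minimum phrase length in seconds


def drag_left_edge(start, end, new_start):
    """Move the phrase start, keeping it at or after 0 and before the end."""
    clamped = min(max(0.0, new_start), end - MIN_DURATION)
    return (clamped, end)


def drag_right_edge(start, end, new_end):
    """Move the phrase end, keeping it after the start."""
    return (start, max(new_end, start + MIN_DURATION))
```

Dragging the left edge of a phrase that runs from 2.0s to 4.0s back to 1.5s would give a phrase from 1.5s to 4.0s, so the caption appears half a second sooner.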