07 — Animated Captions

HF AI Video Studio supports word-by-word animated captions — the same style used in viral short-form content. Each word highlights as it is spoken, driven by the timestamps generated during Voice Studio transcription.

Open the Captions panel from the left panel, or click + Captions in the toolbar.


How Captions Work

Captions are generated from the word-level transcript produced in Voice Studio → Transcribe. The transcript is broken into 3–6 word phrases, and each phrase is timed to match the audio.
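The grouping step can be illustrated with a minimal Python sketch. It is not the application's actual algorithm: the Word and Phrase types, the group_into_phrases function, and the 0.4 s pause threshold are all assumptions made for illustration. The sketch closes a phrase at a pause once it has at least 3 words, and forces a break at 6 words; a short remainder at the very end may still produce a phrase under 3 words.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Phrase:
    words: list

    @property
    def start(self) -> float:
        return self.words[0].start

    @property
    def end(self) -> float:
        return self.words[-1].end


def group_into_phrases(words, min_words=3, max_words=6, pause=0.4):
    """Group timed words into phrases (hypothetical rule, see lead-in)."""
    phrases, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        is_last = i == len(words) - 1
        # Silence between this word and the next one.
        gap = 0.0 if is_last else words[i + 1].start - word.end
        at_pause = len(current) >= min_words and gap >= pause
        if len(current) >= max_words or at_pause:
            phrases.append(Phrase(current))
            current = []
    if current:
        short_tail = len(current) < min_words
        # Fold a short tail into the previous phrase only if it still fits.
        if phrases and short_tail and len(phrases[-1].words) + len(current) <= max_words:
            phrases[-1].words.extend(current)
        else:
            phrases.append(Phrase(current))
    return phrases
```

Each phrase takes its start time from its first word and its end time from its last word, which is what keeps the phrase aligned with the audio.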

During playback, each word within a phrase lights up as it is spoken. At export, captions are burned directly into the video using FFmpeg’s drawtext filter — they are not a separate subtitle track.
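To give a feel for what burning captions in with drawtext involves, here is a rough Python sketch that builds a filter string with one drawtext entry per phrase. The exact command the application runs is not documented here, so the function names, font size, border, and margin values are assumptions. The sketch also shows whole phrases only, not per-word highlighting, and it drops characters that are special to the filter syntax instead of escaping them.

```python
def drawtext_filter(text, start, end, fontsize=64, color="white"):
    """Build one drawtext entry that is visible only between start and end."""
    # Simplification: remove characters that would need escaping in a
    # filter string (backslash, single quote, colon, percent sign).
    safe = "".join(ch for ch in text if ch not in "\\':%")
    return (
        f"drawtext=text='{safe}'"
        f":fontsize={fontsize}:fontcolor={color}"
        ":borderw=3:bordercolor=black"      # outline around the text
        ":x=(w-text_w)/2:y=h-text_h-80"     # centered, near the bottom
        f":enable='between(t,{start:.2f},{end:.2f})'"
    )


def caption_filtergraph(phrases):
    """Chain one drawtext entry per (text, start, end) phrase."""
    return ",".join(drawtext_filter(*phrase) for phrase in phrases)


if __name__ == "__main__":
    graph = caption_filtergraph([
        ("this is how", 0.00, 1.10),
        ("captions get burned in", 1.10, 2.60),
    ])
    # Depending on the FFmpeg build, a fontfile=... option may also be needed.
    print(f'ffmpeg -i input.mp4 -vf "{graph}" -c:a copy output.mp4')
```

Because the text is drawn into the video frames, the result plays the same everywhere, but the captions cannot be switched off or edited after export.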

Note: You must complete at least the Transcribe step in Voice Studio before captions are available.


Animation Styles

Style        Description
Pop          Each word appears with a scaling pop effect as it is spoken
Karaoke      Words fade in as they are spoken, while the remaining words are slightly dimmed
Typewriter   Words appear letter-by-letter as they are spoken
None         Static text; no animation

Select a style in the Captions panel. The preview updates in real time.


Text Styling

Setting              Options
Font family          System fonts available on your machine
Font size            24–120px
Font weight          Normal, Bold, Extra Bold
Text transform       Normal case or UPPERCASE
Max words per line   3–8 words per line before wrapping
Line spacing         Controls vertical gap between lines
Word spacing         Controls horizontal gap between words
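As an illustration of how Text transform and Max words per line interact, the following Python sketch splits a phrase into display lines. The layout_lines function and its defaults are hypothetical and are not taken from the application.

```python
def layout_lines(words, max_words_per_line=4, uppercase=False):
    """Split a phrase into lines of at most max_words_per_line words."""
    if not 3 <= max_words_per_line <= 8:
        raise ValueError("max_words_per_line must be between 3 and 8")
    if uppercase:
        words = [w.upper() for w in words]
    n = max_words_per_line
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


if __name__ == "__main__":
    for line in layout_lines("the quick brown fox jumps".split(), 3, uppercase=True):
        print(line)
```

With a limit of 3, a five-word phrase wraps onto two lines; raising the limit to 5 or more keeps it on one line.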

Colors & Effects

Setting                  Description
Inactive word color      Color of words that have not yet been spoken (e.g., light gray)
Active word color        Color of the word currently being spoken (e.g., bright yellow or white)
Active word background   Optional highlight box behind the active word
Stroke color             Outline color drawn around all text
Stroke width             Outline thickness from 0 (none) to 6px
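The relationship between the active and inactive colors and the word timestamps can be sketched as below. This is an illustrative Python example with a hypothetical word_colors function and example hex values. It also assumes that words already spoken return to the inactive color, which this guide does not specify.

```python
def word_colors(words, t, inactive="#CCCCCC", active="#FFE600"):
    """Return (text, color) pairs for a phrase at playback time t.

    words is a list of (text, start, end) tuples, times in seconds.
    """
    result = []
    for text, start, end in words:
        # A word is active only while the playhead is inside its time span.
        is_active = start <= t < end
        result.append((text, active if is_active else inactive))
    return result


if __name__ == "__main__":
    phrase = [("hello", 0.0, 0.4), ("world", 0.4, 0.9)]
    print(word_colors(phrase, 0.5))
```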

Caption Position

Choose where captions appear on the video frame:

  • Bottom (default) — captions sit near the bottom of the frame
  • Center — captions are centered vertically
  • Top — captions appear near the top of the frame
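Since export uses FFmpeg's drawtext filter, each position could plausibly map to a vertical expression like the ones in this Python sketch. The 80-pixel margin and the caption_y_expression name are assumptions; only the drawtext variables h (frame height) and text_h (rendered text height) come from FFmpeg itself.

```python
def caption_y_expression(position="bottom", margin=80):
    """Return a drawtext y expression for a caption position."""
    expressions = {
        "bottom": f"h-text_h-{margin}",  # margin above the bottom edge
        "center": "(h-text_h)/2",        # vertically centered
        "top": f"{margin}",              # margin below the top edge
    }
    if position not in expressions:
        raise ValueError(f"unknown position: {position}")
    return expressions[position]
```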

Timing Sync (Caption Offset)

When captions are slightly early or late relative to the spoken audio, use the Caption Offset control to shift all caption timings forward or backward.

Control               Effect
Offset slider         Drag to adjust offset from -30s to +30s in 0.05s increments
-1s / -0.1s buttons   Nudge captions earlier
+0.1s / +1s buttons   Nudge captions later
Reset                 Return offset to 0

The offset affects both the live preview and the exported video.
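In terms of arithmetic, a global offset simply adds the same number of seconds to every caption time. The Python sketch below shows one way this could work, using the slider's documented range and step. The function names are hypothetical, and clamping shifted times at 0 is an assumption about how captions near the start of the video are handled.

```python
def snap_offset(seconds, step=0.05, limit=30.0):
    """Clamp an offset to [-limit, +limit] and snap it to the slider step."""
    clamped = max(-limit, min(limit, seconds))
    return round(round(clamped / step) * step, 2)


def apply_offset(phrases, offset):
    """Shift (start, end) pairs; negative is earlier, positive is later."""
    # Assumption: times are not allowed to go below 0.
    return [(max(0.0, start + offset), max(0.0, end + offset))
            for start, end in phrases]


if __name__ == "__main__":
    offset = snap_offset(-0.5)
    print(apply_offset([(1.0, 2.0), (2.0, 3.5)], offset))
```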


Adjusting Individual Phrase Timing

On the Captions track in the timeline, each caption phrase appears as a block. You can:

  • Drag the left edge to move the phrase’s start point earlier or later
  • Drag the right edge to extend or shorten when the phrase ends

This is useful for fine-tuning phrases where the transcription timing was slightly off.
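The effect of dragging a block edge can be modeled as changing only the start or only the end of that phrase. The following Python sketch is illustrative: the Block type, the function names, and the 0.1 s minimum duration are assumptions, and the real timeline may apply additional rules, for example around overlapping neighbors.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: float  # seconds
    end: float    # seconds


def drag_left_edge(block, new_start, min_duration=0.1):
    """Move the start point; the end point stays where it is."""
    start = max(0.0, min(new_start, block.end - min_duration))
    return Block(start, block.end)


def drag_right_edge(block, new_end, min_duration=0.1):
    """Move the end point; the start point stays where it is."""
    end = max(new_end, block.start + min_duration)
    return Block(block.start, end)
```

For example, if a phrase appears a beat before the speaker begins, dragging its left edge from 1.0 s to 1.2 s delays only that phrase and leaves every other caption unchanged.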

