AI Studio

Choose natural voice output and control variable timing.

Voice and lip sync settings decide how personalized words sound, how the presenter's mouth movement is handled, and how variables fit into the original video timing.

Model

What voice and lip sync settings control

Voice sample
A generated audio option for a variable phrase.
Voice config
The saved per-variable selection used during render.
Lip sync
A render option that updates mouth movement when variable audio changes.
Variable start time
Manual timing used when a variable should play at a specific time.
Timing window
A start/end range with anchor and fit behavior for variable audio.

Voice picker

Pick the most natural voice for each variable

The voice picker is designed around variable phrases. For each variable, compare generated options and choose the one that sounds most natural in context.

Campaign voice cloning and manual recording controls
Campaign setup can use cloned voice output or manual recorded audio for variable phrases that need a more controlled sound.
  1. Open the variable voice settings

    Review variables that need generated voice output.
  2. Listen in context

    Choose the sample that matches surrounding words and pacing.
  3. Save the selection

    The selected voice configuration is saved with the template segment.
  4. Preview before launching

    Generate a small test row to confirm the selected voice works with real values.
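
One way to picture the saved selection is as a per-variable mapping stored on the template segment. The sketch below is illustrative only: `VoiceSample`, `TemplateSegment`, `select_voice`, and `missing_voices` are hypothetical names, not the product's API, and the real storage format may differ.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VoiceSample:
    """Hypothetical: one generated audio option for a variable phrase."""
    sample_id: str
    variable: str


@dataclass
class TemplateSegment:
    """Hypothetical: a template segment that stores the voice config."""
    segment_id: str
    # Assumed shape of the voice config: variable name -> chosen sample id.
    voice_config: dict[str, str] = field(default_factory=dict)

    def select_voice(self, sample: VoiceSample) -> None:
        # Saving a new selection replaces any earlier one for that variable.
        self.voice_config[sample.variable] = sample.sample_id

    def missing_voices(self, variables: list[str]) -> list[str]:
        # Variables that still need a selection before a test row is useful.
        return [name for name in variables if name not in self.voice_config]


segment = TemplateSegment("intro")
segment.select_voice(VoiceSample(sample_id="sample-2", variable="first_name"))
print(segment.missing_voices(["first_name", "company"]))
```

Checking for missing selections before generating the test row keeps the preview focused on how the chosen voices sound with real values.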

Lip sync

When to enable lip sync

  • Enable lip sync when the presenter's face is visible and the variable audio changes mouth movement.
  • Disable lip sync when the variable is off-camera, covered by B-roll, or already fits a fixed audio slot.
  • Use shorter variables for more predictable facial and audio results.
  • Test with different values before using the template in a large campaign.

Manual timing can override lip sync needs

If every mapped variable has manual start timing, the campaign can use manual timing instead of requesting lip sync for those variables.
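
The rule above can be sketched as a small decision function. This is a minimal illustration under stated assumptions: `MappedVariable`, `start_seconds`, and `should_request_lip_sync` are hypothetical names, and the actual campaign logic may weigh other settings.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MappedVariable:
    """Hypothetical: a variable mapped into a campaign."""
    name: str
    start_seconds: Optional[float] = None  # manual start time, if one is set


def should_request_lip_sync(
    variables: Sequence[MappedVariable], lip_sync_enabled: bool
) -> bool:
    """Return True when lip sync would still be requested."""
    if not lip_sync_enabled or not variables:
        return False
    # If every mapped variable has manual start timing, use manual timing.
    all_manual = all(v.start_seconds is not None for v in variables)
    return not all_manual


timed = [MappedVariable("first_name", 2.5), MappedVariable("company", 7.0)]
mixed = [MappedVariable("first_name", 2.5), MappedVariable("company")]
print(should_request_lip_sync(timed, lip_sync_enabled=True))
print(should_request_lip_sync(mixed, lip_sync_enabled=True))
```

A single variable without a manual start time is enough to keep lip sync in play, which is why the check uses `all` rather than `any`.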

Timing

Manual timing windows

Manual timing windows are useful when the source recording has a predictable pause and the personalized word should fit into that window. The timing model supports start time, end time, anchor, and fit behavior.

anchor
Controls whether the timing window is anchored to the left or right side.
fit mode
Controls whether the renderer can adjust audio speed, video timing, or both.
start/end seconds
The exact window where the personalized audio should fit.
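
To make the interaction between these fields concrete, here is a rough sketch of how a clip might be placed inside a window. Everything in it is an assumption for illustration: the field names, the fit mode values `"audio"`, `"video"`, and `"both"`, the `MAX_SPEED` cap, and the placement rules are hypothetical, not the renderer's documented behavior.

```python
from dataclasses import dataclass

MAX_SPEED = 1.25  # assumed cap on audio speed-up in "both" mode


@dataclass
class TimingWindow:
    start_seconds: float
    end_seconds: float
    anchor: str = "left"     # "left" or "right"
    fit_mode: str = "audio"  # assumed values: "audio", "video", "both"


@dataclass
class Placement:
    start_seconds: float
    end_seconds: float
    speed: float          # playback-rate multiplier applied to the audio
    video_stretch: float  # extra seconds the video would need to absorb


def place_audio(window: TimingWindow, audio_seconds: float) -> Placement:
    length = window.end_seconds - window.start_seconds
    if length <= 0 or audio_seconds <= 0:
        raise ValueError("window and audio must have positive duration")

    speed, stretch = 1.0, 0.0
    if audio_seconds > length:  # the clip does not fit as recorded
        if window.fit_mode == "audio":
            speed = audio_seconds / length          # speed up to fill the window
        elif window.fit_mode == "video":
            stretch = audio_seconds - length        # video absorbs the overflow
        elif window.fit_mode == "both":
            speed = min(audio_seconds / length, MAX_SPEED)
            stretch = max(audio_seconds / speed - length, 0.0)
        else:
            raise ValueError(f"unknown fit mode: {window.fit_mode}")

    played = audio_seconds / speed  # clip duration after any speed change
    if window.anchor == "left":
        start = window.start_seconds
        end = start + played
    elif window.anchor == "right":
        end = window.end_seconds + stretch
        start = end - played
    else:
        raise ValueError(f"unknown anchor: {window.anchor}")
    return Placement(start, end, speed, stretch)


print(place_audio(TimingWindow(3.0, 4.0, anchor="right"), audio_seconds=0.6))
```

In this sketch a short clip keeps its natural speed and the anchor decides which edge of the window it touches. A clip that is too long triggers the fit mode, which is the case worth testing with long names or company values.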

Checks

Quality checks

  • Use real sample values, not only placeholders.
  • Check pronunciation for names, companies, and product terms.
  • Verify mouth movement if the face is close to the camera.
  • Confirm subtitles still match the rendered words when subtitles are enabled.
  • Keep template variables short and pause around them in the source recording.
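
Some of these checks can be partly automated before a test render. The sketch below flags values that look like placeholders or run long. The placeholder patterns, the `MAX_WORDS` threshold, and the `check_sample_value` helper are assumptions chosen for illustration, and pronunciation and mouth movement still need a human review.

```python
import re

# Assumed patterns for placeholder-like values such as "{{first_name}}",
# "[Name]", "<company>", "test", or an empty string.
PLACEHOLDER = re.compile(
    r"^\s*(?:\{\{.*\}\}|\[.*\]|<.*>|test|sample|placeholder|n/?a)?\s*$",
    re.IGNORECASE,
)
MAX_WORDS = 3  # assumed threshold for a "short" variable value


def check_sample_value(value: str) -> list[str]:
    """Return warnings for one sample value; an empty list means no flags."""
    warnings = []
    if PLACEHOLDER.match(value):
        warnings.append("looks like a placeholder")
    if len(value.split()) > MAX_WORDS:
        warnings.append("long value; shorter variables are more predictable")
    return warnings


for value in ["Priya", "{{first_name}}", "International Business Machines Corporation"]:
    print(value, check_sample_value(value))
```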