Dub-Prep workflow

Generate narration and the word-level timings needed to sync it, all in one bundle.

What it chains

Dub-Prep runs two services back to back:

Speech synthesis — render your script as spoken audio.
Forced alignment — word- and segment-level timings for the audio just generated.

The result is narration plus a precise timing map — everything you need to line speech up against video for dubbing.
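
As an illustration of how such a timing map might be used, the sketch below groups word-level timings into subtitle-style cues that could be lined up against video. The `Word` structure and its field names (`text`, `start`, `end`, in seconds) are assumptions made for this example, not the actual response format; the API reference has the real schema.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One aligned word. Field names are hypothetical; times are in seconds."""
    text: str
    start: float
    end: float


def group_into_cues(words, max_gap=0.5, max_words=8):
    """Group consecutive words into cues, splitting on long pauses or length."""
    cues = []
    current = []
    for word in words:
        # Start a new cue if the pause since the previous word is too long
        # or the current cue is already full.
        if current and (word.start - current[-1].end > max_gap
                        or len(current) >= max_words):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    # Each cue spans from its first word's start to its last word's end.
    return [
        (cue[0].start, cue[-1].end, " ".join(w.text for w in cue))
        for cue in cues
    ]


words = [
    Word("Welcome", 0.00, 0.42),
    Word("back", 0.45, 0.70),
    Word("everyone", 1.50, 2.10),
]
cues = group_into_cues(words)
```

The `max_gap` and `max_words` thresholds are arbitrary starting points; a real dubbing tool would likely tune them, or rely on the segment-level timings instead.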

Quality tiers & estimation

The bundle is priced per minute in two tiers:

Standard Depth — standard synthesis quality.
High Fidelity — richer, premium synthesis quality.

Estimated from your text

Because you provide text rather than existing audio, the narration's duration is estimated from your script's character count so the bundle can be priced upfront. The charge for any stage that fails is refunded.
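
To make the estimate concrete, here is a minimal sketch of pricing a script from its character count. The speaking rate (`CHARS_PER_MINUTE`) and the per-minute prices are made-up placeholder values, and rounding up to whole minutes is also an assumption; the service's actual figures and rounding rule may differ.

```python
import math

# Hypothetical values for illustration only; not the service's real numbers.
CHARS_PER_MINUTE = 900
PRICE_CENTS_PER_MINUTE = {"Standard Depth": 10, "High Fidelity": 25}


def estimate_minutes(text):
    """Estimate narration length in minutes from character count."""
    return len(text) / CHARS_PER_MINUTE


def estimate_price_cents(text, tier):
    """Price the bundle upfront, assuming billing rounds up to whole minutes."""
    billable_minutes = math.ceil(estimate_minutes(text))
    return billable_minutes * PRICE_CENTS_PER_MINUTE[tier]


script = "a" * 2000  # stand-in for a 2,000-character script
minutes = estimate_minutes(script)
standard_cents = estimate_price_cents(script, "Standard Depth")
high_fidelity_cents = estimate_price_cents(script, "High Fidelity")
```

Prices are kept in integer cents so the arithmetic stays exact. With these placeholder numbers, the 2,000-character script is estimated at a little over two minutes and billed as three.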

Via the API

Create Dub-Prep pipelines with an API key and presigned uploads, and follow progress with webhooks. See the API reference for the workflow endpoints, and the workflows overview for how bundles compare.
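
As a rough sketch of following progress with webhooks, the code below folds incoming webhook bodies into a per-stage status record. The event shape (a JSON object with `stage` and `status` fields), the stage identifiers, and the status values are all assumptions made for illustration; the API reference defines the real payloads.

```python
import json

# Stage names mirror the two chained services; the identifiers are hypothetical.
STAGES = ("speech_synthesis", "forced_alignment")
# Hypothetical status values that mean a stage will not change again.
TERMINAL_STATUSES = ("completed", "failed")


def apply_event(progress, raw_body):
    """Update a per-stage status dict from one webhook body (assumed JSON)."""
    event = json.loads(raw_body)
    stage = event["stage"]
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    progress[stage] = event["status"]
    return progress


def is_finished(progress):
    """True once every stage has reported a terminal status."""
    return all(progress.get(stage) in TERMINAL_STATUSES for stage in STAGES)


progress = {}
apply_event(progress, '{"stage": "speech_synthesis", "status": "completed"}')
done_after_first = is_finished(progress)
apply_event(progress, '{"stage": "forced_alignment", "status": "completed"}')
done_after_second = is_finished(progress)
```

In a real integration this logic would sit behind an HTTP endpoint that receives the webhook requests; it is kept as plain functions here so the state handling is easy to follow.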