Transcription

Turn spoken audio into text with WhisperX speech recognition across 99+ languages.

What it does

Transcription (automatic speech recognition) listens to your audio and produces a text transcript with segment-level timestamps. It’s the right starting point when you don’t already have a script for your audio.

Engine & languages

  • Powered by WhisperX, with support for 99+ languages.
  • Auto-detect the spoken language, or set it explicitly for best accuracy.
  • Large files (100MB+) are supported.

Improving accuracy

You can supply an optional transcription prompt — context such as names, acronyms, or domain terms — to nudge recognition toward the right spelling and vocabulary for your content.
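How you phrase the prompt is up to you. One simple pattern is a short glossary sentence that lists names, acronyms, and terms once each, in their correct spelling. The sketch below assumes that pattern. `build_prompt` and its `max_chars` budget are hypothetical and not part of the product. Whisper-family models only consider a bounded number of prompt tokens, so a short, deduplicated list is safer than a long one.

```python
from typing import List


def build_prompt(terms: List[str], max_chars: int = 400) -> str:
    """Join vocabulary into a short context prompt (hypothetical helper).

    Duplicates are dropped case-insensitively, keeping the first spelling seen.
    Terms are added whole, in order, until the next one would push the prompt
    past the hypothetical max_chars budget.
    """
    prefix = "Glossary: "
    seen = set()
    kept: List[str] = []
    length = len(prefix)
    for raw in terms:
        term = raw.strip()
        key = term.lower()
        if not term or key in seen:
            continue
        extra = len(term) + (2 if kept else 0)  # account for the ", " separator
        if length + extra + 1 > max_chars:  # +1 for the closing period
            break
        seen.add(key)
        kept.append(term)
        length += extra
    return prefix + ", ".join(kept) + "." if kept else ""


print(build_prompt(["WhisperX", "ACME Corp", "whisperx", "Dr. Ngozi Okafor", "diarization"]))
```

Put the most important terms first. If the budget runs out, the helper drops the terms at the end of the list.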

Output formats

  • TXT — plain text transcript.
  • JSON — transcript with segment-level timestamps.
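The JSON export is the one to use when you need timing. The sketch below reads a transcript into a plain-text view and a list of timestamped lines. It assumes the JSON follows WhisperX's own segment shape: a `segments` list whose items carry `start` and `end` (in seconds) and `text`. That shape is an assumption, so check it against a real export before relying on it. The helper names are hypothetical.

```python
import json
from typing import List

# Assumed export shape (WhisperX-style segments); verify against a real file.
sample = """{"segments": [
  {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
  {"start": 2.5, "end": 61.25, "text": " Today we talk about WhisperX."}
]}"""


def fmt(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_plain_text(transcript: dict) -> str:
    """Join segment texts into one string, dropping the timing."""
    return " ".join(seg["text"].strip() for seg in transcript["segments"])


def to_timestamped_lines(transcript: dict) -> List[str]:
    """Render each segment as '[start --> end] text'."""
    return [
        f"[{fmt(seg['start'])} --> {fmt(seg['end'])}] {seg['text'].strip()}"
        for seg in transcript["segments"]
    ]


transcript = json.loads(sample)
print(to_plain_text(transcript))
for line in to_timestamped_lines(transcript):
    print(line)
```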

Transcript review before alignment

Recognition is never perfect. When transcription feeds a linked alignment — as in the Subtitling workflow — you can pause to review and correct the transcript before alignment runs.

Correct once, gain everywhere

Fixing the transcript at the review step means the alignment, and any downstream translation, build on clean text.
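The review itself happens inside the workflow, before alignment runs. If you also keep a JSON export of the transcript, you can apply the same principle in your own code: fix the text once and leave the segment timing alone. This sketch assumes the WhisperX-style segment shape (`segments` with `start`, `end`, `text`), which may not match the actual export. `apply_corrections` is a hypothetical helper and not part of the product.

```python
import copy
import re
from typing import Dict


def apply_corrections(transcript: dict, fixes: Dict[str, str]) -> dict:
    """Return a corrected copy of a WhisperX-style transcript (hypothetical helper).

    Only each segment's "text" changes; "start"/"end" are left untouched.
    Matches are case-sensitive and must not sit inside a longer word.
    """
    fixed = copy.deepcopy(transcript)  # never mutate the caller's transcript
    for seg in fixed["segments"]:
        for wrong, right in fixes.items():
            pattern = rf"(?<!\w){re.escape(wrong)}(?!\w)"
            # A function replacement keeps any backslashes in `right` literal.
            seg["text"] = re.sub(pattern, lambda _m: right, seg["text"])
    return fixed


raw = {"segments": [
    {"start": 0.0, "end": 2.1, "text": " We transcribe with whisper ex."},
    {"start": 2.1, "end": 4.0, "text": " Send it to the a p i for alignment."},
]}
clean = apply_corrections(raw, {"whisper ex": "WhisperX", "a p i": "API"})
for seg in clean["segments"]:
    print(seg["start"], seg["end"], seg["text"].strip())
```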

Via the API

Transcribe from your own code with an API key. See the API reference for endpoints and presigned uploads.
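Presigned uploads typically follow three steps. You ask the API for an upload URL, send the audio bytes directly to that URL, and then start a job that references the upload. The sketch below only builds the three requests with Python's standard `urllib.request`, so you can inspect them before sending anything. Everything service-specific is hypothetical: the base URL, the `/uploads` and `/transcriptions` paths, the JSON field names (`filename`, `upload_id`, `language`, `prompt`), and the Bearer auth scheme. Take the real values from the API reference.

```python
import json
import urllib.request
from typing import Optional

# HYPOTHETICAL placeholders: replace with the values from the API reference.
API_BASE = "https://api.example.com/v1"


def _json_post(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST to the (hypothetical) API."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def request_upload_url(api_key: str, filename: str) -> urllib.request.Request:
    """Step 1: ask for a presigned upload URL for one audio file."""
    return _json_post("/uploads", api_key, {"filename": filename})


def put_audio(presigned_url: str, audio: bytes) -> urllib.request.Request:
    """Step 2: send the bytes straight to storage. The presigned URL carries
    its own credentials, so no API key header is attached."""
    return urllib.request.Request(presigned_url, data=audio, method="PUT")


def start_transcription(api_key: str, upload_id: str,
                        language: Optional[str] = None,
                        prompt: Optional[str] = None) -> urllib.request.Request:
    """Step 3: start a transcription job for the uploaded file."""
    payload = {"upload_id": upload_id}
    if language is not None:
        payload["language"] = language  # omit to let the engine auto-detect
    if prompt is not None:
        payload["prompt"] = prompt  # optional vocabulary hint
    return _json_post("/transcriptions", api_key, payload)


req = start_transcription("YOUR_API_KEY", "upl_123", language="en")
print(req.get_method(), req.full_url, req.data)
```

To send a request, pass it to `urllib.request.urlopen(req)` and read the JSON response, which will contain the presigned URL and the IDs that the later steps need.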