Forced Alignment API

Precision forced alignment, subtitle generation, word-level timing, audio synchronization, and transcript mapping for your audio.

VocaSync delivers true forced alignment, not ASR (automatic speech recognition): the transcript you submit is the transcript you get back, now with word-level timestamps. It supports 15 languages and SRT, VTT, and JSON output, and its results are deterministic, with no hallucinated words and no omissions. Use it from the dashboard or through the REST API, and chain it after synthesis or transcription using Workflows.

Word-Level Precision
15 Languages
SRT, VTT, JSON

Core Capabilities

Four core services, one unified API

Generate speech from text, align audio with transcripts, transcribe audio to text, or translate timed subtitles — VocaSync handles each with precision, and bundles them via Workflows when you need them chained.

Speech Synthesis

Text-to-Speech Generation

Convert any text into natural, human-like speech in 57 languages. Choose from 9 distinct AI voices with different tones and personalities. Export to MP3, AAC, OPUS, FLAC, or WAV formats.

57 Languages • HD Quality • Synthesise ➔ Align

Starting at

£0.03 per 1,000 characters

Forced Alignment

Transcript ➔ Precision Timing

Not ASR: the transcript you submit is exactly the transcript you get back. Acoustic alignment adds word-level timestamps deterministically, with no hallucinations and no omissions.

Word-Level • 15 Languages • SRT / VTT / JSON

Starting at

£0.02 per minute of audio
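Since alignment output is just a list of timed words, turning it into subtitles is mostly timestamp formatting. A minimal sketch, assuming a hypothetical word-level JSON shape ({ word, start, end } with times in seconds) that may not match VocaSync's actual schema. The API already exports SRT and VTT directly, so this only illustrates how word timings map onto cues; formatTimestamp, groupIntoCues, toSrt, and toVtt are illustrative names, not part of any VocaSync SDK.

```javascript
// Assumed (hypothetical) word-level result: { word, start, end }, times in seconds.

// Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT).
function formatTimestamp(seconds, separator) {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n, width) => String(n).padStart(width, "0");
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor(totalMs / 60000) % 60;
  const s = Math.floor(totalMs / 1000) % 60;
  const ms = totalMs % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${separator}${pad(ms, 3)}`;
}

// Group consecutive words into cues of at most `maxWords` words each.
function groupIntoCues(words, maxWords) {
  const cues = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    cues.push({
      start: chunk[0].start,
      end: chunk[chunk.length - 1].end,
      text: chunk.map((w) => w.word).join(" "),
    });
  }
  return cues;
}

// SRT: numbered blocks, comma before milliseconds, blank line between cues.
function toSrt(cues) {
  return cues
    .map((c, i) => `${i + 1}\n${formatTimestamp(c.start, ",")} --> ${formatTimestamp(c.end, ",")}\n${c.text}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, full stop before milliseconds, cue numbers optional.
function toVtt(cues) {
  const body = cues
    .map((c) => `${formatTimestamp(c.start, ".")} --> ${formatTimestamp(c.end, ".")}\n${c.text}\n`)
    .join("\n");
  return `WEBVTT\n\n${body}`;
}

const words = [
  { word: "Hello,", start: 0.12, end: 0.48 },
  { word: "welcome", start: 0.52, end: 0.9 },
  { word: "to", start: 0.93, end: 1.01 },
  { word: "VocaSync!", start: 1.05, end: 1.62 },
];
const cues = groupIntoCues(words, 3);
console.log(toSrt(cues));
```

Grouping by a fixed word count keeps the sketch short; a real subtitle pipeline would more likely split on punctuation, line length, or pauses between words.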

Transcription

Audio-to-Text with WhisperX

Convert audio to text using state-of-the-art WhisperX. Get segment-level timestamps, review and edit your transcript, then chain it into alignment when you need word-level timing.

99 Languages • Segment Timestamps

Starting at

£0.01 per minute of audio

Translation

Timed Subtitles ➔ Translated

AI translation that preserves your timestamps, with cue-by-cue output in SRT, VTT, or JSON. Optional Premium AI Refinement via Claude Sonnet 4.6 keeps the translation faithful to the source's register and tone.

Neural translation • CPS-bounded • AI Refinement

Starting at

£0.08 per 1,000 source characters
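"CPS-bounded" most likely refers to characters per second, the usual reading-speed measure for subtitles: a cue's character count divided by its on-screen duration. Translations often run longer than the source, so the same cue timing can become too fast to read. A minimal sketch of that check, assuming cues shaped as { start, end, text } in seconds and an illustrative limit of 17 CPS; VocaSync's actual limit and counting rules (for example, whether spaces count) are not stated here.

```javascript
// Illustrative limit only; not a documented VocaSync setting.
const MAX_CPS = 17;

// CPS = characters in the cue / cue duration in seconds.
// This sketch counts spaces and punctuation but ignores line breaks.
function charsPerSecond(cue) {
  const duration = cue.end - cue.start;
  if (duration <= 0) throw new Error("Cue must have a positive duration");
  return cue.text.replace(/\n/g, "").length / duration;
}

// Return the cues that would be read too fast at the given limit.
function cuesOverLimit(cues, maxCps = MAX_CPS) {
  return cues.filter((cue) => charsPerSecond(cue) > maxCps);
}

const translated = [
  { start: 0.0, end: 2.0, text: "Bonjour et bienvenue." }, // 21 chars over 2 s = 10.5 CPS
  { start: 2.0, end: 3.0, text: "Voici une phrase beaucoup trop longue." }, // 38 chars over 1 s = 38 CPS
];
console.log(cuesOverLimit(translated).map((c) => c.text));
```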

Need stages chained? Workflows bundle multiple services into one job with a single upfront price, and refund you if the job fails.

Voice Library

9 unique AI voices for every use case

From authoritative to friendly, calm to energetic—find the perfect voice for your content.

alloy
ash
coral
echo
fable
onyx
nova
sage
shimmer

Why VocaSync

Built for developers, designed for scale

API-First

Developer-friendly REST API

Simple HTTP endpoints with JSON responses. Integrate in minutes with any language or framework. Full OpenAPI spec included.

Flexible

Pay only for what you use

No subscriptions, no commitments. Top up your balance and pay only for actual usage. A free tier is included: 10 minutes of alignment and 1,000 characters of synthesis per month.

Global

Multi-language support

Generate speech in 57 languages and transcribe in 99. Forced alignment supports a 15-language subset, and workflows that include an alignment step only accept those 15 languages.

Quality

Studio-grade output

HD audio quality with multiple format options. Export to MP3, AAC, OPUS, FLAC, or uncompressed WAV.

Fast

Low latency processing

Real-time synthesis with sub-300ms latency. Batch processing for large projects with parallel execution.

Reliable

99.9% uptime SLA

Enterprise-grade infrastructure with automatic failover. Real-time status monitoring and incident alerts.

Developer Experience

Integrate in minutes, not days

Our REST API is designed for simplicity: a few lines of code are enough to start generating speech or aligning audio. There are no complex SDKs to install; use any HTTP client in any language.

  • RESTful endpoints with JSON payloads
  • Bearer token authentication
  • Comprehensive error messages
  • OpenAPI 3.0 specification
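// Text-to-Speech Synthesis (run as an ES module so top-level await works)
const response = await fetch("https://api.vocasync.io/v1/synthesis", {
  method: "POST",
  headers: {
    "Authorization": "Bearer voca_your_api_key", // replace with your own API key
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    text: "Hello, welcome to VocaSync!",
    voice: "nova",  // any of the 9 voices listed above
    format: "mp3"   // mp3, aac, opus, flac, or wav
  })
});

// Surface API errors instead of treating an error body as audio
if (!response.ok) {
  throw new Error(`Synthesis failed: HTTP ${response.status}: ${await response.text()}`);
}

const audioBlob = await response.blob();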
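The same call can be wrapped so that bearer-token authentication and HTTP error handling live in one place. A minimal sketch, assuming only the endpoint, headers, and body fields shown in the synthesis example; buildSynthesisRequest and synthesize are hypothetical helper names, and the injectable fetchImpl parameter exists only so the helper can be tested without network access.

```javascript
const SYNTHESIS_URL = "https://api.vocasync.io/v1/synthesis";

// Hypothetical helper: returns the arguments for fetch(url, init).
function buildSynthesisRequest(apiKey, body) {
  if (!apiKey) throw new Error("Missing API key");
  return {
    url: SYNTHESIS_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // bearer token authentication
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: sends the request and returns the audio as a Blob.
// Non-2xx responses are turned into exceptions instead of being saved as audio.
async function synthesize(apiKey, body, fetchImpl = fetch) {
  const { url, init } = buildSynthesisRequest(apiKey, body);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`VocaSync request failed with HTTP ${response.status}`);
  }
  return response.blob();
}
```

For example, inside an async function in Node 18+ or a browser: `const audio = await synthesize(apiKey, { text: "Hello", voice: "nova", format: "mp3" });`, where `apiKey` holds your own key.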

Use Cases

Powering voice across industries

E-Learning

Create accessible educational content with narrated lessons and synchronized captions.

  • Course narration
  • Multi-language support
  • Subtitle generation

Media & Podcasts

Automate transcription and generate voiceovers for podcasts, videos, and broadcasts.

  • Podcast intros/outros
  • Video narration
  • Transcript review + alignment

Accessibility

Make your content accessible with audio versions and accurate captions.

  • Screen reader alternatives
  • Caption generation
  • Audio descriptions

Output Formats

Export your audio in the format that works best for your workflow. From compressed streaming formats to lossless studio quality.

MP3 • AAC • OPUS • FLAC • WAV
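Because voices and output formats are fixed lists, a client can catch typos before spending a request. A minimal sketch, assuming the request body fields (text, voice, format) from the synthesis example and the voice and format names exactly as listed on this page, in lowercase; validateSynthesisBody is a hypothetical helper, and the OpenAPI spec rather than this snapshot should be treated as the authoritative list.

```javascript
// Voices and formats as listed on this page (a snapshot, not the API's source of truth).
const VOICES = new Set(["alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"]);
const FORMATS = new Set(["mp3", "aac", "opus", "flac", "wav"]);

// Return a list of problems with a synthesis request body; empty means it looks valid.
function validateSynthesisBody(body) {
  const problems = [];
  if (typeof body.text !== "string" || body.text.trim() === "") {
    problems.push("text must be a non-empty string");
  }
  if (!VOICES.has(body.voice)) {
    problems.push(`unknown voice "${body.voice}"`);
  }
  if (!FORMATS.has(body.format)) {
    problems.push(`unsupported format "${body.format}"`);
  }
  return problems;
}

console.log(validateSynthesisBody({ text: "Hello", voice: "nova", format: "mp3" })); // []
console.log(validateSynthesisBody({ text: "Hello", voice: "Nova", format: "ogg" }));
```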

Global Language Support

57 languages for speech synthesis—speak to the world in their native tongue.

15 languages for forced alignment with word-level precision:

🇺🇸🇬🇧🇫🇷🇩🇪🇪🇸🇵🇹🇸🇪🇨🇿🇵🇱🇹🇷🇷🇺🇺🇦🇯🇵🇰🇷🇨🇳

Pay As You Go

Simple, transparent pricing

Only pay for what you use

Speech Synthesis

£0.03 / 1K chars

Forced Alignment

£0.02 / minute

Free tier included: 10 minutes of alignment and 1,000 characters of synthesis per month

Transcription is pay-as-you-go at £0.01/min (no free tier).
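For budgeting, the rates above reduce to simple arithmetic. A minimal sketch, assuming usage is billed linearly (no rounding up to whole minutes or 1,000-character blocks) and that the free allowances reset each month; neither detail is stated on this page. Translation is left out because its free-tier terms are not listed here, and estimateMonthlyCost is a hypothetical helper name.

```javascript
// Rates in pence, from the pricing above.
const RATES = {
  synthesisPer1kChars: 3,    // £0.03 per 1,000 characters
  alignmentPerMinute: 2,     // £0.02 per minute
  transcriptionPerMinute: 1, // £0.01 per minute (no free tier)
};

// Monthly free allowances from the pricing above.
const FREE = { synthesisChars: 1000, alignmentMinutes: 10 };

// Estimate one month's cost in pounds, assuming linear, unrounded billing.
function estimateMonthlyCost({ synthesisChars = 0, alignmentMinutes = 0, transcriptionMinutes = 0 }) {
  const billableChars = Math.max(0, synthesisChars - FREE.synthesisChars);
  const billableAlignment = Math.max(0, alignmentMinutes - FREE.alignmentMinutes);
  const pence =
    (billableChars / 1000) * RATES.synthesisPer1kChars +
    billableAlignment * RATES.alignmentPerMinute +
    transcriptionMinutes * RATES.transcriptionPerMinute;
  return pence / 100;
}

// 50,000 characters synthesised, 90 minutes aligned, 90 minutes transcribed:
// synthesis 49 × 3p = 147p, alignment 80 × 2p = 160p, transcription 90 × 1p = 90p, total £3.97
console.log(estimateMonthlyCost({ synthesisChars: 50000, alignmentMinutes: 90, transcriptionMinutes: 90 }).toFixed(2));
```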

Get Started

Ready to add voice to your app?

Create your free account and get your API key in seconds. Start with our generous free tier, which unlocks once you add a payment method.

Free tier included • Pay only for what you use • Enterprise pricing available