Platform Features

Powerful voice tools for modern applications

From text-to-speech synthesis to transcription and precision audio alignment, VocaSync provides production-ready workflows that feed into deterministic alignment outputs.

AI Voices

9

Unique personalities

Languages

99

Transcription languages

Formats

5

Audio outputs

Latency

<300ms

Real-time speed

True Forced Alignment

Not ASR. Not guessing. True forced alignment powered by Montreal Forced Aligner—your transcript in, your transcript out. Get word-level timestamps with deterministic precision.

Your transcript = your output (no hallucinations or omissions)
G2P handles names, brands, and OOV words automatically
Deterministic: same input always produces same output
15 supported languages with specialized acoustic models
Phoneme-level alignment for advanced use cases
Complete bundle: alignment JSON, SRT & WebVTT included
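
As a rough illustration of how word-level timestamps relate to the SRT output in the bundle, the sketch below groups aligned words into cues and formats them. The input shape (`{ word, start, end }`, times in seconds) and the `maxWords` grouping rule are assumptions made for this example; they are not VocaSync's documented alignment JSON schema or its cue-splitting logic.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Group aligned words into cues of at most maxWords words (a hypothetical
// rule chosen for illustration) and render the cues as SRT text.
function wordsToSrt(words, maxWords = 7) {
  const cues = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const group = words.slice(i, i + maxWords);
    cues.push({
      start: group[0].start,
      end: group[group.length - 1].end,
      text: group.map((w) => w.word).join(" "),
    });
  }
  return cues
    .map((c, i) => `${i + 1}\n${toSrtTime(c.start)} --> ${toSrtTime(c.end)}\n${c.text}`)
    .join("\n\n");
}
```

Because each cue takes its start from its first word and its end from its last, the cue boundaries in this sketch always coincide with aligned word boundaries.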

Subtitle Preview

00:00:01,000 --> 00:00:03,500
Welcome to VocaSync, your AI voice platform.

00:00:04,000 --> 00:00:07,200
Generate natural speech and perfect subtitles.

Performance

Measured, not estimated

Real benchmarks from production workloads. Long files are aligned in parallel across segments, so processing stays fast and predictable as length grows — without sacrificing accuracy.

Alignment Speed

~16×

Faster than realtime

Example Run

66 min

Aligned in ~4 min

Start Latency

<5s

Queue to processing

Throughput

1000+

Audio min/hour per node

Benchmarks measured on long-form English narration (66 min audiobook chapter).
Performance may vary based on audio quality, language, and content complexity.
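
The benchmark figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the published numbers; the function names are illustrative, and the final estimate assumes processing time scales linearly with audio length, which real workloads may not match.

```javascript
// Realtime factor: minutes of audio processed per minute of wall-clock time.
function realtimeFactor(audioMinutes, processingMinutes) {
  return audioMinutes / processingMinutes;
}

// Estimated wall-clock minutes for a file at a given realtime factor.
function estimateProcessingMinutes(audioMinutes, factor) {
  return audioMinutes / factor;
}

// Example run from the benchmark: 66 min of audio aligned in about 4 min.
const exampleFactor = realtimeFactor(66, 4); // 16.5, in line with "~16×"

// 1000 audio minutes per hour per node works out to 1000 / 60, about 16.7×.
const nodeFactor = realtimeFactor(1000, 60);

// Assumption: linear scaling. At 16×, a 180 min recording takes 11.25 min.
const threeHourEstimate = estimateProcessingMinutes(180, 16);
```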

Text-to-Speech Synthesis

Transform any text into natural, human-like speech using state-of-the-art AI models. Perfect for narration, accessibility, and interactive applications.

57 languages supported worldwide
9 distinct AI voices with unique personalities
High-fidelity audio quality option
5 output formats: MP3, AAC, OPUS, FLAC, WAV
Intelligent text chunking for long content
Batch processing for large-scale generation
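
To give a sense of what chunking long text for synthesis can involve, here is a minimal sketch that packs whole sentences into chunks under a character limit. It is not VocaSync's actual chunking algorithm; the `maxChars` limit and the naive sentence-splitting rule are assumptions chosen for illustration.

```javascript
// Split text into chunks of at most maxChars characters, breaking only
// between sentences. A single sentence longer than maxChars is kept whole
// in its own chunk rather than being cut mid-sentence.
function chunkText(text, maxChars = 200) {
  // Naive sentence split: a run of non-terminators plus trailing . ! or ?
  // (this will also split on decimals and abbreviations).
  const sentences = (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars || current === "") {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Breaking at sentence boundaries keeps each synthesized chunk prosodically complete, so the joins between audio segments tend to fall on natural pauses.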

Sample Output

Voice: Nova • Format: MP3 • 00:00 / 00:12

Audio Transcription (ASR)

Convert audio to text with state-of-the-art WhisperX AI, then review and edit before generating linked forced alignment outputs.

99 languages with auto-detection
State-of-the-art WhisperX engine
Segment-level timestamps included
TXT and JSON output formats
Large file support up to 100 MB
Transcribe ➔ Review ➔ Align workflow
Workflow actions are gated to alignment-supported languages
Original transcription artifacts remain unchanged

Transcription Output

{ "segments": [
  { "start": 0.0, "end": 3.5, "text": "Welcome to the podcast." },
  { "start": 4.0, "end": 7.2, "text": "Today we discuss AI." }
] }
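
Segment-level JSON like the sample above is straightforward to turn into captions. The sketch below converts it to WebVTT, assuming only the `segments` array with `start`, `end` (seconds) and `text` fields shown in the sample; the helper names are illustrative and not part of any VocaSync SDK.

```javascript
// Format seconds as a WebVTT timestamp: HH:MM:SS.mmm
function toVttTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n, width) => String(n).padStart(width, "0");
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor(totalMs / 60000) % 60;
  const s = Math.floor(totalMs / 1000) % 60;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(totalMs % 1000, 3)}`;
}

// Convert a transcription result into a WebVTT document:
// a "WEBVTT" header followed by one cue per segment.
function segmentsToVtt(transcription) {
  const cues = transcription.segments.map(
    (seg) => `${toVttTime(seg.start)} --> ${toVttTime(seg.end)}\n${seg.text}`
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}

const sample = {
  segments: [
    { start: 0.0, end: 3.5, text: "Welcome to the podcast." },
    { start: 4.0, end: 7.2, text: "Today we discuss AI." },
  ],
};
const vtt = segmentsToVtt(sample);
```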

Translation

A primary service in its own right. Translate timestamped subtitle artifacts with a neural translation engine while preserving cue timing verbatim. Optional Premium AI Refinement layers register-faithful tone polish on top — useful for tonal content like anime, drama, or branded voice.

Neural translation across 28 target languages
SRT, VTT, and alignment.json inputs
Cue start/end times preserved verbatim
Optional Premium AI Refinement (Claude Sonnet 4.6)
Register presets: formal, casual, anime-faithful, drama-faithful
CPS-aware cue text for readability
Use standalone, or as the final stage of the Subtitling Workflow
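
Subtitle reading speed is commonly expressed in characters per second (CPS). The sketch below computes CPS for a cue from its SRT timing line. The 17 CPS limit is a placeholder assumption, not a VocaSync default, and the function names are illustrative.

```javascript
// Parse an SRT timestamp such as "00:00:01,200" into seconds.
function parseSrtTime(stamp) {
  const match = /^(\d{2}):(\d{2}):(\d{2}),(\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Invalid SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match.map(Number);
  return h * 3600 + m * 60 + s + ms / 1000;
}

// Characters per second for a cue, given its timing line and text.
function cueCps(timingLine, text) {
  const [start, end] = timingLine.split("-->").map((part) => parseSrtTime(part));
  const duration = end - start;
  if (duration <= 0) throw new Error("Cue end must be after cue start");
  // Count Unicode code points rather than UTF-16 units, so characters
  // outside the Basic Multilingual Plane count once.
  return Array.from(text).length / duration;
}

// Flag cues that exceed a reading-speed limit (17 CPS is a placeholder).
function isTooFast(timingLine, text, maxCps = 17) {
  return cueCps(timingLine, text) > maxCps;
}

// 27 characters over 2.2 seconds is roughly 12.3 CPS.
const cps = cueCps("00:00:01,200 --> 00:00:03,400", "It's good to see you again.");
```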

Translated SRT

00:00:01,200 --> 00:00:03,400
It's good to see you again.

↓ Translate + Refinement

00:00:01,200 --> 00:00:03,400
また会えて嬉しいよ。
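
The claim that cue timing survives translation verbatim is easy to verify on your own files. The following sketch compares the timing lines of a source and a translated SRT; it is a client-side check written for illustration, not a VocaSync feature, and the function names are hypothetical.

```javascript
// Extract the timing lines ("HH:MM:SS,mmm --> HH:MM:SS,mmm") from SRT text.
function timingLines(srt) {
  const pattern = /^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/;
  return srt
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => pattern.test(line));
}

// True when both files contain the same cue timings in the same order.
function timingsPreserved(sourceSrt, translatedSrt) {
  const a = timingLines(sourceSrt);
  const b = timingLines(translatedSrt);
  return a.length === b.length && a.every((line, i) => line === b[i]);
}

const source = "1\n00:00:01,200 --> 00:00:03,400\nIt's good to see you again.\n";
const translated = "1\n00:00:01,200 --> 00:00:03,400\nまた会えて嬉しいよ。\n";
const preserved = timingsPreserved(source, translated); // true
```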

Subtitling Workflows

Audio in, translated subtitles out. The Subtitling Workflow bundle runs transcription ➔ alignment ➔ translation as one coordinated job, billed once per minute: £0.08/min Standard or £0.09/min Premium with Sonnet refinement bundled.

£0.08/min Standard covers transcription + alignment + translation
£0.09/min Premium adds Sonnet refinement to the bundle (no separate charge)
Auto-progress or pause for transcript review before alignment
Each stage's project remains independently inspectable
Auto-refund of unconsumed bundle on partial failure
30-day idle expiry with automatic refund
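
With per-minute bundle pricing, cost estimates are simple arithmetic. The sketch below works in whole pence to avoid floating-point rounding. It assumes audio duration is rounded up to the next whole minute before billing, which this page does not specify, so treat the result as an estimate.

```javascript
// Per-minute bundle rates in pence, taken from the pricing above.
const RATE_PENCE_PER_MIN = new Map([
  ["standard", 8],
  ["premium", 9],
]);

// Estimated bundle cost in pence. Assumption: duration is rounded up to
// the next whole minute before the per-minute rate is applied.
function bundleCostPence(audioMinutes, tier = "standard") {
  const rate = RATE_PENCE_PER_MIN.get(tier);
  if (rate === undefined) throw new Error(`Unknown tier: ${tier}`);
  if (!(audioMinutes >= 0)) throw new Error("audioMinutes must be non-negative");
  return Math.ceil(audioMinutes) * rate;
}

// Render pence as a pounds string, e.g. 528 -> "£5.28".
function formatPounds(pence) {
  return `£${(pence / 100).toFixed(2)}`;
}

// A 66-minute file under each tier.
const standardCost = formatPounds(bundleCostPence(66, "standard")); // "£5.28"
const premiumCost = formatPounds(bundleCostPence(66, "premium")); // "£5.94"
```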

Pipeline Stages

01 Transcription: ASR with the source language forced.
02 Alignment: true forced alignment for word-level timestamps.
03 Translation: per-cue translation with timestamps preserved.

Official Plugins

Plug-and-play integrations

Add Valeon-style word highlighting to your website with zero configuration. Automated synthesis, alignment, and secure hosting—all in one plugin.

Astro Integration

Available

Automated content synthesis & alignment
Audio player component with word highlighting
Secure CDN hosting for audio artifacts
Synthesis in 57 languages; highlighting in 15
Math-to-speech support

WordPress Plugin

Available

One-click audio for any post or page
Gutenberg block with live preview
Secure CDN hosting for audio artifacts
Valeon-style word highlighting toggle

Currently being submitted to the WordPress.org plugin directory — install from the GitHub release for now.

Voice Library

9 unique AI voices for every scenario

Each voice has its own personality and tone. From warm and friendly to authoritative and professional—find the perfect voice for your content.

alloy: Neutral and balanced
ash: Soft and refined
coral: Warm and friendly
echo: Smooth and calm
fable: Expressive and dramatic
onyx: Deep and authoritative
nova: Bright and energetic
sage: Wise and measured
shimmer: Clear and optimistic

Global Reach

Worldwide language coverage

57 languages for speech synthesis—speak to the world. 15 languages for forced alignment with word-level precision:

🇺🇸 English (US)
🇬🇧 English (UK)
🇫🇷 French
🇩🇪 German
🇪🇸 Spanish
🇵🇹 Portuguese (PT)
🇸🇪 Swedish
🇨🇿 Czech
🇵🇱 Polish
🇹🇷 Turkish
🇷🇺 Russian
🇺🇦 Ukrainian
🇯🇵 Japanese
🇰🇷 Korean
🇨🇳 Mandarin (CN)

Output Formats

5 audio formats for every use case

From compressed streaming formats to lossless studio quality—export in the format that fits your workflow.

mp3: Universal compatibility
aac: Apple optimized
opus: Web streaming
flac: Lossless audio
wav: Raw quality

Format	Type	Best For	File Size
MP3	Lossy	Universal compatibility, web playback	Small
AAC	Lossy	Apple devices, mobile apps	Small
OPUS	Lossy	Web streaming, real-time applications	Smallest
FLAC	Lossless	Archival, professional editing	Medium
WAV	Uncompressed	Studio production, maximum quality	Large

Developer Experience

API-first design for seamless integration

Our REST API is designed for developers. Simple HTTP endpoints, clear documentation, and predictable responses. Integrate VocaSync into any application in minutes.

REST API

Standard HTTP endpoints

Bearer Auth

Simple token authentication

OpenAPI Spec

Full API documentation

Webhooks

Real-time notifications

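// Text-to-Speech Synthesis (run inside an ES module or an async function)
const response = await fetch("https://api.vocasync.io/v1/synthesis", {
  method: "POST",
  headers: {
    "Authorization": "Bearer voca_your_api_key", // replace with your own API key
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    text: "Hello, welcome to VocaSync!",
    voice: "nova",
    format: "mp3"
  })
});

// Fail fast on HTTP errors instead of treating an error body as audio
if (!response.ok) {
  throw new Error(`Synthesis request failed with status ${response.status}`);
}

// Read the response body as binary data
const audioBlob = await response.blob();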
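
Since this page lists a fixed set of voices and output formats, a thin client-side check can catch typos before a request is sent. The sketch below builds the same request as the example above. The `buildSynthesisRequest` helper and its default voice and format are hypothetical and not part of an official SDK, and the server remains the authority on which values are accepted.

```javascript
// Voices and formats as listed on this page.
const VOICES = ["alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"];
const FORMATS = ["mp3", "aac", "opus", "flac", "wav"];

// Build the URL and fetch options for a synthesis request, validating
// the inputs first. Defaults ("nova", "mp3") are assumptions for this sketch.
function buildSynthesisRequest(apiKey, { text, voice = "nova", format = "mp3" }) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  if (!VOICES.includes(voice)) throw new Error(`Unknown voice: ${voice}`);
  if (!FORMATS.includes(format)) throw new Error(`Unknown format: ${format}`);
  return {
    url: "https://api.vocasync.io/v1/synthesis",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text, voice, format }),
    },
  };
}

// Usage (inside an async function):
//   const { url, options } = buildSynthesisRequest("voca_your_api_key", { text: "Hello!" });
//   const response = await fetch(url, options);
```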

REST API • JSON responses

Live • 99.9% uptime

More Features

Built for production workloads

Real-time Dashboard

Monitor usage, costs, and performance in real-time with our intuitive dashboard.

Project Organization

Organize your work into projects with metadata, tags, and easy search.

Persistent Storage

Your files are stored until you delete them. No automatic expiry.

Secure Infrastructure

All data is encrypted in transit and at rest, and files are kept until you delete them.

Usage Analytics

Detailed analytics on API usage, voice distribution, and cost breakdown.

Priority Support

Email support with fast response times for all paying customers.

Ready to add voice to your app?

Start on our free tier (add a payment method to unlock it), and upgrade when you need more.
