Support

Frequently Asked Questions

Find answers to common questions about VocaSync's text-to-speech synthesis, forced alignment, transcription, workflows, and pricing.

General

What is VocaSync?

VocaSync is an AI-powered platform that provides text-to-speech synthesis, forced alignment, and audio transcription (ASR) services via a simple REST API. It helps you generate studio-quality audio from text, create precise word-level timestamps for subtitle generation, and convert audio to text with segment-level timestamps.

Who is VocaSync for?

VocaSync is designed for developers, content creators, educators, and businesses who need reliable audio processing. Common use cases include audiobook production, video subtitle generation, podcast transcription, e-learning content, accessibility features, and interactive voice applications.

Do I need to code to use VocaSync?

No! VocaSync offers both a web dashboard and REST API. You can create projects, upload files, and download results directly from the dashboard without writing any code. For automation and integration, the API is available with comprehensive documentation.

How do I create an account?

Visit the sign-up page and create an account using your email or social login. Once registered, you can access your dashboard to manage projects and API keys.

Is there a free tier?

Yes! New users with a valid payment method on file receive free credits to explore the platform. This allows you to test both synthesis and alignment features before purchasing additional credits.

Speech Synthesis

What languages are supported for speech synthesis?

Speech synthesis supports 57 languages including English, Spanish, French, German, Chinese, Japanese, Arabic, Portuguese, Russian, Hindi, and many more. See our API documentation for the complete list of supported languages.

How many AI voices are available?

VocaSync offers 9 distinct AI voices: Alloy (neutral and balanced), Ash (soft and refined), Coral (warm and friendly), Echo (smooth and calm), Fable (expressive and dramatic), Onyx (deep and authoritative), Nova (bright and energetic), Sage (wise and measured), and Shimmer (clear and optimistic).

What audio formats can I generate?

Synthesis supports 5 output formats: MP3 (universal compatibility), AAC (optimized for Apple devices), OPUS (web streaming), FLAC (lossless audio), and WAV (uncompressed). Choose the format that best fits your workflow.

Can I adjust the speech speed?

Yes, you can control speech speed from 0.25x (very slow) to 4.0x (very fast). The default is 1.0x. This is useful for creating audiobooks at comfortable listening speeds or generating faster audio for time-constrained applications.

What is the difference between Standard and High Fidelity synthesis?

Standard synthesis is optimized for speed and lower latency, making it ideal for real-time applications and drafts. High Fidelity synthesis produces higher quality audio with better clarity and naturalness, best for final production content like audiobooks and podcasts. VocaSync uses OpenAI TTS for synthesis (Standard maps to tts-1, High Fidelity maps to tts-1-hd). See the API reference for technical details.
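The synthesis settings described above (voice, output format, speed, and tier) can be validated on the client before you send a request. The sketch below is a minimal illustration under stated assumptions: the `SynthesisOptions` class, its field names, and the lowercase voice, format, and tier identifiers are hypothetical and are not VocaSync's documented request schema. Only the allowed values, the 1.0x default speed, and the tier-to-model mapping come from this FAQ.

```python
from dataclasses import dataclass

# Allowed values as listed in this FAQ; the lowercase identifiers are assumed.
VOICES = {"alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"}
FORMATS = {"mp3", "aac", "opus", "flac", "wav"}
# Standard maps to tts-1 and High Fidelity to tts-1-hd, per this FAQ.
TIER_TO_MODEL = {"standard": "tts-1", "high_fidelity": "tts-1-hd"}


@dataclass(frozen=True)
class SynthesisOptions:
    """Hypothetical client-side container; not VocaSync's actual API schema."""

    voice: str
    response_format: str
    tier: str
    speed: float = 1.0  # the FAQ's stated default

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges before any request is made.
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if self.response_format not in FORMATS:
            raise ValueError(f"unsupported format: {self.response_format!r}")
        if not 0.25 <= self.speed <= 4.0:
            raise ValueError(f"speed must be between 0.25 and 4.0, got {self.speed}")
        if self.tier not in TIER_TO_MODEL:
            raise ValueError(f"unknown tier: {self.tier!r}")

    @property
    def model(self) -> str:
        """Underlying OpenAI TTS model for the selected tier."""
        return TIER_TO_MODEL[self.tier]


opts = SynthesisOptions(voice="nova", response_format="flac", tier="high_fidelity", speed=1.25)
print(opts.model)  # tts-1-hd
```

Validating early means a typo in a voice name fails locally with a clear message instead of costing a round trip.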

Forced Alignment

What is forced alignment and how is it different from ASR?

Forced alignment times the words YOU provide against audio—your transcript in, your transcript out. Unlike ASR (Automatic Speech Recognition), which guesses what was said and can hallucinate or miss words, forced alignment is deterministic: same input always produces the same output. This makes it ideal for production workflows where accuracy matters.

What languages are supported for forced alignment?

Forced alignment supports 14 languages (15 language variants, counting US and UK English separately), each with a specialized acoustic model: English (US & UK), French, German, Spanish, Portuguese (Portugal), Swedish, Czech, Polish, Turkish, Russian, Ukrainian, Japanese, Korean, and Mandarin (China). These language-specific models are optimized for accurate word-level timestamp generation.

What output formats are available for alignment?

Alignment outputs include JSON with word-level timestamps, SRT subtitles, and WebVTT captions. The JSON format includes precise start/end times for each word, perfect for programmatic use.
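The built-in SRT output covers most cases, but if you want subtitle cues grouped differently you can build them from the word-level JSON yourself. This minimal sketch assumes each word is a dict with `word`, `start`, and `end` keys, with times in seconds; those field names are hypothetical, so check the API reference for the actual JSON schema.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group word-level timestamps into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        # A cue spans from its first word's start to its last word's end.
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(cues)


# Hypothetical alignment output for three words.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.42},
    {"word": "world", "start": 0.5, "end": 0.91},
    {"word": "again", "start": 1.2, "end": 1.6},
]
print(words_to_srt(words, max_words=2))
```

Grouping by a fixed word count is the simplest choice; a production version might also split cues on punctuation or cap each cue's duration.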

How accurate is forced alignment?

VocaSync uses the Montreal Forced Aligner (MFA) with language-specific acoustic models. On clear audio, word boundaries are typically accurate to within 20-50 milliseconds. Accuracy depends on audio quality, background noise, and how closely the transcript matches the spoken content.

What are common use cases for forced alignment?

Forced alignment is widely used for: generating precise subtitles and captions, creating karaoke-style lyric displays, syncing audio with text in e-learning, building interactive audiobooks, and powering read-along applications where text highlights as audio plays.
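For read-along and karaoke-style displays, the player has to find which word is being spoken at the current playback time. A binary search over word start times does this in logarithmic time. This sketch assumes hypothetical `start`/`end` fields (in seconds) and that words are sorted by start time; `ReadAlongIndex` is an illustrative name, not part of any VocaSync SDK.

```python
from bisect import bisect_right
from typing import Optional


class ReadAlongIndex:
    """Maps a playback time to the index of the word being spoken, if any."""

    def __init__(self, words: list[dict]) -> None:
        # Assumes words are sorted by start time and do not overlap.
        self._starts = [w["start"] for w in words]
        self._ends = [w["end"] for w in words]

    def word_at(self, t: float) -> Optional[int]:
        # Index of the last word whose start time is <= t.
        i = bisect_right(self._starts, t) - 1
        # Highlight it only if playback hasn't passed its end; pauses return None.
        if i >= 0 and t < self._ends[i]:
            return i
        return None


index = ReadAlongIndex([
    {"word": "Hello", "start": 0.0, "end": 0.42},
    {"word": "world", "start": 0.5, "end": 0.91},
])
print(index.word_at(0.6))  # 1
```

Returning `None` during pauses lets the UI clear the highlight; an alternative is to keep the previous word highlighted until the next one starts.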

Transcription (ASR)

What is transcription and how does it work?

Transcription (ASR) converts audio files to text using WhisperX, a state-of-the-art speech recognition system. Upload your audio file, specify the language (or let it be auto-detected), and receive a text transcript with segment-level timestamps. It's ideal for podcasts, interviews, lectures, and other spoken audio.

What languages are supported for transcription?

Transcription supports 99 languages, and WhisperX automatically detects the spoken language if you don't specify one. Major languages include English, Spanish, French, German, Chinese, Japanese, Arabic, Hindi, Portuguese, Russian, and many more.

What audio formats are supported for transcription?

Transcription supports MP3, MP4, MPEG, MPGA, M4A, WAV, WebM, OGG, and FLAC formats. Files can be up to 100MB in size.
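Checking files before upload avoids a round trip for requests that would be rejected. This minimal sketch checks the file extension against the formats listed above and the 100MB size limit. Two assumptions apply: the extension is treated as a proxy for the format (the server still inspects and normalizes the audio), and 100MB is read as 100,000,000 bytes, the stricter of the decimal and binary interpretations. `check_upload` is a hypothetical helper name.

```python
from pathlib import Path

# Formats listed in this FAQ, matched by file extension (an approximation).
ACCEPTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm", ".ogg", ".flac"}
# Assumption: "100MB" read as decimal megabytes, the stricter reading.
MAX_BYTES = 100 * 1000 * 1000


def check_upload(path: str) -> None:
    """Raise ValueError if the file is unlikely to be accepted for upload."""
    p = Path(path)
    if p.suffix.lower() not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {p.suffix or '(no extension)'}")
    size = p.stat().st_size  # raises FileNotFoundError if the path doesn't exist
    if size > MAX_BYTES:
        raise ValueError(f"file is {size:,} bytes; the limit is {MAX_BYTES:,} bytes")
```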

What output formats are available for transcription?

Transcription outputs plain text (TXT) and JSON with segment-level timestamps. For word-level timestamps and subtitle formats (SRT/VTT), run forced alignment on your edited transcript.

How accurate is the transcription?

VocaSync uses WhisperX, a state-of-the-art speech recognition model. Accuracy varies by audio quality, accent, and background noise. For best results, use clear audio recordings. You can also provide a prompt to guide the model with context about the content.

Is there a free tier for transcription?

Transcription is pay-as-you-go only, at £0.01 per minute of audio, billed in 15-second increments (rounded up) with a 1-minute minimum charge. Unlike synthesis and alignment, transcription has no free tier. This ensures sustainable, high-quality transcription services.
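The billing rule above works out as follows: the audio duration is rounded up to the next 15-second increment, with a 60-second floor, then multiplied by the per-minute rate. This sketch assumes the charge is pro-rata within those increments (so 75 seconds bills as 1.25 minutes) and does not model how fractions of a penny are handled, which this FAQ doesn't specify. Alignment uses the same rounding, but its rate isn't stated here, so the rate is a parameter.

```python
import math
from decimal import Decimal

TRANSCRIPTION_RATE_GBP = Decimal("0.01")  # £ per minute of audio, per this FAQ


def billable_seconds(duration_s: float) -> int:
    """Round up to the next 15-second increment, with a 1-minute minimum."""
    return max(60, math.ceil(duration_s / 15) * 15)


def audio_charge(duration_s: float, rate_per_minute: Decimal = TRANSCRIPTION_RATE_GBP) -> Decimal:
    """Charge in GBP, assuming pro-rata billing within 15-second increments."""
    return rate_per_minute * billable_seconds(duration_s) / 60


for seconds in (20, 61, 90):
    print(seconds, billable_seconds(seconds), audio_charge(seconds))
```

Using `Decimal` rather than floats keeps the currency arithmetic exact.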

What is the difference between transcription and alignment?

Transcription (ASR) uses AI to guess what was said in audio—great for content you don't have a transcript for. Alignment takes your existing transcript and precisely timestamps each word against the audio. Use transcription when you need to discover what was said; use alignment when you already know and need precise timing.

Does transcription include speaker diarization?

Speaker diarization (labeling different speakers) is on our roadmap but not yet available. If this is a priority for your use case, contact sales@vocasync.io and let us know.

Workflows

How does synthesis → alignment work?

You can either enable auto-alignment during synthesis (for supported languages) or create an alignment project later from the synthesis project page. Either way, VocaSync creates a linked alignment project with copied inputs, so both projects stay independent.

How does transcription → alignment work?

Open your completed transcription project, review and edit the transcript, then create a linked alignment project. This combines AI-assisted transcription with deterministic word-level alignment output.

What if my transcript doesn't match the audio?

Transcription is ASR-based: it produces a best-match transcript that is not guaranteed to be 100% accurate. Alignment requires the transcript to closely match the spoken audio, so use the manual transcript review editor before aligning. Helpful fixes include correcting brand names, coined terms, product names, proper nouns, and acronyms. Then check punctuation and casing, and replay uncertain segments before submitting.
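Before submitting an edited transcript for alignment, it can help to list exactly which words you changed from the ASR output, so you can re-check each correction against the audio. This sketch uses Python's standard `difflib` on whitespace-separated words; `word_edits` is a hypothetical helper, not a VocaSync feature, and the sample transcripts are made up.

```python
import difflib


def word_edits(original: str, edited: str) -> list[tuple[str, str, str]]:
    """Return (operation, original words, edited words) for each changed span."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # skip spans that are unchanged
    ]


asr = "welcome to voca sink a p i docs"
reviewed = "welcome to VocaSync API docs"
for op, before, after in word_edits(asr, reviewed):
    print(f"{op}: {before!r} -> {after!r}")
```

Comparing words rather than characters keeps each correction readable, which suits the brand-name and acronym fixes described above.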

What if my transcription language isn't supported by alignment?

Alignment supports a smaller set of languages than transcription. If a transcription is in a language that alignment doesn't support, alignment actions are disabled for that project. If the language is supported, you can review the transcript and create a linked alignment project.

Can I edit non-English transcripts before alignment?

Yes. The transcript review editor supports all languages that VocaSync alignment supports, including non-English languages. You can make edits before creating the linked alignment project.

Does creating alignment modify my original transcription files?

No. VocaSync never modifies the original transcription TXT/JSON artifacts when you generate alignment. Instead, it copies the inputs into a separate linked alignment project, so each project remains independent.

What happens if a linked alignment project is deleted?

Projects are intentionally independent, so deleting one does not cascade-delete the other. If a linked alignment project is removed, VocaSync clears stale linkage references so the source project no longer points to a non-existent linked project.

Billing & Credits

How does pay-as-you-go pricing work?

You purchase credits in advance, which are consumed based on usage. Alignment and transcription are charged per minute of audio, billed in 15-second increments (rounded up) with a 1-minute minimum. Synthesis is charged per 1,000 characters. There are no subscriptions or commitments.

Do credits expire?

No, credits never expire. Once purchased, they remain in your account until used, giving you complete flexibility in how you use the service.

Are credits refundable?

Credits are non-refundable once purchased. We recommend starting with the free tier to evaluate the service before purchasing additional credits.

What payment methods are accepted?

All payments are processed securely via Stripe. We accept major credit cards, debit cards, and other payment methods supported by Stripe in your region.

What currency are prices in?

All prices are displayed in GBP (British pounds). Taxes may apply depending on your location.

API & Technical

How do I get an API key?

After signing in, navigate to your dashboard and go to the API Keys section. You can generate new API keys there. Keep your keys secure and never share them publicly.

What output formats are available?

Synthesis outputs audio in 5 formats: MP3, AAC, OPUS, FLAC, and WAV. Alignment outputs JSON, SRT (subtitles), and VTT (WebVTT captions) with word-level timestamps. Transcription outputs TXT and JSON with segment-level timestamps.

What are the API rate limits?

We apply fair-use rate limits to protect platform stability. Current limits are documented in the API reference. If you need higher throughput for production workloads, email sales@vocasync.io and we'll work with you to accommodate your needs.

How long are generated files stored?

Your files are stored until you delete them or close your account. There is no automatic expiry. You can delete files anytime from your dashboard. When you close your account, all associated files are permanently removed.

Can I use VocaSync for commercial projects?

Yes, VocaSync can be used for commercial projects. Review our Terms of Service for complete usage rights and acceptable use guidelines.

What audio formats do you accept for input?

For transcription and alignment, we accept most common audio formats, including MP3, MP4, MPEG, MPGA, M4A, WAV, WebM, OGG, and FLAC. Files are automatically normalized with ffmpeg. The maximum file size is 100MB.

How do you handle punctuation, casing, and numbers?

Transcription uses AI to infer punctuation, casing, and number formatting. Results are generally good but not perfect. For alignment, the output matches your input transcript exactly. We recommend reviewing and editing transcripts before alignment for production use.

Can I automate VocaSync in CI/CD pipelines?

Yes! The REST API is designed for automation. Submit jobs programmatically, poll for completion (or use webhooks where available), and download results. Check the API documentation for examples and best practices.
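A typical CI step submits a job, waits for it to finish, then downloads the results. The waiting part can be sketched as a polling loop with exponential backoff. This is an illustration only: `fetch_status` stands in for your own call to the VocaSync API, and the status names `completed` and `failed` are assumptions; use the values documented in the API reference.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    timeout_s: float = 600.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> str:
    """Poll until the job reaches a terminal status, backing off exponentially.

    fetch_status is a placeholder for your own API call; the terminal
    status names below are assumptions, not documented values.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        # Give up rather than sleep past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        sleep(delay)
        # Double the wait each round, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

In a pipeline, treat a `failed` status or a `TimeoutError` as a failed step. Backing off also keeps the number of status requests low, which helps you stay within the rate limits.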

Support & Security

How do I contact support?

For technical support, billing questions, or account help, email support@vocasync.io. We aim to respond within 1–2 business days. For enterprise inquiries, volume pricing, or custom SLAs, contact sales@vocasync.io.

How is my data protected?

We take data security seriously. All data is transmitted over encrypted connections (HTTPS), and we follow industry best practices for data storage and handling. Review our Privacy Policy for complete details.

Can I delete my account and data?

Yes, you can request account deletion at any time by contacting support. We will remove your account and associated data in accordance with our Privacy Policy.

Notifications

What email notifications does VocaSync send?

VocaSync can send email notifications for: project completion (when your alignment, synthesis, or transcription job finishes), project failures (if something goes wrong), low balance alerts (when your credits drop below a threshold), API key activity (when keys are created, deleted, or status changes), and optional weekly usage digests summarizing your activity.

What email address will notifications come from?

All VocaSync notification emails are sent from notifications@vocasync.io. To ensure you receive our emails, add this address to your contacts or safe senders list. If you're not receiving notifications, check your spam or junk folder.

How do I configure my notification preferences?

Go to Settings in your dashboard and select the Notifications tab. From there, you can toggle each notification type on or off, set your preferred notification email address, and configure your low balance alert threshold.

Can I use a different email for notifications?

Yes! In the Notifications settings, you can specify any email address for receiving notifications. This doesn't have to be the same as your account email. You can change it anytime.

How do low balance alerts work?

When enabled, you'll receive an email notification when your account balance drops below your configured threshold (from £0.50 to £10.00). To prevent spam, you'll only receive one low balance alert per 24-hour period. Top up your account to continue uninterrupted service.
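The alert rule above can be modelled in a few lines, which may help if you want to predict when an alert will fire or mirror the behavior in your own monitoring. This is not VocaSync's implementation: the function name is hypothetical, and reading "one alert per 24-hour period" as a rolling 24 hours since the previous alert is an assumption.

```python
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Optional

MIN_THRESHOLD = Decimal("0.50")
MAX_THRESHOLD = Decimal("10.00")
COOLDOWN = timedelta(hours=24)


def should_send_low_balance_alert(
    balance: Decimal,
    threshold: Decimal,
    last_alert_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Alert when the balance is below the threshold, at most once per 24 hours."""
    if not MIN_THRESHOLD <= threshold <= MAX_THRESHOLD:
        raise ValueError("threshold must be between £0.50 and £10.00")
    if balance >= threshold:
        return False
    # Assumed rolling window: at least 24 hours since the last alert was sent.
    return last_alert_at is None or now - last_alert_at >= COOLDOWN
```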

What is the weekly digest?

The weekly digest is an optional email summary of your VocaSync activity. It includes the number of projects completed, alignment minutes and synthesis characters used, and your current credit balance. It's a great way to stay informed about your usage patterns.

What are API key activity notifications?

API key activity notifications alert you when changes are made to your API keys. You'll be notified when a key is created, deleted, activated, or deactivated. This helps you stay aware of any security-related changes to your account, whether made by you or by someone with access to your dashboard.

Can I unsubscribe from all notifications?

Yes, you can disable all notification types individually from the Settings > Notifications page. Each notification type (project completed, project failed, low balance, weekly digest, API key activity) can be toggled independently. You can also click the 'Manage notification preferences' link in any email to update your settings.