Teach Resemble AI how to pronounce specific words by providing reference audio. Custom pronunciations are scoped per team and are automatically applied during speech synthesis when enabled.
When apply_custom_pronunciations is set to true in a synthesis request, Resemble automatically detects matching words in the input text and applies your custom pronunciations.
Custom pronunciations add negligible latency to synthesis requests.
Custom pronunciations are currently supported for English (en-us) only. Support for additional languages used by the multilingual model will be added in future releases.
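As a rough illustration of opting in, the sketch below builds a request body with the flag set. Only apply_custom_pronunciations comes from this page; the helper name build_synthesis_payload and the other field names (voice_uuid, data) are assumptions made for the example, so check the synthesis API reference for the actual request shape.

```python
import json


def build_synthesis_payload(text, voice_uuid, apply_custom_pronunciations=False):
    """Build a JSON-serializable body for a synthesis request.

    Only `apply_custom_pronunciations` is documented on this page; the other
    keys are hypothetical placeholders for the real synthesis fields.
    """
    return {
        "voice_uuid": voice_uuid,  # hypothetical field name
        "data": text,              # hypothetical field name
        # Opt in explicitly; when the flag is omitted the API treats it as false.
        "apply_custom_pronunciations": apply_custom_pronunciations,
    }


payload = build_synthesis_payload(
    "The patient was prescribed tirzepatide.",
    "YOUR_VOICE_UUID",
    apply_custom_pronunciations=True,
)
print(json.dumps(payload, indent=2))
```

The flag defaults to False in the helper to mirror the documented server-side default.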
The feature works best with uncommon or domain-specific words (medical terms, brand names, proper nouns, technical jargon). If a custom pronunciation is created for a common word that the model already knows well, the model may favor its built-in pronunciation over the custom one. Custom pronunciations serve as guidance to the model, and common words have strong existing associations.
The pronunciation reference audio should be recorded in a voice that is similar to the voice used during synthesis. If the pronunciation audio speaker sounds very different from the synthesis voice, it may cause undesirable results.
If the accent of the custom pronunciation differs from the target voice accent, there is a risk of the model switching accents during synthesis. For best results, record pronunciation references using the same speaker or a speaker with a similar vocal quality and accent.
Pronunciation audio should contain a single, clear utterance of the word. Avoid background noise, music, or multiple words in one clip.
There is a limit of 2 pronunciations per team on the free tier. Contact sales for additional custom vocabulary capacity.
Each word must be unique within a team + language + domain combination. Attempting to create a duplicate returns a 422 Unprocessable Entity error. To change a pronunciation, delete the existing one and create a new one.
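The uniqueness rule and the delete-then-create workflow can be pictured with a small in-memory model. This is not the Resemble API or client library: PronunciationStore, DuplicatePronunciationError, and the exact-match comparison of words are all assumptions made for illustration, with the exception standing in for the 422 response.

```python
class DuplicatePronunciationError(Exception):
    """Stands in for the API's 422 Unprocessable Entity response."""


class PronunciationStore:
    """Hypothetical in-memory model of the team + language + domain + word rule."""

    def __init__(self):
        self._records = {}

    def create(self, team, language, domain, word, audio_ref):
        key = (team, language, domain, word)
        if key in self._records:
            # The real API rejects duplicates with a 422 error.
            raise DuplicatePronunciationError(f"{word!r} already exists")
        self._records[key] = audio_ref

    def delete(self, team, language, domain, word):
        self._records.pop((team, language, domain, word), None)

    def replace(self, team, language, domain, word, audio_ref):
        # There is no in-place update: delete the old entry, then create anew.
        self.delete(team, language, domain, word)
        self.create(team, language, domain, word, audio_ref)

    def get(self, team, language, domain, word):
        return self._records.get((team, language, domain, word))


store = PronunciationStore()
store.create("team-1", "en-us", "medical", "tirzepatide", "take1.wav")
try:
    store.create("team-1", "en-us", "medical", "tirzepatide", "take2.wav")
except DuplicatePronunciationError as exc:
    print("rejected:", exc)
store.replace("team-1", "en-us", "medical", "tirzepatide", "take2.wav")
```

The same word can still be registered under a different domain, since the domain is part of the key.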
New pronunciations start as "pending" and transition to "ready" once processing is complete. Only "ready" pronunciations are applied during synthesis.
Use the active flag to temporarily disable a pronunciation without deleting it. This is useful for A/B testing or troubleshooting.
Q: What happens if apply_custom_pronunciations is omitted from the request?
It defaults to false. Pronunciation lookup is skipped entirely with zero added latency.
Q: How long does processing take after uploading a pronunciation?
Typically a few seconds. Poll the pronunciation’s status via GET /api/v2/pronunciations/:uuid to check when it transitions from "pending" to "ready".
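A polling loop for this might look like the sketch below. It is a minimal example under stated assumptions: wait_until_ready is a hypothetical helper, and fetch_status is a callable you supply that performs GET /api/v2/pronunciations/:uuid and returns the status string. The interval and timeout values are arbitrary choices, not documented recommendations.

```python
import time


def wait_until_ready(fetch_status, uuid, interval=1.0, timeout=30.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the pronunciation is "ready" or the timeout elapses.

    `fetch_status` is a caller-supplied function (hypothetical) that calls
    GET /api/v2/pronunciations/:uuid and returns the status string.
    Returns True if "ready" was observed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if fetch_status(uuid) == "ready":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo with a fake fetcher that reports "pending" twice, then "ready".
_responses = iter(["pending", "pending", "ready"])
is_ready = wait_until_ready(lambda _uuid: next(_responses), "example-uuid", interval=0.0)
print(is_ready)
```

Injecting sleep and clock keeps the loop easy to test without real delays or network calls.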
Q: What happens if something goes wrong during pronunciation lookup?
The system fails gracefully. If custom pronunciations cannot be retrieved, synthesis proceeds normally without them. Your requests will never fail due to a pronunciation lookup issue.
Q: Does this work with all Resemble voices and models?
Custom pronunciations are compatible with all Chatterbox voices, including standard and cloned voices.
Q: What if my pronunciation isn’t being applied?
Check the following:
Is the pronunciation’s status "ready"?
Is the pronunciation active?
Is "apply_custom_pronunciations": true set in your synthesis request?
Q: Can I use multi-word phrases as pronunciations?
Yes. The system checks for both individual words and adjacent two-word phrases. For example, a pronunciation for "manmay nakhashi" will match when those two words appear next to each other in the input text.
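To make the word and two-word-phrase lookup concrete, here is a minimal sketch of that kind of matching. It is not Resemble's implementation: the function name find_pronunciation_matches, the lowercasing, the tokenization regex, and the choice to prefer a phrase match over a single-word match are all assumptions.

```python
import re


def find_pronunciation_matches(text, vocabulary):
    """Return vocabulary entries found in `text`, in order of appearance.

    `vocabulary` is a set of lowercase words or two-word phrases.
    Tokenization and phrase-first precedence are assumptions of this sketch.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Check the adjacent two-word phrase first.
        if i + 1 < len(tokens):
            phrase = f"{tokens[i]} {tokens[i + 1]}"
            if phrase in vocabulary:
                matches.append(phrase)
                i += 2
                continue
        # Fall back to the individual word.
        if tokens[i] in vocabulary:
            matches.append(tokens[i])
        i += 1
    return matches


vocab = {"manmay nakhashi", "tirzepatide"}
found = find_pronunciation_matches("Manmay Nakhashi discussed tirzepatide.", vocab)
print(found)
```

In this sketch a phrase only matches when its two words are adjacent and in order, so "Nakhashi and Manmay" would not match "manmay nakhashi".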
Q: What happens if I delete a pronunciation? Is it removed immediately?
Yes. The pronunciation is removed immediately and subsequent synthesis requests will no longer use it.