Resemble accepts Speech Synthesis Markup Language (SSML) so you can precisely control pronunciation, pacing, and style. Punctuation is handled automatically, but SSML tags unlock finer control such as emphasizing a word, spelling out acronyms, or inserting pauses.
<speak>: Root element.
<prosody>: Use to modulate pitch, speaking rate, or volume.
<emphasis>
<say-as>: Spell out characters or indicate alternate interpretation.
<sub>
<break>
<lang>: Switch the language mid-stream if the voice supports it.
<resemble:convert>: Performs speech-to-speech using a donor recording.
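To illustrate how several of these tags might combine, the following Python sketch assembles a small SSML document and checks that it is well-formed XML before it would be sent. The helper name `build_ssml` is hypothetical, and the attribute values (`rate="slow"`, `time="500ms"`, `interpret-as="characters"`) follow standard W3C SSML; whether Resemble accepts each of them is an assumption to confirm against its reference.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(intro: str, acronym: str, outro: str) -> str:
    """Assemble a small SSML document (hypothetical helper, not a Resemble API)."""
    # Attribute values follow standard W3C SSML and are assumptions here.
    return (
        "<speak>"
        f'<prosody rate="slow">{escape(intro)}</prosody>'
        '<break time="500ms"/>'
        f'<say-as interpret-as="characters">{escape(acronym)}</say-as> '
        f"<emphasis>{escape(outro)}</emphasis>"
        "</speak>"
    )


ssml = build_ssml("Welcome back.", "SSML", "Tags & text are escaped.")
root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
print(ssml)
```

Escaping the text content keeps characters such as `&` and `<` from breaking the markup, and parsing the result locally catches unbalanced tags early.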
Maximum file size: 50 MB, maximum duration: 300 seconds. Files exceeding either limit are trimmed.
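Because oversized files are trimmed rather than rejected, it can help to check a recording locally before uploading it. The sketch below does this with Python's standard library only. It assumes the recording is a WAV file and that 1 MB means 1024 * 1024 bytes; the source states neither, and the helper names `check_donor_wav` and `silent_wav` are hypothetical.

```python
import io
import wave

MAX_BYTES = 50 * 1024 * 1024  # 50 MB, assuming 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 300.0


def check_donor_wav(data: bytes) -> list[str]:
    """Return the limits a WAV payload exceeds (hypothetical helper)."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file size exceeds 50 MB")
    with wave.open(io.BytesIO(data), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration > MAX_SECONDS:
        problems.append("duration exceeds 300 seconds")
    return problems


def silent_wav(seconds: float, rate: int = 8000) -> bytes:
    """Create a mono, 16-bit silent WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(check_donor_wav(silent_wav(1)))    # within both limits
print(check_donor_wav(silent_wav(301)))  # one second over the duration limit
```

An empty list means the file is within both limits; anything else names the limit that would trigger trimming.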
In text-to-speech, the prompt attribute is placed on the <speak> root element. For speech-to-speech conversion, you must instead place the prompt attribute directly on the <resemble:convert> tag.
The prompt guides how the donor audio is converted to the target voice, letting you adjust accent, tone, or delivery style.
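The difference in prompt placement can be sketched with two small Python helpers that build the SSML strings. Both function names are hypothetical, and the `src` attribute used to point at the donor recording is an assumption not stated in this section; only the position of `prompt` comes from the text above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_tts_ssml(text: str, prompt: str) -> str:
    """Text-to-speech: the prompt attribute sits on the <speak> root."""
    return f"<speak prompt={quoteattr(prompt)}>{escape(text)}</speak>"


def build_convert_ssml(donor_url: str, prompt: str) -> str:
    """Speech-to-speech: the prompt attribute sits on <resemble:convert>."""
    # `src` as the attribute naming the donor recording is an assumption.
    return (
        "<speak>"
        f"<resemble:convert src={quoteattr(donor_url)} prompt={quoteattr(prompt)}>"
        "</resemble:convert>"
        "</speak>"
    )


tts = build_tts_ssml("Hello there.", "Speak with a calm, warm tone")
sts = build_convert_ssml(
    "https://example.com/donor.wav", "Speak with a calm, warm tone"
)
print(tts)
print(sts)
```

`quoteattr` adds the surrounding quotes and escapes the value, so prompts containing quotation marks or ampersands do not break the attribute.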