SSML Reference

Resemble accepts Speech Synthesis Markup Language (SSML) so you can precisely control pronunciation, pacing, and style. Punctuation is handled automatically, but SSML tags unlock finer control such as emphasizing a word, spelling out acronyms, or inserting pauses.

Supported Elements

| Element | Required | Summary |
| --- | --- | --- |
| <speak> | Yes | Root element for all SSML input. |
| <prosody> | No | Adjust pitch, rate, or volume for a span of text. |
| <emphasis> | No | Apply preset emphasis levels (internal pitch/volume mix). |
| <say-as> | No | Interpret content as characters, numbers, etc. |
| <sub> | No | Substitute alternate text for pronunciation. |
| <break> | No | Insert a timed pause. |
| <lang> | No | Switch the language mid-utterance (voice must support it). |
| <resemble:convert> | No | Perform speech-to-speech conversion against an input file. |

<speak> Root Element

<speak prompt="string" version="1.1"></speak>

| Attribute | Description |
| --- | --- |
| version | SSML spec version. Defaults to 1.1. |
| xml:lang | Locale for the root document (e.g. en, en-US). |
| xmlns | XML namespace. Defaults to http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/synthesis.xsd. |
| temperature | Controls generation randomness (0.1–5.0). Defaults to 0.8. |
| exaggeration | Emotion intensity (0.0–1.0). |
| seed | Optional deterministic seed (non-negative integer). |
| prompt | Primer text to steer delivery (e.g. "Speak with excitement"). |
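
The attribute ranges above can be checked before a request is sent. The following is a minimal sketch, not part of any Resemble SDK: `build_speak` is a hypothetical helper name, it only handles plain text (nested SSML tags would be escaped), and it makes no API call.

```python
from xml.sax.saxutils import escape, quoteattr


def build_speak(text, *, temperature=None, exaggeration=None, seed=None, prompt=None):
    """Wrap plain text in a <speak> root, enforcing the documented ranges.

    Hypothetical helper for illustration; only the attribute names and
    ranges come from the reference table.
    """
    attrs = {}
    if temperature is not None:
        if not 0.1 <= temperature <= 5.0:
            raise ValueError("temperature must be between 0.1 and 5.0")
        attrs["temperature"] = str(temperature)
    if exaggeration is not None:
        if not 0.0 <= exaggeration <= 1.0:
            raise ValueError("exaggeration must be between 0.0 and 1.0")
        attrs["exaggeration"] = str(exaggeration)
    if seed is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int) or seed < 0:
            raise ValueError("seed must be a non-negative integer")
        attrs["seed"] = str(seed)
    if prompt is not None:
        attrs["prompt"] = prompt
    # quoteattr adds the surrounding quotes and escapes special characters.
    rendered = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<speak{rendered}>{escape(text)}</speak>"


if __name__ == "__main__":
    print(build_speak("Hello & welcome", temperature=0.8, seed=42,
                      prompt="Speak with excitement"))
```

Validating locally keeps out-of-range values from reaching the service, whose behavior for such values is not described here.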

<prosody>

Use <prosody> to modulate pitch, speaking rate, or volume.

<speak>
  This part is normal.
  <prosody pitch="x-high">This part is going to sound high pitched.</prosody>
  <prosody rate="150%">This part is going to be spoken fast.</prosody>
  <prosody volume="loud">And this part is loud.</prosody>
</speak>

| Attribute | Description |
| --- | --- |
| pitch | Relative pitch (e.g. +2st, x-high). |
| rate | Speaking rate as percentage or keywords (slow, fast). |
| volume | Relative volume (silent, x-soft, loud, etc.). |
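
When SSML is assembled programmatically, a small wrapper keeps the attribute quoting and text escaping consistent. This is a sketch under assumptions: `prosody` is a hypothetical helper name, and it passes attribute values through without checking them against the keywords the service accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, *, pitch=None, rate=None, volume=None):
    """Wrap plain text in a <prosody> span with the attributes that were given."""
    attrs = {"pitch": pitch, "rate": rate, "volume": volume}
    rendered = "".join(
        f" {name}={quoteattr(str(value))}"
        for name, value in attrs.items()
        if value is not None
    )
    if not rendered:
        raise ValueError("set at least one of pitch, rate, or volume")
    return f"<prosody{rendered}>{escape(text)}</prosody>"


if __name__ == "__main__":
    ssml = (
        "<speak>This part is normal. "
        + prosody("This part is going to be spoken fast.", rate="150%")
        + "</speak>"
    )
    print(ssml)
```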

<emphasis>

<speak>
  <emphasis level="reduced">I'm more of a shy person really.</emphasis>
</speak>

| Attribute | Description |
| --- | --- |
| level | strong, moderate, reduced, or none (default). |

<say-as>

Spells out characters or indicates an alternate interpretation of the enclosed text.

<speak>
  This <say-as interpret-as="characters">SSML</say-as> stuff is really cool.
</speak>

| Attribute | Description |
| --- | --- |
| interpret-as | Supported type: characters (spells each character). |
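
For longer scripts it can be convenient to tag known acronyms automatically. The sketch below is one possible approach, not a Resemble feature: `spell_out` and the acronym list are hypothetical, and matching is case-sensitive on whole words.

```python
import re
from xml.sax.saxutils import escape


def spell_out(text, acronyms):
    """Wrap each listed acronym in <say-as interpret-as="characters">."""
    if not acronyms:
        return escape(text)
    # Longest first so that a longer acronym wins over one of its prefixes.
    ordered = sorted(acronyms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(a) for a in ordered) + r")\b")
    # With one capture group, re.split alternates plain text and matches.
    parts = pattern.split(text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(f'<say-as interpret-as="characters">{escape(part)}</say-as>')
        else:
            out.append(escape(part))
    return "".join(out)


if __name__ == "__main__":
    body = spell_out("This SSML stuff is really cool.", ["SSML"])
    print(f"<speak>{body}</speak>")
```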

<sub>

<speak>Hi <sub alias="Joe">Jim</sub>, we are calling today to inform you of your account activation with Resemble.</speak>

| Attribute | Description |
| --- | --- |
| alias | Replacement text spoken instead of the enclosed text. |

<break>

<speak>This is going to be a long <break time="2s"/>pause.</speak>

| Attribute | Description |
| --- | --- |
| time | Duration of the pause (e.g. 500ms, 2s). |
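
If pause lengths are computed rather than typed by hand, they need to be rendered in one of the two forms shown above. A minimal sketch, assuming a hypothetical `break_tag` helper that takes whole milliseconds:

```python
def break_tag(milliseconds):
    """Render a <break> tag, using whole seconds when the value allows it."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(milliseconds, bool) or not isinstance(milliseconds, int):
        raise TypeError("milliseconds must be an integer")
    if milliseconds <= 0:
        raise ValueError("milliseconds must be positive")
    if milliseconds % 1000 == 0:
        value = f"{milliseconds // 1000}s"
    else:
        value = f"{milliseconds}ms"
    return f'<break time="{value}"/>'


if __name__ == "__main__":
    print(f"<speak>This is going to be a long {break_tag(2000)}pause.</speak>")
```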

<lang>

Switch the language mid-stream if the voice supports it.

<speak>
  Su vuelo a <lang xml:lang="en-US">Pearson International Airport</lang> partirá en 30 minutos.
</speak>

| Attribute | Description |
| --- | --- |
| xml:lang | BCP 47 locale (see table below). |

| Language | Code |
| --- | --- |
| Afrikaans – South Africa | af-za |
| Amharic – Ethiopia | am-et |
| Arabic – United Arab Emirates | ar-ae |
| Arabic – Egypt | ar-eg |
| Arabic – Iraq | ar-iq |
| Arabic – Kuwait | ar-kw |
| Arabic – Morocco | ar-ma |
| Arabic – Qatar | ar-qa |
| Arabic – Saudi Arabia | ar-sa |
| Azerbaijani – Azerbaijan | az-az |
| Bulgarian – Bulgaria | bg-bg |
| Bengali – Bangladesh | bn-bd |
| Bengali – India | bn-in |
| Bosnian – Bosnia | bs-ba |
| Catalan – Spain | ca-es |
| Mandarin – China | cmn-cn |
| Czech – Czech Republic | cs-cz |
| Danish – Denmark | da-dk |
| German – Germany | de-de |
| Greek – Greece | el-gr |
| English – Australia | en-au |
| English – Canada | en-ca |
| English – United Kingdom | en-gb |
| English – Hong Kong | en-hk |
| English – Ireland | en-ie |
| English – India | en-in |
| English – Kenya | en-ke |
| English – New Zealand | en-nz |
| English – Singapore | en-sg |
| English – United States | en-us |
| English – South Africa | en-za |
| Spanish – Argentina | es-ar |
| Spanish – Chile | es-cl |
| Spanish – Colombia | es-co |
| Spanish – Costa Rica | es-cr |
| Spanish – Cuba | es-cu |
| Spanish – Dominican Republic | es-do |
| Spanish – Ecuador | es-ec |
| Spanish – Spain | es-es |
| Spanish – Mexico | es-mx |
| Spanish – Peru | es-pe |
| Spanish – Puerto Rico | es-pr |
| Spanish – Paraguay | es-py |
| Spanish – United States | es-us |
| Spanish – Venezuela | es-ve |
| Estonian – Estonia | et-ee |
| Basque – Spain | eu-es |
| Persian – Iran | fa-ir |
| Finnish – Finland | fi-fi |
| Filipino – Philippines | fil-ph |
| French – Belgium | fr-be |
| French – Canada | fr-ca |
| French – Switzerland | fr-ch |
| French – France | fr-fr |
| Irish – Ireland | ga-ie |
| Gujarati – India | gu-in |
| Hebrew – Israel | he-il |
| Hindi – India | hi-in |
| Croatian – Croatia | hr-hr |
| Hungarian – Hungary | hu-hu |
| Armenian – Armenia | hy-am |
| Indonesian – Indonesia | id-id |
| Icelandic – Iceland | is-is |
| Italian – Italy | it-it |
| Japanese – Japan | ja-jp |
| Javanese – Indonesia | jv-id |
| Kazakh – Kazakhstan | kk-kz |
| Khmer – Cambodia | km-kh |
| Kannada – India | kn-in |
| Korean – South Korea | ko-kr |
| Lithuanian – Lithuania | lt-lt |
| Latvian – Latvia | lv-lv |
| Malayalam – India | ml-in |
| Mongolian – Mongolia | mn-mn |
| Marathi – India | mr-in |
| Malay – Malaysia | ms-my |
| Maltese – Malta | mt-mt |
| Burmese – Myanmar | my-mm |
| Norwegian – Norway | nb-no |
| Nepali – Nepal | ne-np |
| Dutch – Belgium | nl-be |
| Dutch – Netherlands | nl-nl |
| Punjabi – India | pa-in |
| Polish – Poland | pl-pl |
| Pashto – Afghanistan | ps-af |
| Portuguese – Brazil | pt-br |
| Portuguese – Portugal | pt-pt |
| Romanian – Romania | ro-ro |
| Russian – Russia | ru-ru |
| Sinhala – Sri Lanka | si-lk |
| Slovak – Slovakia | sk-sk |
| Slovenian – Slovenia | sl-si |
| Somali – Somalia | so-so |
| Albanian – Albania | sq-al |
| Serbian – Serbia | sr-rs |
| Swedish – Sweden | sv-se |
| Swahili – Kenya | sw-ke |
| Tamil – India | ta-in |
| Tamil – Sri Lanka | ta-lk |
| Tamil – Malaysia | ta-my |
| Telugu – India | te-in |
| Thai – Thailand | th-th |
| Turkish – Turkey | tr-tr |
| Ukrainian – Ukraine | uk-ua |
| Urdu – Pakistan | ur-pk |
| Vietnamese – Vietnam | vi-vn |
| Chinese – China | zh-cn |
| Chinese – Hong Kong | zh-hk |
| Chinese – Taiwan | zh-tw |
| Cantonese – China | yue-cn |
| Zulu – South Africa | zu-za |
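
A locale code can be checked against the table before it is written into a <lang> tag. The sketch below makes several assumptions: `lang_span` is a hypothetical helper, `KNOWN_LOCALES` holds only a handful of codes copied from the table (extend it as needed), and the comparison ignores case because the table lists lowercase codes while the example uses en-US. Whether the service itself treats codes case-insensitively is not stated in this reference.

```python
from xml.sax.saxutils import escape, quoteattr

# Illustrative subset of the codes listed in the table, not the full list.
KNOWN_LOCALES = {"en-us", "en-gb", "es-es", "es-mx", "fr-fr", "de-de", "ja-jp"}


def lang_span(text, locale):
    """Wrap plain text in <lang>, rejecting codes outside KNOWN_LOCALES."""
    # Assumption: case is ignored when comparing against the table.
    if locale.lower() not in KNOWN_LOCALES:
        raise ValueError(f"unsupported locale: {locale}")
    return f"<lang xml:lang={quoteattr(locale)}>{escape(text)}</lang>"


if __name__ == "__main__":
    span = lang_span("Pearson International Airport", "en-US")
    print(f"<speak>Su vuelo a {span} partirá en 30 minutos.</speak>")
```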

<resemble:convert>

Performs speech-to-speech conversion using a donor recording.

Maximum file size: 50 MB, maximum duration: 300 seconds. Files exceeding either limit are trimmed.
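
Because oversized files are trimmed rather than rejected, it may be worth checking a donor recording locally first. This is a minimal sketch using only the Python standard library; `check_donor_wav` is a hypothetical name, 1 MB is assumed to mean 1024 * 1024 bytes, and the `wave` module reads only uncompressed PCM WAV files.

```python
import os
import wave

MAX_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_SECONDS = 300


def check_donor_wav(path):
    """Report size, duration, and whether both documented limits are met."""
    size_bytes = os.path.getsize(path)
    with wave.open(path, "rb") as wav:
        # Duration is the frame count divided by frames per second.
        duration_seconds = wav.getnframes() / wav.getframerate()
    return {
        "size_bytes": size_bytes,
        "duration_seconds": duration_seconds,
        "within_limits": size_bytes <= MAX_BYTES and duration_seconds <= MAX_SECONDS,
    }


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(check_donor_wav(sys.argv[1]))
```

Run it with a local WAV path as the only argument to see whether the file would be trimmed.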

<speak>
  <resemble:convert src="https://storage.googleapis.com/resemble-ai-docs-public-files/resemble_sts_donor_audio.wav" />
</speak>

| Attribute | Description |
| --- | --- |
| src | HTTPS URL pointing to a WAV file with a single speaker. |
| pitch | Optional float between -10.0 and 10.0 to transpose the output. |
| prompt | Primer text to steer delivery (e.g. "Speak in a British accent."). Note: For STS, place prompt on the <resemble:convert> tag, not on <speak>. |

Prompting with Speech-to-Speech

In text-to-speech, the prompt attribute goes on the <speak> root element. In speech-to-speech conversion, place it directly on the <resemble:convert> tag instead:

<speak>
  <resemble:convert
    src="https://storage.googleapis.com/resemble-ai-docs-public-files/resemble_sts_donor_audio.wav"
    prompt="Speak in a British accent." />
</speak>

The prompt guides how the donor audio is converted to the target voice, letting you adjust accent, tone, or delivery style.
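
The placement rule and the attribute constraints for speech-to-speech can be captured in one small builder. This is a sketch, not an official client: `build_convert` is a hypothetical helper, the example URL is a placeholder, and the function only produces the SSML string without sending it anywhere.

```python
from xml.sax.saxutils import quoteattr


def build_convert(src, *, pitch=None, prompt=None):
    """Build a <speak> document holding a single <resemble:convert> tag.

    The prompt is attached to <resemble:convert>, never to <speak>.
    """
    if not src.startswith("https://"):
        raise ValueError("src must be an HTTPS URL")
    attrs = [("src", src)]
    if pitch is not None:
        if not -10.0 <= pitch <= 10.0:
            raise ValueError("pitch must be between -10.0 and 10.0")
        attrs.append(("pitch", str(float(pitch))))
    if prompt is not None:
        attrs.append(("prompt", prompt))
    # quoteattr adds the surrounding quotes and escapes special characters.
    rendered = "".join(f" {name}={quoteattr(value)}" for name, value in attrs)
    return f"<speak><resemble:convert{rendered} /></speak>"


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(build_convert("https://example.com/donor.wav",
                        prompt="Speak in a British accent."))
```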