SSML Reference | Resemble

Resemble accepts Speech Synthesis Markup Language (SSML) so you can precisely control pronunciation, pacing, and style. Punctuation is handled automatically, but SSML tags unlock finer control such as emphasizing a word, spelling out acronyms, or inserting pauses.

Supported Elements

Element	Required	Summary
`<speak>`	Yes	Root element for all SSML input.
Inline tags	No	Insert vocal expressions such as pauses, laughs, or breaths at a specific point.
Wrapping tags	No	Modify the delivery style of an enclosed span of text.
`<say-as>`	No	Interpret content as characters, numbers, etc.
`<sub>`	No	Substitute alternate text for pronunciation.
`<lang>`	No	Switch the language mid-utterance (voice must support it).
`<resemble:convert>`	No	Perform speech-to-speech conversion against an input file.

`<speak>` Root Element

<speak version="1.1"></speak>

Attribute	Description
`version`	SSML spec version. Defaults to 1.1.
`xml:lang`	Locale for the root document (e.g. `en`, `en-US`).
`xmlns`	XML namespace. Defaults to `http://www.w3.org/TR/2010/REC-speech-synthesis11-20100907/synthesis.xsd`.
`temperature`	Controls generation randomness (0.1–5.0). Defaults to 0.8.
`exaggeration`	Emotion intensity (0.0–1.0).
`seed`	Optional deterministic seed (non-negative integer).
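
The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Resemble SDK: the helper name `speak_open_tag` is hypothetical, and whether the service itself rejects or clamps out-of-range values is not stated here.

```python
from xml.sax.saxutils import quoteattr


def speak_open_tag(temperature=None, exaggeration=None, seed=None, lang=None):
    """Return an opening <speak> tag after range-checking its attributes.

    Hypothetical helper; the ranges come from the attribute table above.
    """
    attrs = []
    if lang is not None:
        attrs.append(("xml:lang", lang))
    if temperature is not None:
        if not 0.1 <= temperature <= 5.0:
            raise ValueError("temperature must be between 0.1 and 5.0")
        attrs.append(("temperature", str(temperature)))
    if exaggeration is not None:
        if not 0.0 <= exaggeration <= 1.0:
            raise ValueError("exaggeration must be between 0.0 and 1.0")
        attrs.append(("exaggeration", str(exaggeration)))
    if seed is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(seed, bool) or not isinstance(seed, int) or seed < 0:
            raise ValueError("seed must be a non-negative integer")
        attrs.append(("seed", str(seed)))
    # quoteattr adds the surrounding quotes and escapes special characters.
    rendered = "".join(f" {name}={quoteattr(value)}" for name, value in attrs)
    return f"<speak{rendered}>"
```

For example, `speak_open_tag(temperature=0.8, seed=42)` returns `<speak temperature="0.8" seed="42">`.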

Inline Tags

Inline tags are placed at a specific point in the text to produce a vocal expression, such as a laugh, pause, or breath. They use square brackets and do not wrap any text.

<speak>
  So I walked in and [pause] there it was. [laugh] I honestly could not believe it!
</speak>

Pauses

Tag	Description
`[pause]`	Insert a natural pause.
`[long-pause]`	Insert a longer pause.

Laughter & Crying

Tag	Description
`[laugh]`	A full laugh.
`[chuckle]`	A short, low laugh.
`[giggle]`	A light, playful laugh.
`[cry]`	A crying sound.

Mouth Sounds

Tag	Description
`[hum-tune]`	Hum a short tune.
`[tsk]`	A “tsk” sound of disapproval.
`[tongue-click]`	A tongue click.
`[lip-smack]`	A lip smack.

Breathing

Tag	Description
`[breath]`	An audible breath.
`[inhale]`	An audible inhale.
`[exhale]`	An audible exhale.
`[sigh]`	A sigh.

Place inline tags where the expression would naturally occur in conversation, and combine them with punctuation for the most natural results.
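
A misspelled inline tag is easy to miss in a long script. As a rough sketch, the check below scans text for bracketed tokens and reports any that are not listed in the tables above. The function name `unknown_inline_tags` is hypothetical, and how the service treats unrecognized bracketed text is not stated in this reference.

```python
import re

# Inline tags listed in the tables above.
KNOWN_INLINE_TAGS = {
    "pause", "long-pause",
    "laugh", "chuckle", "giggle", "cry",
    "hum-tune", "tsk", "tongue-click", "lip-smack",
    "breath", "inhale", "exhale", "sigh",
}

# Matches any [token] that contains no nested brackets.
BRACKET_TOKEN = re.compile(r"\[([^\[\]]+)\]")


def unknown_inline_tags(text):
    """Return bracketed tokens in `text` that are not known inline tags."""
    return [t for t in BRACKET_TOKEN.findall(text) if t not in KNOWN_INLINE_TAGS]
```

An empty result means every bracketed token matched a documented tag.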

Wrapping Tags

Wrapping tags enclose a section of text to modify its delivery style, such as whispering or singing. Wrap complete phrases rather than individual words.

<speak>
  I need to tell you something. <whisper>It is a secret.</whisper> Pretty cool, right?
</speak>

Volume & Intensity

Tag	Description
`<soft>`	Speak softly.
`<whisper>`	Whisper the enclosed text.
`<loud>`	Speak loudly.
`<build-intensity>`	Gradually increase intensity across the enclosed text.
`<decrease-intensity>`	Gradually decrease intensity across the enclosed text.
`<emphasis>`	Emphasize the enclosed text.

Pitch & Speed

Tag	Description
`<higher-pitch>`	Raise the pitch.
`<lower-pitch>`	Lower the pitch.
`<slow>`	Speak slowly.
`<fast>`	Speak quickly.

Vocal Style

Tag	Description
`<sing-song>`	A melodic, sing-song delivery.
`<singing>`	Sing the enclosed text.
`<laugh-speak>`	Speak while laughing.

Wrapping tags can be nested to combine styles:

<speak>
  <slow><soft>Goodnight, sleep well.</soft></slow>
</speak>
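
When SSML is generated programmatically, nesting can be produced by wrapping the text from the innermost tag outward. The sketch below assumes a hypothetical helper named `wrap`; it accepts only the wrapping tags listed in the tables above and escapes XML special characters in the text.

```python
from xml.sax.saxutils import escape

# Wrapping tags listed in the tables above.
WRAPPING_TAGS = {
    "soft", "whisper", "loud", "build-intensity", "decrease-intensity",
    "emphasis", "higher-pitch", "lower-pitch", "slow", "fast",
    "sing-song", "singing", "laugh-speak",
}


def wrap(text, *tags):
    """Wrap `text` in the given tags, with the first tag outermost."""
    for tag in tags:
        if tag not in WRAPPING_TAGS:
            raise ValueError(f"unknown wrapping tag: {tag}")
    result = escape(text)  # escapes &, < and > in the spoken text
    # Apply tags in reverse so the first argument ends up outermost.
    for tag in reversed(tags):
        result = f"<{tag}>{result}</{tag}>"
    return result
```

Calling `wrap("Goodnight, sleep well.", "slow", "soft")` reproduces the inner line of the nested example above.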

`<say-as>`

Spell out the enclosed text character by character, or indicate an alternate interpretation.

<speak>
  This <say-as interpret-as="characters">SSML</say-as> stuff is really cool.
</speak>

Attribute	Description
`interpret-as`	Supported type: `characters` (spells each character).

`<sub>`

<speak>Hi <sub alias="Joe">Jim</sub>, we are calling today to inform you of your account activation with Resemble.</speak>

Attribute	Description
`alias`	Replacement text spoken instead of the enclosed text.
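
Because `alias` is an XML attribute, replacement text that contains characters such as `&` or quotes needs escaping when the tag is built from user data. A minimal sketch using Python's standard library follows; the helper name `sub` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(alias, text):
    """Build a <sub> element that speaks `alias` in place of `text`."""
    # quoteattr picks a safe quote style and escapes &, < and > in the alias.
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"
```

For example, `sub("Joe", "Jim")` returns `<sub alias="Joe">Jim</sub>`.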

`<lang>`

Switch the language mid-stream if the voice supports it.

<speak>
  Su vuelo a <lang xml:lang="en-US">Pearson International Airport</lang> partirá en 30 minutos.
</speak>

Attribute	Description
`xml:lang`	BCP 47 locale (see table below).

Supported xml:lang Codes

Language	Code
Afrikaans – South Africa	`af-za`
Amharic – Ethiopia	`am-et`
Arabic – United Arab Emirates	`ar-ae`
Arabic – Egypt	`ar-eg`
Arabic – Iraq	`ar-iq`
Arabic – Kuwait	`ar-kw`
Arabic – Morocco	`ar-ma`
Arabic – Qatar	`ar-qa`
Arabic – Saudi Arabia	`ar-sa`
Azerbaijani – Azerbaijan	`az-az`
Bulgarian – Bulgaria	`bg-bg`
Bengali – Bangladesh	`bn-bd`
Bengali – India	`bn-in`
Bosnian – Bosnia	`bs-ba`
Catalan – Spain	`ca-es`
Mandarin – China	`cmn-cn`
Czech – Czech Republic	`cs-cz`
Danish – Denmark	`da-dk`
German – Germany	`de-de`
Greek – Greece	`el-gr`
English – Australia	`en-au`
English – Canada	`en-ca`
English – United Kingdom	`en-gb`
English – Hong Kong	`en-hk`
English – Ireland	`en-ie`
English – India	`en-in`
English – Kenya	`en-ke`
English – New Zealand	`en-nz`
English – Singapore	`en-sg`
English – United States	`en-us`
English – South Africa	`en-za`
Spanish – Argentina	`es-ar`
Spanish – Chile	`es-cl`
Spanish – Colombia	`es-co`
Spanish – Costa Rica	`es-cr`
Spanish – Cuba	`es-cu`
Spanish – Dominican Republic	`es-do`
Spanish – Ecuador	`es-ec`
Spanish – Spain	`es-es`
Spanish – Mexico	`es-mx`
Spanish – Peru	`es-pe`
Spanish – Puerto Rico	`es-pr`
Spanish – Paraguay	`es-py`
Spanish – United States	`es-us`
Spanish – Venezuela	`es-ve`
Estonian – Estonia	`et-ee`
Basque – Spain	`eu-es`
Persian – Iran	`fa-ir`
Finnish – Finland	`fi-fi`
Filipino – Philippines	`fil-ph`
French – Belgium	`fr-be`
French – Canada	`fr-ca`
French – Switzerland	`fr-ch`
French – France	`fr-fr`
Irish – Ireland	`ga-ie`
Gujarati – India	`gu-in`
Hebrew – Israel	`he-il`
Hindi – India	`hi-in`
Croatian – Croatia	`hr-hr`
Hungarian – Hungary	`hu-hu`
Armenian – Armenia	`hy-am`
Indonesian – Indonesia	`id-id`
Icelandic – Iceland	`is-is`
Italian – Italy	`it-it`
Japanese – Japan	`ja-jp`
Javanese – Indonesia	`jv-id`
Kazakh – Kazakhstan	`kk-kz`
Khmer – Cambodia	`km-kh`
Kannada – India	`kn-in`
Korean – South Korea	`ko-kr`
Lithuanian – Lithuania	`lt-lt`
Latvian – Latvia	`lv-lv`
Malayalam – India	`ml-in`
Mongolian – Mongolia	`mn-mn`
Marathi – India	`mr-in`
Malay – Malaysia	`ms-my`
Maltese – Malta	`mt-mt`
Burmese – Myanmar	`my-mm`
Norwegian – Norway	`nb-no`
Nepali – Nepal	`ne-np`
Dutch – Belgium	`nl-be`
Dutch – Netherlands	`nl-nl`
Punjabi – India	`pa-in`
Polish – Poland	`pl-pl`
Pashto – Afghanistan	`ps-af`
Portuguese – Brazil	`pt-br`
Portuguese – Portugal	`pt-pt`
Romanian – Romania	`ro-ro`
Russian – Russia	`ru-ru`
Sinhala – Sri Lanka	`si-lk`
Slovak – Slovakia	`sk-sk`
Slovenian – Slovenia	`sl-si`
Somali – Somalia	`so-so`
Albanian – Albania	`sq-al`
Serbian – Serbia	`sr-rs`
Swedish – Sweden	`sv-se`
Swahili – Kenya	`sw-ke`
Tamil – India	`ta-in`
Tamil – Sri Lanka	`ta-lk`
Tamil – Malaysia	`ta-my`
Telugu – India	`te-in`
Thai – Thailand	`th-th`
Turkish – Turkey	`tr-tr`
Ukrainian – Ukraine	`uk-ua`
Urdu – Pakistan	`ur-pk`
Vietnamese – Vietnam	`vi-vn`
Chinese – China	`zh-cn`
Chinese – Hong Kong	`zh-hk`
Chinese – Taiwan	`zh-tw`
Cantonese – China	`yue-cn`
Zulu – South Africa	`zu-za`
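
The table lists codes in lowercase, while the `<lang>` example uses `en-US`. BCP 47 language tags are case-insensitive by specification, so a client-side lookup can lowercase the code before comparing it. The sketch below is illustrative only: `SUPPORTED_LANG_CODES` is a short excerpt of the table rather than the full list, `normalize_lang` is a hypothetical name, and whether the service accepts every casing is an assumption.

```python
# Excerpt of the table above; extend with the remaining codes as needed.
SUPPORTED_LANG_CODES = {
    "de-de", "en-gb", "en-us", "es-es", "es-mx", "fr-fr", "ja-jp",
}


def normalize_lang(code):
    """Lowercase `code` and confirm it appears in the supported set."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANG_CODES:
        raise ValueError(f"unsupported xml:lang code: {code!r}")
    return normalized
```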

`<resemble:convert>`

Performs speech-to-speech conversion using a donor recording.

Maximum file size: 50 MB, maximum duration: 300 seconds. Files exceeding either limit are trimmed.
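
Since oversized files are trimmed rather than rejected, it can be worth checking a donor recording locally before hosting it. The following sketch uses Python's standard `wave` module. The function name `exceeds_limits` is hypothetical, and reading "50 MB" as 50,000,000 bytes is an assumption; the exact byte threshold is not specified here.

```python
import os
import wave

MAX_BYTES = 50 * 1000 * 1000  # assumption: "50 MB" read as decimal megabytes
MAX_SECONDS = 300


def exceeds_limits(path):
    """Return True if the WAV file at `path` is over either documented limit."""
    size = os.path.getsize(path)
    with wave.open(path, "rb") as wav:
        # Duration in seconds is the frame count divided by the sample rate.
        duration = wav.getnframes() / wav.getframerate()
    return size > MAX_BYTES or duration > MAX_SECONDS
```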

<speak>
  <resemble:convert src="https://storage.googleapis.com/resemble-ai-docs-public-files/resemble_sts_donor_audio.wav" />
</speak>

Attribute	Description
`src`	HTTPS URL pointing to a WAV file with a single speaker.
`pitch`	Optional float between -10.0 and 10.0 to transpose the output.
`prompt`	Primer text to steer delivery (e.g. `"Speak in a British accent."`). Place `prompt` directly on the `<resemble:convert>` tag.

Prompting with Speech-to-Speech

For speech-to-speech conversion, place the `prompt` attribute directly on the `<resemble:convert>` tag:

<speak>
  <resemble:convert
    src="https://storage.googleapis.com/resemble-ai-docs-public-files/resemble_sts_donor_audio.wav"
    prompt="Speak in a British accent." />
</speak>

The prompt guides how the donor audio is converted to the target voice, letting you adjust accent, tone, or delivery style.
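
The attribute rules for `<resemble:convert>` can be gathered into one small builder. This is a sketch under stated assumptions: `convert_document` is a hypothetical name, the checks mirror the attribute table (HTTPS `src`, `pitch` between -10.0 and 10.0), and it only assembles the SSML string without contacting any service.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import quoteattr


def convert_document(src, pitch=None, prompt=None):
    """Build a <speak> document containing one <resemble:convert> element."""
    if urlparse(src).scheme != "https":
        raise ValueError("src must be an HTTPS URL")
    attrs = [("src", src)]
    if pitch is not None:
        if not -10.0 <= pitch <= 10.0:
            raise ValueError("pitch must be between -10.0 and 10.0")
        attrs.append(("pitch", str(float(pitch))))
    if prompt is not None:
        attrs.append(("prompt", prompt))
    # quoteattr adds quotes and escapes special characters in each value.
    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
    return f"<speak><resemble:convert {rendered} /></speak>"
```

Passing `prompt="Speak in a British accent."` produces the same attributes as the example above, on a single line.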