Speech-to-Speech
Convert a donor recording into a target voice while preserving delivery and timing. Speech-to-speech uses the same synthesis endpoint as synchronous TTS, but you pass SSML that references the donor recording.
Quick Example
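As a minimal sketch of the flow described above, the following Python builds an SSML document that references a donor recording and wraps it in a POST request to the synthesis endpoint. Several details are assumptions rather than confirmed API facts: the `src` attribute name on `<resemble:convert>`, the JSON field names `voice_uuid` and `data`, and the Bearer-token `Authorization` header. The helper `build_sts_request` and the placeholder credentials are hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request
from xml.sax.saxutils import quoteattr

SYNTHESIZE_URL = "https://f.cluster.resemble.ai/synthesize"


def build_sts_request(api_key: str, voice_uuid: str, donor_url: str) -> urllib.request.Request:
    """Build (but do not send) a speech-to-speech synthesis request."""
    # quoteattr escapes &, < and > and adds the surrounding quotes, which
    # matters for signed URLs that carry '&' in their query string.
    # ASSUMPTION: the donor recording is referenced through a `src` attribute.
    ssml = f"<speak><resemble:convert src={quoteattr(donor_url)} /></speak>"

    # ASSUMPTION: the body uses `voice_uuid` for the target voice and `data`
    # for the SSML, mirroring a typical synchronous TTS request.
    body = json.dumps({"voice_uuid": voice_uuid, "data": ssml}).encode("utf-8")

    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # ASSUMPTION: Bearer auth
            "Content-Type": "application/json",
        },
    )


request = build_sts_request(
    "YOUR_API_KEY",
    "YOUR_VOICE_UUID",
    "https://example.com/donor.wav?sig=abc&exp=60",
)
print(request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the generated SSML before spending synthesis credits.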
Endpoint
POST https://f.cluster.resemble.ai/synthesize
Request Body
Response
Identical to synchronous TTS responses:
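For handling the response, a hedged Python sketch follows. It assumes the synchronous TTS response is a JSON object with a boolean `success` flag and a base64-encoded `audio_content` field; both field names are assumptions here and should be checked against the synchronous TTS reference. The helper `extract_audio` is hypothetical.

```python
import base64


def extract_audio(response_json: dict) -> bytes:
    """Return the decoded audio bytes from a parsed synthesis response."""
    # ASSUMPTION: a `success` flag signals whether synthesis worked.
    if not response_json.get("success", True):
        raise RuntimeError(f"Synthesis failed: {response_json}")
    # ASSUMPTION: the audio is returned base64-encoded under `audio_content`.
    return base64.b64decode(response_json["audio_content"])


# Example with a stand-in response; real responses come from the endpoint.
fake_response = {"success": True, "audio_content": base64.b64encode(b"RIFF").decode("ascii")}
audio_bytes = extract_audio(fake_response)
print(len(audio_bytes))
```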
<resemble:convert> Attributes
Tip: Store donor files in cloud storage with signed URLs and revoke them after synthesis completes.
Prompting with Speech-to-Speech
Use the prompt attribute to guide how the donor audio is converted to the target voice. In text-to-speech, the prompt goes on the <speak> root element; for speech-to-speech conversion, you must place the prompt attribute directly on the <resemble:convert> tag.
Example with Prompt
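The placement rule can be illustrated with a small Python helper that generates the SSML. This is a sketch, not the reference markup: the `src` attribute name is an assumption, and `build_convert_ssml` is a hypothetical helper. The one point taken directly from this page is that `prompt` sits on `<resemble:convert>`, not on `<speak>`.

```python
from typing import Optional
from xml.sax.saxutils import quoteattr


def build_convert_ssml(donor_url: str, prompt: Optional[str] = None) -> str:
    """Build speech-to-speech SSML, optionally with a conversion prompt."""
    # ASSUMPTION: the donor recording is referenced through a `src` attribute.
    attrs = f"src={quoteattr(donor_url)}"
    if prompt:
        # The prompt goes on <resemble:convert>, not on the <speak> root.
        attrs += f" prompt={quoteattr(prompt)}"
    return f"<speak><resemble:convert {attrs} /></speak>"


print(build_convert_ssml("https://example.com/donor.wav", "Speak in a British accent"))
```

Using `quoteattr` keeps the SSML well-formed even when the prompt or URL contains characters such as `&` or quotation marks.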
The prompt attribute allows you to adjust:
- Accent or dialect (e.g. “Speak in a British accent”)
- Tone or emotion (e.g. “Speak with excitement”)
- Speaking style (e.g. “Speak in a formal tone”)
This provides fine-grained control over how the donor audio’s delivery is transformed while maintaining the original timing and prosody.
