Synchronous text-to-speech synthesis

Generates speech synchronously from text or SSML and returns the complete audio as a base64-encoded string.

Authentication

Authorization: Bearer <token>
Send your API token, available at https://app.resemble.ai/account/api, as a Bearer token in the Authorization header.
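As a minimal sketch using only the Python standard library, the token can be attached like this. Two details are assumptions, not facts from this page: that the body is sent as JSON with Content-Type application/json, and the endpoint URL, which this section does not give and is therefore left as a parameter. The helper name authorized_request is hypothetical.

```python
import json
import urllib.request


def authorized_request(url: str, api_token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API token as a Bearer token."""
    return urllib.request.Request(
        url,  # endpoint URL is not stated in this section; supplied by the caller
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            # Assumption: the request object is sent as a JSON body.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it. Reading the token from an environment variable (for example a hypothetical RESEMBLE_API_TOKEN) keeps it out of source code.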

Request

The request body is an object with the following fields.
voice_uuid (string, required)
UUID of the voice to use for synthesis
data (string, required)
Text or SSML to synthesize (maximum 3,000 characters)

project_uuid (string, optional)
UUID of the project in which to store the clip
title (string, optional)
Title for the generated clip
model (string, optional)
Model to use for synthesis. Pass chatterbox-turbo to use the Turbo model, which has lower latency and supports paralinguistic tags. If omitted, the model defaults to Chatterbox or Chatterbox Multilingual, depending on the voice. Note: Chatterbox-Turbo is supported by all Rapid English voices and all Pre Built Library voices.

precision (enum, optional, defaults to PCM_32)
Audio precision for WAV output
output_format (enum, optional, defaults to wav)
Audio output format
sample_rate (enum, optional)
Audio sample rate in Hz
use_hd (boolean, optional, defaults to false)
Enable HD synthesis, at the cost of slightly higher latency
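The sketch below assembles a request body from the fields above. It checks the two required fields and the 3,000-character limit on data before anything is sent, and omits unset optional fields so the documented defaults apply. The helper name build_synthesis_payload is hypothetical. It assumes the limit is measured in Python string characters (code points, including any SSML markup) and that sample_rate is sent as an integer; this page states neither. Because the allowed enum values are not listed here, precision, output_format, and sample_rate are passed through unvalidated.

```python
from typing import Optional

# Limit stated for the `data` field; assumed to count characters, markup included.
MAX_DATA_CHARS = 3000


def build_synthesis_payload(
    voice_uuid: str,
    data: str,
    *,
    project_uuid: Optional[str] = None,
    title: Optional[str] = None,
    model: Optional[str] = None,
    precision: Optional[str] = None,
    output_format: Optional[str] = None,
    sample_rate: Optional[int] = None,  # assumption: sent as an integer number of Hz
    use_hd: Optional[bool] = None,
) -> dict:
    """Assemble the request body, leaving out optional fields that were not set."""
    if not voice_uuid:
        raise ValueError("voice_uuid is required")
    if not data:
        raise ValueError("data is required")
    if len(data) > MAX_DATA_CHARS:
        raise ValueError(f"data is {len(data)} characters; the limit is {MAX_DATA_CHARS}")

    payload = {"voice_uuid": voice_uuid, "data": data}
    optional = {
        "project_uuid": project_uuid,
        "title": title,
        "model": model,
        "precision": precision,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "use_hd": use_hd,
    }
    # Drop None values so the server applies its own defaults (e.g. wav, PCM_32).
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

For example, build_synthesis_payload(voice, text, model="chatterbox-turbo") requests the Turbo model while keeping the default output format and precision. Omitting unset fields, rather than sending them as null, avoids relying on how the server treats explicit nulls.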

Response

A successful synthesis returns an object with the following fields. Every field may be null.
success (boolean or null)
audio_content (string or null, format: byte)
Base64-encoded audio bytes
audio_timestamps (object or null)
duration (double or null)
Audio duration in seconds
synth_duration (double or null)
Raw synthesis time
output_format (string or null)
sample_rate (integer or null)
title (string or null)
issues (list of strings or null)
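Because every response field is nullable, a client should check for missing values before using audio_content. The hypothetical helper below decodes the base64 audio into raw bytes and returns the issues list alongside it. Treating a null or false success as a failure is an assumption; this page does not say how failures are reported.

```python
import base64
from typing import Any, Dict, List, Tuple


def decode_synthesis_response(response: Dict[str, Any]) -> Tuple[bytes, List[str]]:
    """Return (audio_bytes, issues) from a parsed synthesis response object."""
    # Every field may be null, so treat a null issues list as empty.
    issues: List[str] = response.get("issues") or []
    # Assumption: a null or false `success` means the synthesis failed.
    if not response.get("success"):
        detail = "; ".join(issues) or "no details given"
        raise RuntimeError(f"synthesis failed: {detail}")
    audio_b64 = response.get("audio_content")
    if not audio_b64:
        raise RuntimeError("response has no audio_content")
    # audio_content is base64-encoded; validate=True rejects non-base64 characters.
    return base64.b64decode(audio_b64, validate=True), issues
```

The returned bytes are the complete audio file and can be written to disk as-is, for example with a file extension taken from the response's output_format.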

Errors