Streaming text-to-speech synthesis (HTTP)

Stream audio as it's generated. Returns chunked WAV data for progressive playback.

Authentication

Authorization: Bearer
API token from https://app.resemble.ai/account/api

Request

This endpoint expects an object.
voice_uuid (string, required)
Voice UUID to use for synthesis
data (string, required)
Text or SSML to synthesize (max 2000 characters)

project_uuid (string, optional)
Project UUID in which to store the clip
model (string, optional)
Model to use for synthesis. Pass chatterbox-turbo to use the Turbo model, which offers lower latency and supports paralinguistic tags. If omitted, the model defaults to Chatterbox or Chatterbox Multilingual, depending on the voice. Note: Chatterbox-Turbo is supported by all Rapid English voices and Pre Built Library voices.

precision (enum, optional, defaults to PCM_32)
Audio precision
Allowed values:
sample_rate (enum, optional)
Audio sample rate in Hz
use_hd (boolean, optional, defaults to false)
Enable HD synthesis with a small latency trade-off

apply_custom_pronunciations (boolean, optional, defaults to false)
When true, automatically applies your team's custom pronunciations to matching words in the input text.
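The request fields above could be assembled and sent roughly as follows. This is a minimal sketch, not the official client. It assumes the object is sent as a JSON body in a POST request, and that the character limit can be approximated with Python's len(). The endpoint URL does not appear in this section, so the caller passes it in as `url`. The helper names build_stream_request and stream_to_file are hypothetical.

```python
import json
import urllib.request

MAX_CHARS = 2000  # limit stated for the `data` field


def build_stream_request(url, token, voice_uuid, data, **optional):
    """Build a POST request carrying the synthesis parameters.

    Assumptions: `url` is supplied by the caller (it is not given in this
    section) and the request object is encoded as JSON.
    """
    if len(data) > MAX_CHARS:
        raise ValueError(
            f"data is {len(data)} characters; the limit is {MAX_CHARS}"
        )
    payload = {"voice_uuid": voice_uuid, "data": data}
    # Optional fields (project_uuid, model, precision, sample_rate, use_hd,
    # apply_custom_pronunciations) are included only when a value is given.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_to_file(request, path, chunk_size=4096):
    """Send the request and write audio bytes to `path` as they arrive."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

A call might look like build_stream_request(url, token, voice_uuid, "Hello there", model="chatterbox-turbo"), followed by stream_to_file(request, "out.wav"). Checking the length before sending avoids a round trip for input that exceeds the documented limit.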

Response

Streaming audio response (chunked WAV)
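For progressive playback, a client typically needs the audio format before it can start playing the first samples. The sketch below reads the format from the bytes received so far. It assumes the stream begins with a standard RIFF/WAVE header (a "fmt " chunk followed by a "data" chunk); the exact header this endpoint sends is not described here. The function name parse_wav_format is hypothetical.

```python
import struct


def parse_wav_format(buffer):
    """Read format fields from the start of a WAV stream.

    Returns a dict once the "fmt " and "data" chunk headers have both
    arrived, or None if more bytes are needed. The size field of the
    "data" chunk is ignored, since a streamed file may not carry a
    meaningful value there.
    """
    if len(buffer) < 12:
        return None
    if buffer[0:4] != b"RIFF" or buffer[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    offset = 12
    fmt = None
    while offset + 8 <= len(buffer):
        chunk_id, chunk_size = struct.unpack_from("<4sI", buffer, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            if body + 16 > len(buffer):
                return None  # fmt chunk body not fully received yet
            code, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", buffer, body
            )
            fmt = {
                "format_code": code,
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk appeared before fmt chunk")
            fmt["data_offset"] = body  # audio samples start here
            return fmt
        # Chunks are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)
    return None
```

A player would append each received chunk to a buffer, call parse_wav_format until it returns a dict, and then feed everything from data_offset onward to the audio device.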

Errors

400
Bad Request Error