Streaming text-to-speech synthesis (HTTP) | Resemble

Stream audio as it's generated. Returns chunked WAV data for progressive playback.

Authentication

Authorization: Bearer <token>

Use your API token from https://app.resemble.ai/account/api.

Request

This endpoint expects a JSON object in the request body.

voice_uuid (string, required)

Voice UUID to use for synthesis

data (string, required)

Text or SSML to synthesize (max 2000 characters)

project_uuid (string, optional)

UUID of the project in which to store the generated clip

model (string, optional)

Model to use for synthesis. Pass chatterbox-turbo to use the Turbo model, which offers lower latency and supports paralinguistic tags. If not specified, the model defaults to Chatterbox or Chatterbox Multilingual, depending on the voice. Note: Chatterbox-Turbo is supported by all Rapid English voices and Pre Built Library voices.

precision (enum, optional, defaults to PCM_32)

Audio precision

Allowed values:

sample_rate (enum, optional)

Audio sample rate in Hz

use_hd (boolean, optional, defaults to false)

Enables HD synthesis at the cost of slightly higher latency
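As a sketch of how these fields combine, the bash functions below build a request body in which voice_uuid and data are always present, while model, project_uuid and use_hd are added only when supplied; leaving model out keeps the voice-based default described above. The helper names json_escape and build_body are hypothetical. The length pre-check assumes the 2000-character limit applies to the whole data string, SSML tags included. precision and sample_rate are left out because their allowed values are not listed here, and json_escape only handles quotes, backslashes, newlines, carriage returns and tabs.

```shell
#!/usr/bin/env bash
# Sketch (requires bash): build the JSON body for the streaming endpoint.
# json_escape and build_body are hypothetical helper names.

MAX_DATA_CHARS=2000  # documented limit on the `data` field

# Escape a string for use inside a JSON string literal.
# Handles quotes, backslashes, \n, \r and \t; other control characters are not escaped.
json_escape() {
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      '"')   out+='\"' ;;
      '\')   out+='\\' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *)     out+=$c ;;
    esac
  done
  printf '%s' "$out"
}

# Usage: build_body VOICE_UUID DATA [MODEL] [PROJECT_UUID] [USE_HD]
# Optional fields are included only when non-empty (use_hd only when "true").
build_body() {
  local voice=$1 data=$2 model=${3:-} project=${4:-} use_hd=${5:-}
  # Assumption: the limit counts the full string; ${#data} counts characters
  # in a UTF-8 locale and bytes in the C locale.
  if (( ${#data} > MAX_DATA_CHARS )); then
    echo "data has ${#data} characters; the limit is $MAX_DATA_CHARS" >&2
    return 1
  fi
  local body
  body='{"voice_uuid":"'"$(json_escape "$voice")"'","data":"'"$(json_escape "$data")"'"'
  [ -n "$model" ] && body+=',"model":"'"$(json_escape "$model")"'"'
  [ -n "$project" ] && body+=',"project_uuid":"'"$(json_escape "$project")"'"'
  [ "$use_hd" = true ] && body+=',"use_hd":true'
  printf '%s}\n' "$body"
}
```

For example, build_body "<voice_uuid>" "Hello there" chatterbox-turbo prints a body that can be passed to curl with -d. Escaping character by character keeps the quoting independent of bash version quirks in pattern substitution, at the cost of speed, which is acceptable for inputs capped at 2000 characters.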

Response

Streaming audio response (chunked WAV)
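To check what a saved stream actually contains (for example, to see which sample rate and bit depth were returned), you can read the format fields from the WAV header. This bash sketch assumes the canonical layout, with the fmt chunk immediately after the RIFF/WAVE preamble, and a little-endian host, because od decodes multi-byte values in host byte order. A streamed WAV may be written before its total length is known, so the sketch ignores the RIFF and data size fields. read_tag, read_uint and wav_info are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch (requires bash): print the format fields of a WAV file's header.
# Assumes the canonical layout ("RIFF" <size> "WAVE" "fmt " ...) and a
# little-endian host, since od reads multi-byte integers in host byte order.

# Print COUNT raw bytes starting at OFFSET (used for the ASCII chunk tags).
read_tag() {
  dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null
}

# Print an unsigned integer of SIZE bytes (2 or 4) found at OFFSET.
read_uint() {
  od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

wav_info() {
  local f=$1
  if [ "$(read_tag "$f" 0 4)" != "RIFF" ] ||
     [ "$(read_tag "$f" 8 4)" != "WAVE" ] ||
     [ "$(read_tag "$f" 12 4)" != "fmt " ]; then
    echo "not a canonical RIFF/WAVE file: $f" >&2
    return 1
  fi
  # fmt chunk fields: format code at 20 (1 = integer PCM, 3 = IEEE float),
  # channels at 22, sample rate at 24, bits per sample at 34.
  # The size fields at offsets 4 and 40 are skipped on purpose.
  printf 'format=%s channels=%s sample_rate=%s bits=%s\n' \
    "$(read_uint "$f" 20 2)" "$(read_uint "$f" 22 2)" \
    "$(read_uint "$f" 24 4)" "$(read_uint "$f" 34 2)"
}
```

For example, after adding -o out.wav to the curl request below, wav_info out.wav prints the format code, channel count, sample rate and bit depth of the saved stream.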

curl -X POST https://f.cluster.resemble.ai/stream \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_uuid": "string",
    "data": "string"
  }'
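To start playback before synthesis finishes, the response has to be consumed as it arrives rather than after the transfer completes. A minimal bash sketch of that wraps the request above with curl's --no-buffer and --fail options. It assumes the token is stored in a RESEMBLE_API_TOKEN environment variable; that variable name and the stream_tts function are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (requires bash): POST a request body to the streaming endpoint and
# write the audio as it arrives. RESEMBLE_API_TOKEN and stream_tts are hypothetical names.
STREAM_URL="https://f.cluster.resemble.ai/stream"

# Usage: stream_tts BODY_JSON [OUTPUT]   (OUTPUT defaults to "-", i.e. stdout)
stream_tts() {
  local body=$1 out=${2:--}
  if [ -z "${RESEMBLE_API_TOKEN:-}" ]; then
    echo "RESEMBLE_API_TOKEN is not set" >&2
    return 1
  fi
  # --silent --show-error: no progress meter, but errors are still printed.
  # --fail: exit non-zero on HTTP errors instead of writing the error body as audio.
  # --no-buffer: disable output buffering so each chunk is written as it arrives.
  curl --silent --show-error --fail --no-buffer \
    -X POST "$STREAM_URL" \
    -H "Authorization: Bearer $RESEMBLE_API_TOKEN" \
    -H "Content-Type: application/json" \
    --data "$body" \
    --output "$out"
}
```

For example, stream_tts '{"voice_uuid":"<voice_uuid>","data":"Hello"}' out.wav saves the audio to a file, and piping the same call into a player that reads WAV from standard input (such as ffplay -nodisp -autoexit pipe:0) should start playback while the response is still downloading.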
