Synchronous
The synchronous endpoint processes the entire input and returns a single audio payload—ideal for short prompts, notifications, or background jobs that do not require streaming.
Quick Example
```shell
curl --request POST "https://f.cluster.resemble.ai/synthesize" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept-Encoding: gzip" \
  --data '{
    "voice_uuid": "55592656",
    "data": "Hello from Resemble!",
    "sample_rate": 48000,
    "output_format": "wav"
  }'
```
Decode the `audio_content` field from base64 to retrieve the raw audio bytes.
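For example, in Python with only the standard library (the base64 payload below is a short stand-in; a live response carries the full encoded clip in `audio_content`):

```python
import base64
import json

# Stand-in response body -- a real /synthesize response contains the
# complete base64-encoded WAV/MP3 bytes in "audio_content".
response_body = json.dumps({
    "audio_content": base64.b64encode(b"RIFF...wav bytes...").decode("ascii")
})

payload = json.loads(response_body)
audio_bytes = base64.b64decode(payload["audio_content"])

# Write the decoded bytes out as a playable audio file.
with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```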
Endpoint
POST https://f.cluster.resemble.ai/synthesize
Request Headers
| Header | Value | Description |
|---|---|---|
| Authorization | YOUR_API_KEY | API key from the dashboard. |
| Content-Type | application/json | JSON request body. |
| Accept-Encoding | gzip, deflate, or br | Optional compression. |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| voice_uuid | string | Yes | Voice to synthesize. |
| data | string | Yes | Text or SSML to synthesize (≤ 2,000 characters). |
| project_uuid | string | No | Project to store the clip in. |
| title | string | No | Title for the generated clip. |
| model | string | No | Model to use for synthesis. Pass `chatterbox-turbo` to use the Turbo model for lower latency and paralinguistic tag support. If not specified, defaults to Chatterbox or Chatterbox Multilingual based on the voice. Note: Chatterbox-Turbo is supported by all Rapid English voices and Pre Built Library voices. |
| precision | string | No | MULAW, PCM_16, PCM_24, or PCM_32 (default). Applies to WAV output. |
| output_format | string | No | wav (default) or mp3. |
| sample_rate | number | No | 8000, 16000, 22050, 32000, 44100, or 48000. Defaults to 48000. |
| use_hd | boolean | No | Enables higher-definition synthesis with a small latency trade-off. Defaults to false. |
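A minimal sketch of building a request body that enforces the constraints in the table above. The helper below is illustrative only, not part of any official SDK:

```python
# Allowed values documented for the /synthesize request body.
VALID_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100, 48000}
VALID_OUTPUT_FORMATS = {"wav", "mp3"}

def build_synthesize_body(voice_uuid, data, sample_rate=48000,
                          output_format="wav", **optional):
    """Build a /synthesize body dict, validating documented limits."""
    if len(data) > 2000:
        raise ValueError("data must be at most 2,000 characters")
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if output_format not in VALID_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    body = {
        "voice_uuid": voice_uuid,
        "data": data,
        "sample_rate": sample_rate,
        "output_format": output_format,
    }
    # Optional fields, e.g. project_uuid, title, model, precision, use_hd.
    body.update(optional)
    return body

body = build_synthesize_body("55592656", "Hello from Resemble!")
```

Send the resulting dict as the JSON body of the POST request shown in the quick example.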
Response
```json
{
  "audio_content": "<base64>",
  "audio_timestamps": {
    "graph_chars": ["H", "e", "l", "l", "o"],
    "graph_times": [[0.0, 0.12], [0.12, 0.24], ...],
    "phon_chars": [],
    "phon_times": []
  },
  "duration": 1.68,
  "issues": [],
  "output_format": "wav",
  "sample_rate": 48000,
  "seed": 962692783,
  "success": true,
  "synth_duration": 1.64,
  "title": null
}
```
| Field | Type | Description |
|---|---|---|
| audio_content | string | Base64-encoded audio bytes. |
| audio_timestamps | object | Timestamp arrays for graphemes and phonemes. Grapheme timestamps (graph_chars, graph_times) are supported for all models, with times in seconds as [start, end] pairs. Phoneme timestamps (phon_chars, phon_times) return empty arrays for newer models and are only populated by legacy models. |
| duration | number | Final clip duration in seconds. |
| issues | array | Issues related to the request. |
| output_format | string | Echoes the requested format. |
| sample_rate | number | Echoes the requested sample rate. |
| seed | number | Random seed used for generation. |
| success | boolean | Whether the synthesis succeeded. |
| synth_duration | number | Raw synthesis time prior to post-processing. |
| title | string \| null | Title saved with the clip, or null if not provided. |
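The grapheme timestamps pair each character in the input with its spoken interval, which is useful for captioning or karaoke-style highlighting. A small sketch (the values beyond the first two pairs are illustrative, since the sample response elides them):

```python
# Grapheme timestamp arrays, aligned index-for-index: graph_times[i]
# is the [start, end] interval in seconds for graph_chars[i].
graph_chars = ["H", "e", "l", "l", "o"]
graph_times = [[0.00, 0.12], [0.12, 0.24],
               [0.24, 0.33], [0.33, 0.41], [0.41, 0.55]]  # last three illustrative

for ch, (start, end) in zip(graph_chars, graph_times):
    print(f"{ch!r}: {start:.2f}s -> {end:.2f}s")

# The word's overall interval spans from the first start to the last end.
word_start = graph_times[0][0]
word_end = graph_times[-1][1]
```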
Try it – Repeat the request above and decode `audio_content` locally:
```shell
curl --request POST "https://f.cluster.resemble.ai/synthesize" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "voice_uuid": "55592656",
    "data": "Hello from Resemble!",
    "sample_rate": 48000,
    "output_format": "wav"
  }' \
  | jq -r '.audio_content' | base64 --decode > output.wav
```
