Synchronous
The synchronous endpoint processes the entire input and returns a single audio payload—ideal for short prompts, notifications, or background jobs that do not require streaming.
Quick Example
```shell
curl --request POST "https://f.cluster.resemble.ai/synthesize" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept-Encoding: gzip" \
  --data '{
    "voice_uuid": "55592656",
    "data": "Hello from Resemble!",
    "sample_rate": 48000,
    "output_format": "wav"
  }'
```
Decode the `audio_content` field from base64 to retrieve the raw audio bytes.
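For example, in Python with only the standard library (the base64 payload below is a short stand-in; a live response carries the full encoded clip in `audio_content`):

```python
import base64
import json

# Stand-in response body -- a real /synthesize response contains the
# complete base64-encoded WAV/MP3 bytes in "audio_content".
response_body = json.dumps({
    "audio_content": base64.b64encode(b"RIFF...wav bytes...").decode("ascii")
})

payload = json.loads(response_body)
audio_bytes = base64.b64decode(payload["audio_content"])

# Write the decoded bytes out as a playable audio file.
with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```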
Endpoint
POST https://f.cluster.resemble.ai/synthesize
Request Headers
| Header | Value | Description |
|---|---|---|
| Authorization | YOUR_API_KEY | API key from the dashboard. |
| Content-Type | application/json | JSON request body. |
| Accept-Encoding | gzip, deflate, or br | Optional compression. |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| voice_uuid | string | Yes | Voice to synthesize. |
| data | string | Yes | Text or SSML to synthesize (≤ 2,000 characters). |
| project_uuid | string | No | Project to store the clip in. |
| title | string | No | Title for the generated clip. |
| model | string | No | Model to use for synthesis. Pass `chatterbox-turbo` to use the Turbo model for lower latency and paralinguistic tag support. If not specified, defaults to Chatterbox or Chatterbox Multilingual based on the voice. Note: Chatterbox-Turbo is supported by all Rapid English voices and Pre Built Library voices. |
| precision | string | No | MULAW, PCM_16, PCM_24, or PCM_32 (default). Applies to WAV output. |
| output_format | string | No | wav (default) or mp3. |
| sample_rate | number | No | 8000, 16000, 22050, 32000, 44100, or 48000. Defaults to 48000. |
| use_hd | boolean | No | Enables higher-definition synthesis with a small latency trade-off. Defaults to false. |
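A minimal sketch of building a request body that enforces the constraints in the table above. The helper below is illustrative only, not part of any official SDK:

```python
# Allowed values documented for the /synthesize request body.
VALID_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100, 48000}
VALID_OUTPUT_FORMATS = {"wav", "mp3"}

def build_synthesize_body(voice_uuid, data, sample_rate=48000,
                          output_format="wav", **optional):
    """Build a /synthesize body dict, validating documented limits."""
    if len(data) > 2000:
        raise ValueError("data must be at most 2,000 characters")
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if output_format not in VALID_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    body = {
        "voice_uuid": voice_uuid,
        "data": data,
        "sample_rate": sample_rate,
        "output_format": output_format,
    }
    # Optional fields, e.g. project_uuid, title, model, precision, use_hd.
    body.update(optional)
    return body

body = build_synthesize_body("55592656", "Hello from Resemble!")
```

Send the resulting dict as the JSON body of the POST request shown in the quick example.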
Response
```json
{
  "audio_content": "<base64>",
  "audio_timestamps": {
    "graph_chars": ["H", "e", "l", "l", "o"],
    "graph_times": [[0.0, 0.12], [0.12, 0.24], ...],
    "phon_chars": [],
    "phon_times": []
  },
  "duration": 1.68,
  "issues": [],
  "output_format": "wav",
  "sample_rate": 48000,
  "seed": 962692783,
  "success": true,
  "synth_duration": 1.64,
  "title": null
}
```
| Field | Type | Description |
|---|---|---|
| audio_content | string | Base64-encoded audio bytes. |
| audio_timestamps | object | Timestamp arrays for graphemes and phonemes. Grapheme timestamps (graph_chars, graph_times) are supported for all models, with times in seconds as [start, end] pairs. Phoneme timestamps (phon_chars, phon_times) return empty arrays for newer models and are only populated by legacy models. |
| duration | number | Final clip duration in seconds. |
| issues | array | Issues related to the request. |
| output_format | string | Echoes the requested format. |
| sample_rate | number | Echoes the requested sample rate. |
| seed | number | Random seed used for generation. |
| success | boolean | Whether the synthesis succeeded. |
| synth_duration | number | Raw synthesis time prior to post-processing. |
| title | string \| null | Title saved with the clip, or null if not provided. |
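The grapheme timestamps pair each character in the input with its spoken interval, which is useful for captioning or karaoke-style highlighting. A small sketch (the values beyond the first two pairs are illustrative, since the sample response elides them):

```python
# Grapheme timestamp arrays, aligned index-for-index: graph_times[i]
# is the [start, end] interval in seconds for graph_chars[i].
graph_chars = ["H", "e", "l", "l", "o"]
graph_times = [[0.00, 0.12], [0.12, 0.24],
               [0.24, 0.33], [0.33, 0.41], [0.41, 0.55]]  # last three illustrative

for ch, (start, end) in zip(graph_chars, graph_times):
    print(f"{ch!r}: {start:.2f}s -> {end:.2f}s")

# The word's overall interval spans from the first start to the last end.
word_start = graph_times[0][0]
word_end = graph_times[-1][1]
```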
Try it – Repeat the request above and decode `audio_content` locally:
```shell
curl --request POST "https://f.cluster.resemble.ai/synthesize" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  --data '{
    "voice_uuid": "55592656",
    "data": "Hello from Resemble!",
    "sample_rate": 48000,
    "output_format": "wav"
  }' \
  | jq -r '.audio_content' | base64 --decode > output.wav
```
