
Synchronous text-to-speech synthesis

POST https://f.cluster.resemble.ai/synthesize
curl -X POST https://f.cluster.resemble.ai/synthesize \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_uuid": "55592656",
    "data": "Hello from Resemble!"
  }'
{
  "success": true,
  "audio_content": "string",
  "audio_timestamps": {
    "graph_chars": [
      "string"
    ],
    "graph_times": [
      [
        1.1
      ]
    ],
    "phon_chars": [
      "string"
    ],
    "phon_times": [
      [
        1.1
      ]
    ]
  },
  "duration": 1.1,
  "synth_duration": 1.1,
  "output_format": "wav",
  "sample_rate": 48000,
  "title": "string",
  "issues": [
    "string"
  ]
}
Generate speech synchronously from text or SSML. The response contains the complete audio, base64-encoded.
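
The `audio_timestamps` object in the sample response holds parallel arrays (`graph_chars` with `graph_times`, `phon_chars` with `phon_times`). A minimal sketch of pairing them up follows. It assumes the two lists in each pair are index-aligned and that each inner list holds the time values for that character or phoneme; the page does not state either point, and the helper name `pair_timestamps` is hypothetical.

```python
from typing import Any


def pair_timestamps(
    audio_timestamps: dict[str, Any], kind: str = "graph"
) -> list[tuple[str, list[float]]]:
    """Pair each character (or phoneme) with its list of time values.

    kind is "graph" for graph_chars/graph_times or "phon" for
    phon_chars/phon_times. Assumption: both lists are index-aligned.
    """
    chars = audio_timestamps[f"{kind}_chars"]
    times = audio_timestamps[f"{kind}_times"]
    if len(chars) != len(times):
        raise ValueError(f"{kind}_chars and {kind}_times differ in length")
    return list(zip(chars, times))
```

Calling `pair_timestamps(response["audio_timestamps"], "phon")` would give the phoneme-level pairs instead of the character-level ones.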

Authentication

Authorization: Bearer <token>
API token from https://app.resemble.ai/account/api

Request

This endpoint expects an object.
voice_uuid (string, required)
Voice UUID to use for synthesis
data (string, required)

Text or SSML to synthesize (max 3,000 characters)

project_uuid (string, optional)
Optional project UUID to store the clip
title (string, optional)
Optional title for the generated clip
model (string, optional)

Model to use for synthesis. Pass chatterbox-turbo to use the Turbo model, which offers lower latency and supports paralinguistic tags. If omitted, the model defaults to Chatterbox or Chatterbox Multilingual, depending on the voice. Note: Chatterbox-Turbo is supported by all Rapid English voices and Pre Built Library voices.

precision (enum, optional, defaults to PCM_32)
Audio precision for WAV output
Allowed values:
output_format (enum, optional, defaults to wav)
Audio output format
Allowed values:
sample_rate (enum, optional)
Audio sample rate in Hz
use_hd (boolean, optional, defaults to false)

Enable HD synthesis at the cost of a small increase in latency

apply_custom_pronunciations (boolean, optional, defaults to false)
When true, automatically applies your team's custom pronunciations to matching words in the input text.
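
The request fields above can be assembled into a POST request as in the following sketch. It uses only the Python standard library and builds the request without sending it. The function name `build_synthesize_request` is hypothetical, optional fields are passed through unchecked because the allowed enum values are not listed here, and the 3,000-character limit is checked with `len()`, which is an assumption about how the service counts characters.

```python
import json
import urllib.request

SYNTHESIZE_URL = "https://f.cluster.resemble.ai/synthesize"
MAX_CHARS = 3000  # limit stated for the `data` field


def build_synthesize_request(token, voice_uuid, data, **optional):
    """Build (but do not send) a POST request for /synthesize.

    `optional` may carry any of the optional fields, for example
    title="Greeting" or use_hd=True. Values are not validated.
    """
    if not voice_uuid or not data:
        raise ValueError("voice_uuid and data are required")
    if len(data) > MAX_CHARS:
        # Assumption: the limit counts characters as len() does.
        raise ValueError(f"data exceeds {MAX_CHARS} characters")
    payload = {"voice_uuid": voice_uuid, "data": data, **optional}
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it; checking the length client-side avoids a round trip for input that is already known to be too long.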

Response

Successful synthesis
success (boolean)
audio_content (string, format: "byte")

Base64-encoded audio bytes

audio_timestamps (object)
duration (double)
Audio duration in seconds
synth_duration (double)
Raw synthesis time
output_format (string)
sample_rate (integer)
title (string)
issues (list of strings)
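
Because `audio_content` arrives as base64 text, a client has to decode it before playing or saving the audio. A minimal sketch, assuming the response body has already been parsed from JSON into a dict and that a false `success` flag means no usable audio (the page does not describe that case); the helper name `decode_audio` is hypothetical.

```python
import base64


def decode_audio(response: dict) -> bytes:
    """Return the raw audio bytes from a parsed /synthesize response."""
    if not response.get("success"):
        # Assumption: `issues` carries human-readable problem descriptions.
        raise RuntimeError(f"synthesis failed: {response.get('issues')}")
    return base64.b64decode(response["audio_content"])
```

The bytes can then be written to a file opened in binary mode, using the `output_format` field of the response as the file extension.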

Errors

400 Bad Request Error
401 Unauthorized Error
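
A client may want to turn these status codes into exceptions before trying to parse a response. The sketch below assumes that any 2xx status means success, since the page labels the success case only as "Successful synthesis", and it does not inspect the error body because its shape is not documented here. `SynthesisError` and `raise_for_status` are hypothetical names.

```python
class SynthesisError(Exception):
    """Raised when /synthesize returns a non-success HTTP status."""


# The two error statuses listed for this endpoint.
ERROR_NAMES = {400: "Bad Request Error", 401: "Unauthorized Error"}


def raise_for_status(status: int) -> None:
    """Raise SynthesisError unless the status is in the 2xx range."""
    if 200 <= status < 300:  # assumption: any 2xx counts as success
        return
    name = ERROR_NAMES.get(status, "Unexpected Error")
    raise SynthesisError(f"{status} {name}")
```

A 401 usually points at the token in the Authorization header, while a 400 points at the request body, so keeping the two distinguishable in the message helps when debugging.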
