Text-to-Speech | Resemble

Turn text into natural, production-ready speech. Resemble supports multiple synthesis modes tuned for different latency and integration needs.

New and upgraded voices use Resemble Ultra by default. Provide a voice_uuid with your request; the API automatically uses that voice’s model version.

Synthesis Modes

Synchronous

Request-based synthesis that returns a complete audio file in a single response.

Best suited for:

Alerts and notifications
Short-form content
Workflows that require the entire clip before progressing
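Here is a minimal standard-library sketch of the synchronous flow: one POST, one JSON response, one decoded clip. The endpoint URL, the `Bearer` header format, the `data` field for the text, and the base64 `audio_content` response field are assumptions for illustration only. Confirm each one against the synchronous synthesis reference. The request sends no model field because the API resolves the model version from the voice_uuid.

```python
import base64
import json
import urllib.request

# Assumption: URL, auth header format, and field names are illustrative;
# check them against the synchronous synthesis reference before use.
SYNTH_URL = "https://f.cluster.resemble.ai/synthesize"

def synthesize(token: str, voice_uuid: str, text: str, timeout: float = 30.0) -> bytes:
    """Send one request and return the complete audio clip as bytes."""
    # No model/version field: the API uses the model tied to the voice.
    body = json.dumps({"voice_uuid": voice_uuid, "data": text}).encode("utf-8")
    req = urllib.request.Request(
        SYNTH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # The whole response body is read at once: nothing plays until it is done.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return decode_audio(payload)

def decode_audio(payload: dict) -> bytes:
    """Extract base64-encoded audio from a parsed response (assumed shape)."""
    audio_b64 = payload.get("audio_content")
    if audio_b64 is None:
        raise RuntimeError(f"no audio in response: {payload}")
    return base64.b64decode(audio_b64)
```

To use it, call something like `audio = synthesize(token, voice_uuid, "Your build finished.")` and write the bytes to a file or pass them to a player. Because the full clip is returned in one piece, this mode is a natural fit for the alert and notification cases listed above.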

Streaming over HTTP

Receive audio progressively, chunk by chunk, over a chunked HTTP response.

Best suited for:

Longer scripts
Progressive playback experiences
Reducing perceived latency without persistent sockets
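On the client, HTTP streaming comes down to a read loop that passes each chunk along as soon as it arrives, without waiting for the whole body. The sketch below assumes only a file-like response object with `read(n)`, such as the one `urllib.request.urlopen` returns. The names `consume_stream` and `on_chunk` are hypothetical. The request itself (endpoint, body, audio format) should come from the HTTP streaming page.

```python
import time
from typing import BinaryIO, Callable, Optional

def consume_stream(
    resp: BinaryIO,
    on_chunk: Callable[[bytes], None],
    chunk_size: int = 4096,
) -> Optional[float]:
    """Pass each chunk to on_chunk as it is read from the response.

    Returns the seconds elapsed before the first chunk (a rough proxy for
    perceived latency), or None if the stream was empty.
    """
    start = time.monotonic()
    first_chunk_at: Optional[float] = None
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:  # empty bytes: the server has finished the stream
            break
        if first_chunk_at is None:
            first_chunk_at = time.monotonic() - start
        on_chunk(chunk)  # e.g. append to a playback buffer (hypothetical)
    return first_chunk_at
```

On a chunked `http.client` response, `read(n)` may block until `n` bytes have arrived or the stream ends. A smaller `chunk_size` starts playback sooner at the cost of more callbacks. `http.client.HTTPResponse.read1(n)` is an alternative that returns after a single underlying read.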

Streaming over WebSocket

Keep a WebSocket connection open to receive the lowest-latency audio stream, with metadata attached to each chunk.

Best suited for:

Conversational agents
Interactive assistants
Real-time media experiences where milliseconds matter (Business plan and above)
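Python's standard library has no WebSocket client. The sketch below therefore covers only the part that stays the same whichever client you use: turning each incoming text frame into audio bytes plus per-chunk metadata. The message shapes are hypothetical placeholders: `type` values `audio`, `audio_end`, and `error`, plus a base64 `audio_content` field. Map them to the actual schema on the WebSocket streaming page, then call `handle_message` from your client's receive loop.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class StreamState:
    """Accumulates audio and per-chunk metadata for one synthesis request."""
    audio: bytearray = field(default_factory=bytearray)
    metadata: List[Dict[str, Any]] = field(default_factory=list)
    done: bool = False

def handle_message(raw: str, state: StreamState) -> None:
    """Process one text frame received over the WebSocket.

    Hypothetical message shapes (check the WebSocket reference):
      {"type": "audio", "audio_content": "<base64>", ...metadata}
      {"type": "audio_end"}
      {"type": "error", "message": "..."}
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "audio":
        state.audio.extend(base64.b64decode(msg["audio_content"]))
        # Keep every field except the audio payload as this chunk's metadata.
        state.metadata.append({k: v for k, v in msg.items() if k != "audio_content"})
    elif kind == "audio_end":
        state.done = True
    elif kind == "error":
        raise RuntimeError(msg.get("message", "unknown WebSocket error"))
```

Keeping this logic separate from the socket makes it easy to test offline and to swap WebSocket libraries without touching the audio handling.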

Next Steps

Generate an API token from the dashboard.
Pick the synthesis mode that fits your UX.
Follow the chosen mode's dedicated page for request/response formats and implementation tips.
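You can carry the first two steps into code. Read the token from the environment instead of hard-coding it, and turn the mode guidance on this page into a small helper. The names `RESEMBLE_API_TOKEN`, `Mode`, and `pick_mode` are hypothetical. The helper only restates this page's rules of thumb and is not an official selector.

```python
import os
from enum import Enum

class Mode(Enum):
    SYNCHRONOUS = "synchronous"
    HTTP_STREAMING = "http_streaming"
    WEBSOCKET_STREAMING = "websocket_streaming"

def load_token(env_var: str = "RESEMBLE_API_TOKEN") -> str:
    """Read the dashboard-generated API token from an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"set {env_var} to your API token")
    return token

def pick_mode(needs_full_clip: bool, interactive: bool, websocket_available: bool) -> Mode:
    """Map UX requirements to a synthesis mode, following this page's guidance."""
    if needs_full_clip:
        # The workflow cannot progress until the entire clip exists.
        return Mode.SYNCHRONOUS
    if interactive and websocket_available:
        # Conversational or real-time use; availability depends on plan.
        return Mode.WEBSOCKET_STREAMING
    # Progressive playback without a persistent socket.
    return Mode.HTTP_STREAMING
```

The `websocket_available` flag stands in for plan eligibility, so a conversational app on a plan without WebSocket access falls back to HTTP streaming.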