Streaming (WebSocket)

Maintain a persistent WebSocket to stream audio frames with the lowest possible latency. This API is available to Business plans and above.

WebSocket URL

wss://websocket.cluster.resemble.ai/stream

The server enforces global and per-key concurrency limits. Defaults allow up to 20 simultaneous sessions across the cluster and 20 parallel connections per API key. If you hit capacity errors, back off and retry.
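
One common way to back off is capped exponential backoff with jitter. The sketch below is illustrative only: the function name, the base delay, and the cap are assumptions, not values published for this API.

```python
import random


def backoff_delays(max_attempts=5, base=0.5, cap=30.0, rng=random.random):
    """Yield one sleep duration (in seconds) per retry attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)),
    so clients that fail together do not all retry at the same moment.
    `base`, `cap`, and `max_attempts` are illustrative defaults.
    """
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

A caller would sleep for each yielded delay between connection attempts and give up once the generator is exhausted.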

Request Flow

  1. Open a WebSocket connection to the endpoint above.
  2. Send a JSON payload describing the synthesis request.
  3. Consume a stream of audio frames and metadata.
  4. Listen for a terminal audio_end message before closing the socket.
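
The steps above can be sketched for the default JSON mode (binary_response = false) as follows. This is a minimal sketch, not a reference client: `conn` is a hypothetical, already-open connection object that exposes `send(str)` and `recv()`, which is the shape many WebSocket client libraries provide. How the connection is opened and authenticated is not covered here.

```python
import base64
import json


def run_request(conn, payload):
    """Send one synthesis request and collect decoded audio until audio_end.

    `conn` is an assumed connection object with send(str) and recv().
    Returns the decoded audio chunks in arrival order.
    """
    conn.send(json.dumps(payload))          # step 2: send the JSON payload
    chunks = []
    while True:
        frame = json.loads(conn.recv())     # step 3: consume frames
        kind = frame.get("type")
        if kind == "audio":
            chunks.append(base64.b64decode(frame["audio_content"]))
        elif kind == "audio_end":           # step 4: terminal message
            return chunks
        elif kind == "error":
            raise RuntimeError(
                f"{frame.get('error_name')}: {frame.get('message')}"
            )
```

The chunks are returned as a list rather than joined, since whether each chunk carries its own header is not stated here.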

Request Payload

{
  "voice_uuid": "<voice_uuid>",
  "project_uuid": "<project_uuid>",
  "data": "<text or SSML>",
  "binary_response": false,
  "request_id": 0,
  "output_format": "wav",
  "sample_rate": 32000,
  "precision": "PCM_32",
  "no_audio_header": false
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| voice_uuid | string | | Voice to synthesize. |
| project_uuid | string | | Project to attach the clip to. |
| data | string | | Text or SSML (≤ 3,000 characters excluding tags). |
| request_id | number | | Optional integer echoed back on responses. Auto-increments per message if omitted. |
| binary_response | boolean | | When true, responses are raw audio bytes (WAV or MP3). Defaults to JSON frames with base64 audio. |
| output_format | string | | wav (default) or mp3. |
| sample_rate | number | | 8000, 16000, 22050, 32000, or 44100. |
| precision | string | | PCM bit depth (PCM_32, PCM_24, PCM_16, MULAW). |
| no_audio_header | boolean | | When true, omits WAV headers from binary responses. |
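
The documented field constraints can be checked on the client before sending. The helper below is a sketch under stated assumptions: the function names are hypothetical, the keyword defaults simply mirror the sample payload rather than documented server defaults, and "excluding tags" is interpreted as ignoring anything between `<` and `>`.

```python
import re

ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}
ALLOWED_PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}
ALLOWED_FORMATS = {"wav", "mp3"}
MAX_CHARS = 3000


def visible_length(data):
    # Assumption: "excluding tags" means dropping anything between < and >.
    return len(re.sub(r"<[^>]*>", "", data))


def build_payload(voice_uuid, project_uuid, data, *, sample_rate=32000,
                  precision="PCM_32", output_format="wav",
                  binary_response=False, no_audio_header=False,
                  request_id=None):
    """Validate the documented constraints and return the request as a dict."""
    if visible_length(data) > MAX_CHARS:
        raise ValueError("data exceeds 3,000 characters excluding tags")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    payload = {
        "voice_uuid": voice_uuid,
        "project_uuid": project_uuid,
        "data": data,
        "binary_response": binary_response,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "precision": precision,
        "no_audio_header": no_audio_header,
    }
    if request_id is not None:
        # Omit request_id to let the server auto-increment it.
        payload["request_id"] = request_id
    return payload
```

Serialize the result with `json.dumps` before sending it over the socket.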

Response Shapes

JSON Frames (binary_response = false)

{
  "type": "audio",
  "audio_content": "<base64>",
  "audio_timestamps": {
    "graph_chars": ["H", "e"],
    "graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]],
    "phon_chars": ["h", "ˈe"],
    "phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]]
  },
  "sample_rate": 32000,
  "request_id": 0
}

Audio chunks arrive sequentially until an audio_end message is emitted.
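
To make the frame layout concrete, the sketch below decodes one JSON frame and pairs each grapheme with its time range. The function name is hypothetical. It assumes `graph_chars` and `graph_times` are index-aligned and that each time pair is `[start, end]`, as the sample frame suggests; the time unit is not stated here.

```python
import base64


def decode_audio_frame(frame):
    """Return (audio_bytes, timings) for one frame of type "audio".

    `timings` is a list of (char, start, end) tuples built from
    graph_chars and graph_times, assuming the two lists are index-aligned.
    """
    audio = base64.b64decode(frame["audio_content"])
    stamps = frame.get("audio_timestamps", {})
    timings = [
        (char, start, end)
        for char, (start, end) in zip(
            stamps.get("graph_chars", []), stamps.get("graph_times", [])
        )
    ]
    return audio, timings
```

The same pairing applies to `phon_chars` and `phon_times` if phoneme-level timing is needed.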

Binary Frames (binary_response = true)

Frames contain contiguous bytes of the requested format. If no_audio_header is false, the first frame includes a standard WAV header with Resemble’s timestamp metadata.
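
Because the header carries extra metadata, a client that wants the raw samples should probably not assume a fixed header length. The sketch below walks the RIFF chunk list to find where the `data` chunk payload starts. It relies only on the general RIFF/WAVE layout (a 12-byte `RIFF`/`WAVE` preamble followed by chunks with a 4-byte ID and a 4-byte little-endian size); the function name is hypothetical, and the exact chunks the service emits are not specified here.

```python
import struct


def find_data_offset(buf):
    """Return the byte offset where the WAV "data" payload begins.

    Returns None if `buf` does not start with a RIFF/WAVE preamble or if
    the "data" chunk header has not arrived yet (for example, when the
    header spans more than one frame).
    """
    if len(buf) < 12 or buf[0:4] != b"RIFF" or buf[8:12] != b"WAVE":
        return None
    pos = 12
    while pos + 8 <= len(buf):
        chunk_id = buf[pos:pos + 4]
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        if chunk_id == b"data":
            return pos + 8
        # Chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    return None
```

When no_audio_header is true, the first frame has no preamble, so the function returns None and every byte can be treated as audio.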

Termination Message

{
  "type": "audio_end",
  "request_id": 0
}

Error Handling

Note: The WebSocket API is available only on Business plans and above. If you receive Unauthorized responses, upgrade on the billing page.

Unrecoverable Errors

Connection-level failures close the socket immediately.

{
  "type": "error",
  "success": false,
  "error_name": "ConnectionFailure",
  "message": "Failed to establish a connection.",
  "status_code": 401
}

Recoverable Errors

The connection remains open, allowing you to fix the issue and retry.

{
  "type": "error",
  "success": false,
  "error_name": "BadJSON",
  "error_params": {"explanation": "Provide your query in the 'data' field"},
  "message": "Invalid JSON",
  "status_code": 400
}

Log the error_name and request_id so that you can correlate failures with client requests.
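
A small formatter keeps those log lines consistent. This is a sketch with a hypothetical function name. The sample error frames above do not include a request_id, so the helper accepts a client-tracked fallback; whether the server echoes request_id on errors is an assumption left open here.

```python
def describe_error(frame, fallback_request_id=None):
    """Build a single log line from an error frame.

    Uses the frame's request_id when present, otherwise the ID the client
    recorded for the request it last sent.
    """
    request_id = frame.get("request_id", fallback_request_id)
    return (
        f"error_name={frame.get('error_name')} "
        f"status_code={frame.get('status_code')} "
        f"request_id={request_id} "
        f"message={frame.get('message')!r}"
    )
```

The returned string can be passed to whatever logger the client already uses, for example `logging.getLogger(__name__).error(...)`.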