Receiving Audio Data | Resemble

1. Connect to `wss://websocket.cluster.resemble.ai/stream`.
2. Send a synthesis request as a JSON payload.
3. Receive audio frames, either JSON or binary.
4. Stop when the terminal `audio_end` message arrives.
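The four steps can be sketched as a small asyncio client. This is a minimal sketch under stated assumptions, not Resemble's SDK. It uses the third-party `websockets` package, and the names `stream_tts` and `is_audio_end` are hypothetical. It also assumes binary frames never carry the `audio_end` marker. Authentication is omitted because this section does not describe it.

```python
import json

STREAM_URL = "wss://websocket.cluster.resemble.ai/stream"


def is_audio_end(message) -> bool:
    """Return True if a received message is the terminal audio_end frame."""
    if not isinstance(message, str):
        return False  # assumption: binary frames only ever carry audio bytes
    try:
        return json.loads(message).get("type") == "audio_end"
    except (json.JSONDecodeError, AttributeError):
        return False  # not JSON, or JSON that is not an object


async def stream_tts(payload: dict) -> list:
    """Hypothetical client: connect, send one request, collect frames until audio_end."""
    # Imported lazily so the helpers above work even where websockets is absent.
    import websockets

    messages = []
    # Step 1: connect (authentication not covered here; add your credentials).
    async with websockets.connect(STREAM_URL) as ws:
        # Step 2: send the synthesis request as a JSON text message.
        await ws.send(json.dumps(payload))
        # Step 3: receive frames (str for JSON frames, bytes for binary frames).
        async for message in ws:
            # Step 4: stop at the terminal audio_end message.
            if is_audio_end(message):
                break
            messages.append(message)
    return messages
```

Call it with `asyncio.run(stream_tts({"voice_uuid": "<voice_uuid>", "data": "Hello"}))`. The returned list holds every frame received before `audio_end`.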

Request Payload

```json
{
  "voice_uuid": "<voice_uuid>",
  "data": "<text or SSML>",
  "binary_response": false,
  "request_id": 0,
  "output_format": "wav",
  "sample_rate": 32000,
  "precision": "PCM_32",
  "no_audio_header": false
}
```

Field	Required	Description
`voice_uuid`	✅	Voice used for synthesis.
`data`	✅	Text or SSML.
`binary_response`	❌	`false` for JSON frames (base64 audio); `true` for raw bytes.
`output_format`	❌	`wav` (default) or `mp3`.
`sample_rate`	❌	Output sample rate in Hz: `8000`, `16000`, `22050`, `32000`, or `44100`.
`precision`	❌	Sample encoding: PCM bit depth (`PCM_32`, `PCM_24`, `PCM_16`) or μ-law (`MULAW`).
`no_audio_header`	❌	Skip the WAV header when streaming PCM.
`request_id`	❌	Optional integer echoed in responses.
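The field table can be turned into a client-side check before a request is sent. This is a minimal sketch, assuming you want to fail fast locally. The allowed values are copied from the table. `validate_request` and `build_request` are hypothetical helpers, and the server remains the authority on what it accepts.

```python
import json

# Field names and allowed values, taken from the table above.
REQUIRED_FIELDS = ("voice_uuid", "data")
OPTIONAL_FIELDS = {"binary_response", "request_id", "output_format",
                   "sample_rate", "precision", "no_audio_header"}
ALLOWED_VALUES = {
    "output_format": {"wav", "mp3"},
    "sample_rate": {8000, 16000, 22050, 32000, 44100},
    "precision": {"PCM_32", "PCM_24", "PCM_16", "MULAW"},
}


def validate_request(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not payload.get(name)]
    for name in sorted(set(payload) - set(REQUIRED_FIELDS) - OPTIONAL_FIELDS):
        problems.append(f"unknown field {name}")
    for name, allowed in ALLOWED_VALUES.items():
        if name in payload and payload[name] not in allowed:
            problems.append(f"unsupported {name}: {payload[name]!r}")
    return problems


def build_request(voice_uuid: str, data: str, **options) -> str:
    """Serialize a request; optional fields are sent only when explicitly passed."""
    payload = {"voice_uuid": voice_uuid, "data": data, **options}
    problems = validate_request(payload)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(payload)
```

Because optional fields are included only when passed, any field you leave out falls back to the server's default.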

JSON Frames

```json
{
  "type": "audio",
  "audio_content": "<base64>",
  "audio_timestamps": {
    "graph_chars": ["H", "e"],
    "graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]],
    "phon_chars": ["h", "ˈe"],
    "phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]]
  },
  "sample_rate": 32000,
  "request_id": 0
}
```
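A JSON frame can be decoded with the standard library alone. This sketch assumes `audio_content` is standard base64 and pairs each entry of `graph_chars` with the matching `[start, end]` entry of `graph_times`. The section does not state the unit of those values, so they are passed through unchanged. `AudioFrame` and `parse_json_frame` are hypothetical names.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AudioFrame:
    audio: bytes                                # decoded audio_content
    sample_rate: int
    request_id: Optional[int]                   # None if the frame has no request_id
    graphemes: List[Tuple[str, float, float]]   # (char, start, end)


def parse_json_frame(message: str) -> AudioFrame:
    """Decode one JSON audio frame into raw bytes plus grapheme timings."""
    frame = json.loads(message)
    if frame.get("type") != "audio":
        raise ValueError(f"not an audio frame: {frame.get('type')!r}")
    stamps = frame.get("audio_timestamps") or {}
    # Pair each character with its [start, end] interval, in order.
    graphemes = [
        (char, start, end)
        for char, (start, end) in zip(stamps.get("graph_chars", []),
                                      stamps.get("graph_times", []))
    ]
    return AudioFrame(
        audio=base64.b64decode(frame["audio_content"]),
        sample_rate=frame["sample_rate"],
        request_id=frame.get("request_id"),
        graphemes=graphemes,
    )
```

The same pairing applies to `phon_chars` and `phon_times` if you need phoneme-level timings.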

Binary Frames

When `binary_response` is `true`, frames contain contiguous audio bytes instead of JSON objects. A WAV header is included by default; set `no_audio_header` to `true` to receive raw PCM chunks.
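With `no_audio_header` set to `true`, the binary chunks can be concatenated and wrapped in a single WAV header on the client. This is a minimal sketch using Python's `wave` module. It assumes mono, signed integer PCM, because the section states neither the channel count nor whether `PCM_32` is integer or float. `pcm_chunks_to_wav` is a hypothetical helper. `MULAW` is left out because `wave` only writes plain PCM.

```python
import io
import wave

# Bytes per sample for the integer PCM precisions (assumption: signed integer PCM).
SAMPLE_WIDTH = {"PCM_16": 2, "PCM_24": 3, "PCM_32": 4}


def pcm_chunks_to_wav(chunks, sample_rate: int, precision: str = "PCM_16",
                      channels: int = 1) -> bytes:
    """Concatenate headerless PCM chunks and wrap them in one WAV container."""
    width = SAMPLE_WIDTH[precision]
    pcm = b"".join(chunks)
    # A chunk may split a sample; only the concatenated total must be whole frames.
    if len(pcm) % (width * channels):
        raise ValueError("byte count is not a whole number of sample frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Concatenating before checking alignment is the reason for working on the whole stream: it does not rely on each frame ending on a sample boundary.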

Termination Message

```json
{
  "type": "audio_end",
  "request_id": 0
}
```

Handle the terminal message to cleanly stop playback and reset application state.
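One way to act on the terminal message is to accumulate audio until the matching `audio_end` arrives. This sketch makes two assumptions: text messages are JSON frames, and bytes messages are raw audio, as with `binary_response = true`. It uses the optional `request_id` to skip JSON frames from other requests. `collect_audio` is a hypothetical helper that works on any iterable of received messages, such as a list captured from the socket.

```python
import base64
import json


def collect_audio(messages, request_id=None) -> bytes:
    """Gather audio for one request, returning when its audio_end arrives."""
    audio = bytearray()
    for message in messages:
        if isinstance(message, (bytes, bytearray)):
            # Binary frame: raw audio. It carries no request_id, so it is not filtered.
            audio.extend(message)
            continue
        frame = json.loads(message)
        if request_id is not None and frame.get("request_id") != request_id:
            continue  # frame belongs to a different request
        if frame.get("type") == "audio":
            audio.extend(base64.b64decode(frame["audio_content"]))
        elif frame.get("type") == "audio_end":
            return bytes(audio)  # terminal message: stop and hand back the audio
    raise ConnectionError("stream ended before audio_end")
```

Raising when the stream ends without `audio_end` makes a truncated response visible, so playback code can reset its state instead of treating partial audio as complete.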