Receiving Audio Data

  1. Connect to wss://websocket.cluster.resemble.ai/stream
  2. Send a synthesis request (JSON payload)
  3. Receive streamed audio frames (JSON or binary)
  4. Wait for the terminal audio_end message
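The four steps above can be sketched as a transport-agnostic loop. This is a minimal illustration, not the official client: `send` and `recv` are hypothetical callables standing in for whichever WebSocket library you use, authentication is not covered in this section and is left out, and the sketch assumes the terminal message always arrives as JSON text, even when audio frames are binary.

```python
import base64
import json
from typing import Callable, Union

Message = Union[str, bytes]


def stream_synthesis(send: Callable[[str], None],
                     recv: Callable[[], Message],
                     request: dict) -> bytes:
    """Send one synthesis request and collect audio until audio_end.

    `send` and `recv` are hypothetical wrappers around an already
    connected WebSocket (step 1 happens outside this function).
    """
    send(json.dumps(request))              # step 2: JSON synthesis request
    audio = bytearray()
    while True:
        message = recv()                   # step 3: next frame from the socket
        if isinstance(message, bytes):     # binary frame: raw audio bytes
            audio.extend(message)
            continue
        frame = json.loads(message)
        if frame.get("type") == "audio":   # JSON frame: base64-encoded audio
            audio.extend(base64.b64decode(frame["audio_content"]))
        elif frame.get("type") == "audio_end":
            return bytes(audio)            # step 4: terminal message
        # Any other message type is ignored in this sketch.
```

Keeping the transport behind two callables makes the loop easy to exercise with canned frames before wiring it to a real socket.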

Request Payload

{
  "voice_uuid": "<voice_uuid>",
  "data": "<text or SSML>",
  "binary_response": false,
  "request_id": 0,
  "output_format": "wav",
  "sample_rate": 32000,
  "precision": "PCM_32",
  "no_audio_header": false
}
Field            Required  Description
voice_uuid                 Voice used for synthesis.
data                       Text or SSML.
binary_response            false for JSON frames (base64 audio); true for raw bytes.
output_format              wav (default) or mp3.
sample_rate                8000, 16000, 22050, 32000, or 44100.
precision                  PCM bit depth (PCM_32, PCM_24, PCM_16, MULAW).
no_audio_header            Skip the WAV header when streaming PCM.
request_id                 Optional integer echoed in responses.
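As an illustration of the fields above, a small helper can build the request and reject values outside the documented sets before anything is sent. The function name `build_request` is hypothetical, and its defaults simply mirror the example payload; only the `wav` default for `output_format` is stated by the table.

```python
import json

# Allowed values copied from the field table above.
ALLOWED_FORMATS = {"wav", "mp3"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}
ALLOWED_PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}


def build_request(voice_uuid: str, data: str, *,
                  binary_response: bool = False,
                  request_id: int = 0,
                  output_format: str = "wav",
                  sample_rate: int = 32000,
                  precision: str = "PCM_32",
                  no_audio_header: bool = False) -> str:
    """Return the synthesis request as a JSON string (hypothetical helper)."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return json.dumps({
        "voice_uuid": voice_uuid,
        "data": data,
        "binary_response": binary_response,
        "request_id": request_id,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "precision": precision,
        "no_audio_header": no_audio_header,
    })
```

Validating on the client side is a design choice for faster feedback; the server remains the authority on which combinations it accepts.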

JSON Frames

{
  "type": "audio",
  "audio_content": "<base64>",
  "audio_timestamps": {
    "graph_chars": ["H", "e"],
    "graph_times": [[0.0374, 0.1247], [0.0873, 0.1746]],
    "phon_chars": ["h", "ˈe"],
    "phon_times": [[0.0374, 0.1247], [0.0873, 0.1746]]
  },
  "sample_rate": 32000,
  "request_id": 0
}
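A JSON frame like the one above could be unpacked roughly as follows. This sketch assumes that each `graph_times` entry is a pair of time offsets belonging to the `graph_chars` entry at the same index; the frame example suggests this pairing but the section does not state it, nor the unit of the values. `parse_audio_frame` is a hypothetical name.

```python
import base64
import json
from typing import List, Tuple

Timing = Tuple[str, Tuple[float, float]]


def parse_audio_frame(raw: str) -> Tuple[bytes, List[Timing]]:
    """Decode one JSON audio frame into audio bytes and grapheme timings."""
    frame = json.loads(raw)
    if frame.get("type") != "audio":
        raise ValueError(f"not an audio frame: {frame.get('type')!r}")
    audio = base64.b64decode(frame["audio_content"])
    timestamps = frame.get("audio_timestamps", {})
    # Assumption: graph_chars[i] corresponds to graph_times[i].
    graphemes = [
        (char, (times[0], times[1]))
        for char, times in zip(timestamps.get("graph_chars", []),
                               timestamps.get("graph_times", []))
    ]
    return audio, graphemes
```

The same pairing could be applied to `phon_chars` and `phon_times` if phoneme-level timing is needed.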

Binary Frames

When binary_response = true, each frame contains contiguous raw audio bytes instead of JSON. The audio includes a WAV header by default; set no_audio_header = true to receive headerless PCM chunks.
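If you request headerless chunks, you need to supply the audio parameters yourself before a player can open the result. The sketch below wraps collected raw chunks in a WAV container using Python's standard `wave` module. It assumes PCM_16 precision (2 bytes per sample) and mono audio; the channel count is not stated in this section, so treat it as an assumption and adjust it to match your stream. `pcm_chunks_to_wav` is a hypothetical helper name.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks: Iterable[bytes],
                      sample_rate: int = 32000,
                      sample_width: int = 2,   # assumption: PCM_16
                      channels: int = 1) -> bytes:  # assumption: mono
    """Wrap headerless PCM chunks in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)  # chunks are appended in arrival order
    return buffer.getvalue()
```

The `sample_rate` passed here should match the value sent in the request payload.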

Termination Message

{
  "type": "audio_end",
  "request_id": 0
}

Handle the terminal message to cleanly stop playback and reset application state.
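One way to act on the terminal message is to track each request by its `request_id` and reset state only when the matching audio_end arrives. The class below is a hypothetical sketch: `StreamState` and its attributes are illustrative names, and it assumes every JSON message carries the `request_id` shown in the examples.

```python
import json


class StreamState:
    """Tracks one in-flight synthesis request (hypothetical helper)."""

    def __init__(self, request_id: int) -> None:
        self.request_id = request_id
        self.active = True
        self.frames_received = 0

    def handle(self, raw: str) -> bool:
        """Process one JSON message; return True once the stream has ended."""
        frame = json.loads(raw)
        if frame.get("request_id") != self.request_id:
            return False                # message belongs to another request
        if frame.get("type") == "audio":
            self.frames_received += 1
        elif frame.get("type") == "audio_end":
            self.active = False         # stop playback
            self.frames_received = 0    # reset per-request state
            return True
        return False
```

A caller would typically stop its playback loop and release buffers as soon as `handle` returns True.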