Streaming (HTTP)
Use the streaming endpoint to start playback as audio is generated. Responses are chunked WAV data so you can progressively feed a player while long-form synthesis completes.
See the streaming demo project for a full reference implementation.
Careful: Streaming requests target dedicated synthesis hosts (see your streaming endpoint in the dashboard). Do not send them to `app.resemble.ai`.
Endpoint
Request Body
Response
The response is a single-channel PCM WAV stream. The first bytes include metadata describing duration and timestamps before audio frames arrive.
Working with the Stream
- Read the first chunk to obtain metadata such as duration, grapheme timestamps, and phoneme timestamps.
- Continue reading chunks and feed them to your playback pipeline.
- Handle the `Content-Encoding` header if you requested compression.
Try it – Issue a streaming request and pipe the response to a file:
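A minimal Python sketch using only the standard library. The URL, auth scheme, and payload field names (`voice_uuid`, `project_uuid`, `data`) below are placeholders, not confirmed parameters — substitute the streaming endpoint and request body documented in your dashboard:

```python
import json
import urllib.request

def stream_to_file(url, token, payload, path, chunk_size=4096):
    """POST a synthesis request and write the chunked WAV response to `path`."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp, open(path, "wb") as f:
        while chunk := resp.read(chunk_size):
            f.write(chunk)  # raw WAV bytes; metadata chunks arrive first

# Example (placeholder values -- use your dedicated streaming host, not app.resemble.ai):
# stream_to_file(
#     "https://YOUR-STREAMING-HOST/stream",
#     "YOUR_API_TOKEN",
#     {"voice_uuid": "VOICE_UUID", "project_uuid": "PROJECT_UUID", "data": "Hello!"},
#     "out.wav",
# )
```

For real playback, replace the file write with a push into your audio pipeline's buffer so decoding starts before synthesis finishes.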
WAV Metadata Layout
Resemble annotates WAV headers to expose timing data without additional requests.
- File-level metadata in the `RIFF` and `fmt` chunks
- Grapheme and phoneme cue points in `cue`, `list`, and `ltxt` chunks
- PCM audio bytes in the `data` chunk
Header & Format Chunks
Older models may report the file size as `0xFFFFFFFF`. Contact support to upgrade if you see this value.
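The sentinel is easy to detect once you have the first 12 bytes of the stream, which follow the standard RIFF layout (`b"RIFF"`, a little-endian uint32 size, `b"WAVE"`). A sketch:

```python
import struct

def riff_declared_size(header12):
    """Inspect the first 12 stream bytes: b'RIFF', a uint32 size, b'WAVE'.

    Returns the declared file size, or None when the size field is the
    0xFFFFFFFF placeholder emitted by older models.
    """
    riff, size, wave = struct.unpack("<4sI4s", header12)
    assert riff == b"RIFF" and wave == b"WAVE", "not a RIFF/WAVE stream"
    return None if size == 0xFFFFFFFF else size
```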
Cue, List, and LTXT Chunks
- The `cue` chunk lists offsets for grapheme and phoneme boundaries.
- The `list` chunk (type `adtl`) groups label data.
- Each `ltxt` chunk pairs a cue ID with either a grapheme (`"grph"`) or phoneme (`"phon"`) label and duration (in samples).
When reading `ltxt` chunks, align to 2-byte boundaries. If `text_length` is odd, skip an additional padding byte before the next chunk.
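A sketch of parsing one `ltxt` chunk with that alignment rule, assuming the standard WAV `ltxt` field order (cue ID, sample length, purpose ID, then locale fields and label text); verify the exact layout against the streams you receive:

```python
import struct

def parse_ltxt(buf, offset):
    """Parse one ltxt chunk at `offset`; returns (record, next_offset)."""
    chunk_id, size = struct.unpack_from("<4sI", buf, offset)
    assert chunk_id == b"ltxt", "expected an ltxt chunk"
    cue_id, sample_length, purpose = struct.unpack_from("<II4s", buf, offset + 8)
    # 2-byte alignment: an odd-sized chunk is followed by one padding byte
    next_offset = offset + 8 + size + (size & 1)
    return {
        "cue_id": cue_id,                  # matches an entry in the cue chunk
        "duration_samples": sample_length,
        "purpose": purpose.decode(),       # "grph" or "phon"
    }, next_offset
```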
Data Chunk
After the header metadata, the stream consists of PCM16 samples that you can decode on the fly.
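Each incoming `data` payload can be decoded as it arrives. A minimal sketch converting little-endian PCM16 bytes to floats in [-1.0, 1.0), the form most playback and DSP libraries expect:

```python
import array
import sys

def pcm16_to_floats(chunk):
    """Decode a chunk of little-endian PCM16 bytes (length must be even)."""
    samples = array.array("h")   # signed 16-bit integers, native byte order
    samples.frombytes(chunk)
    if sys.byteorder == "big":
        samples.byteswap()       # WAV PCM samples are little-endian
    return [s / 32768.0 for s in samples]
```

Because each sample is exactly two bytes, buffer any trailing odd byte from one chunk and prepend it to the next before decoding.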
