Clips

Clips are text-to-speech synthesis jobs within a project. They represent synthesized audio outputs produced by Resemble.

Clip Object

Field              Type      Description
uuid               string    Unique identifier for the clip.
title              string    Clip title.
body               string    Text that was synthesized.
voice_uuid         string    UUID of voice used for synthesis.
voice_name         string    Name of the voice used.
is_archived        boolean   Whether the clip is archived.
audio_src          string    URL to the synthesized audio file.
character_count    integer   Number of characters in the text.
duration           number    Duration of the audio in seconds.
last_generated_at  datetime  Timestamp when audio was last generated.
timestamps         object    Timestamp data for graphemes and phonemes.
created_at         datetime  Timestamp when clip was created.
updated_at         datetime  Timestamp when clip was last updated.
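
As an illustration of how the fields above might be consumed on the client side, here is a minimal Python sketch that maps a decoded clip payload onto a typed structure. The Clip class, the _parse_dt helper, and all sample values are hypothetical and not part of any Resemble SDK. The sketch also assumes that datetime fields arrive as ISO 8601 strings and that audio_src, duration, and last_generated_at may be missing or null before audio has been generated; neither assumption is stated in the field list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


def _parse_dt(value: Optional[str]) -> Optional[datetime]:
    # Assumption: datetimes are ISO 8601 strings. A trailing "Z" is rewritten
    # because datetime.fromisoformat only accepts it from Python 3.11 onward.
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Clip:
    """Hypothetical client-side mirror of the documented clip fields."""
    uuid: str
    title: str
    body: str
    voice_uuid: str
    voice_name: str
    is_archived: bool
    audio_src: Optional[str]
    character_count: int
    duration: Optional[float]
    last_generated_at: Optional[datetime]
    timestamps: Dict[str, Any]
    created_at: Optional[datetime]
    updated_at: Optional[datetime]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Clip":
        duration = data.get("duration")
        return cls(
            uuid=data["uuid"],
            title=data["title"],
            body=data["body"],
            voice_uuid=data["voice_uuid"],
            voice_name=data["voice_name"],
            is_archived=bool(data["is_archived"]),
            audio_src=data.get("audio_src"),
            character_count=int(data["character_count"]),
            duration=float(duration) if duration is not None else None,
            last_generated_at=_parse_dt(data.get("last_generated_at")),
            timestamps=data.get("timestamps") or {},
            created_at=_parse_dt(data.get("created_at")),
            updated_at=_parse_dt(data.get("updated_at")),
        )


# Made-up payload for illustration only; real values come from the API.
sample = {
    "uuid": "hypothetical-clip-uuid",
    "title": "Greeting",
    "body": "Hello",
    "voice_uuid": "hypothetical-voice-uuid",
    "voice_name": "Example Voice",
    "is_archived": False,
    "audio_src": "https://example.com/audio.wav",
    "character_count": 5,
    "duration": 0.6,
    "last_generated_at": None,
    "timestamps": {},
    "created_at": "2024-01-01T12:00:00Z",
    "updated_at": "2024-01-01T12:00:05Z",
}
clip = Clip.from_dict(sample)
```

Keeping timestamps as a plain dictionary leaves its interpretation to the caller, which seems reasonable given that its contents vary by model.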

Timestamps Object

{
  "graph_chars": ["H", "e", "l", "l", "o"],
  "graph_times": [[0.0, 0.12], [0.12, 0.24], ...],
  "phon_chars": [],
  "phon_times": []
}

Note: Phoneme timestamps (phon_chars, phon_times) return empty arrays for newer models and are only populated by legacy models.
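
One way to work with this object is to pair each grapheme with its time span. The Python sketch below assumes that graph_chars[i] corresponds to graph_times[i] and that each entry is a [start, end] pair in seconds; the example object suggests this but does not state it outright. The function names grapheme_spans and has_phonemes are hypothetical, and the sample times beyond the first two pairs are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def grapheme_spans(timestamps: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Return (character, start, end) tuples from a timestamps object.

    Assumption: graph_chars and graph_times are parallel arrays and each
    graph_times entry is a [start, end] pair.
    """
    chars = timestamps.get("graph_chars") or []
    times = timestamps.get("graph_times") or []
    # zip stops at the shorter list, so mismatched lengths do not raise.
    return [(ch, float(start), float(end)) for ch, (start, end) in zip(chars, times)]


def has_phonemes(timestamps: Dict[str, Any]) -> bool:
    """True only when both phoneme arrays are present and non-empty."""
    return bool(timestamps.get("phon_chars")) and bool(timestamps.get("phon_times"))


# Illustrative data: only the first two time pairs appear in the example above.
example = {
    "graph_chars": ["H", "e", "l", "l", "o"],
    "graph_times": [[0.0, 0.12], [0.12, 0.24], [0.24, 0.36], [0.36, 0.48], [0.48, 0.6]],
    "phon_chars": [],
    "phon_times": [],
}

spans = grapheme_spans(example)
for ch, start, end in spans:
    print(f"{ch}: {start:.2f}s to {end:.2f}s")

if not has_phonemes(example):
    print("No phoneme timestamps in this clip")
```

Checking has_phonemes before reading phon_chars or phon_times lets the same code handle both legacy models and newer ones, which return empty arrays.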