Model Versions

All voices you clone with Resemble automatically use Chatterbox, our latest and most advanced text-to-speech model. You don’t need to select a model—every new voice is created with Chatterbox by default.

Resemble Chatterbox

Chatterbox is available in two variants:

| Variant | Version Code | Description | Dataset Requirements | Streaming Support |
| --- | --- | --- | --- | --- |
| Chatterbox | tts-v4 | Next-generation TTS model offering improved performance and efficiency. | 10+ seconds | ✅ Yes |
| Chatterbox Multilingual | tts-v4 | Multilingual TTS supporting 23 languages with high-quality performance. | 10+ seconds | ✅ Yes |

Performance

| Variant | Latency / TTFS* | Character Limits | Notes |
| --- | --- | --- | --- |
| Chatterbox | 250 ms | Maximum 2000 characters. | SSML tags not supported: <prosody>, <emotion>, <phonemes>, <substitutions>, <emphasis>, <say-as>. Timestamps not supported. |
| Chatterbox Multilingual | 250 ms | Maximum 2000 characters. | Same limitations as Chatterbox. Supported languages: es, en, fr, de, ar, pt, ru, tr, it, da, fi, ja, ko, zh, nl, sk, sv, vi, no, pl, sw, hi, he. |

Note: Time-to-first-sound (TTFS) reflects best-case numbers. Cold starts, network latency, and load can increase actual latency.
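
The limits in the table above can be checked on the client before text is sent for synthesis. The following is a minimal sketch in Python, not part of any Resemble SDK: the names MAX_CHARS, UNSUPPORTED_SSML_TAGS, MULTILINGUAL_LANGUAGES, find_unsupported_tags, and split_text are hypothetical helpers introduced here for illustration, and the values are copied from the table. It assumes the 2000-character limit applies to each request and that splitting long text at whitespace is acceptable for your content.

```python
import re

# Values copied from the Performance table; all names are illustrative only.
MAX_CHARS = 2000
UNSUPPORTED_SSML_TAGS = (
    "prosody", "emotion", "phonemes", "substitutions", "emphasis", "say-as",
)
MULTILINGUAL_LANGUAGES = frozenset(
    "es en fr de ar pt ru tr it da fi ja ko zh nl sk sv vi no pl sw hi he".split()
)

# Matches an opening or closing tag whose name is in the unsupported list.
_TAG_RE = re.compile(
    r"</?\s*(" + "|".join(re.escape(t) for t in UNSUPPORTED_SSML_TAGS) + r")\b",
    re.IGNORECASE,
)


def find_unsupported_tags(text):
    """Return the sorted, lowercased names of unsupported SSML tags found in text."""
    return sorted({m.group(1).lower() for m in _TAG_RE.finditer(text)})


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Chunks break at the last space that fits; a single run longer than
    `limit` with no spaces is cut at exactly `limit` characters.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space available: hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

A caller might run find_unsupported_tags on the input first and reject or strip any tags it reports, check the requested language code against MULTILINGUAL_LANGUAGES when using the multilingual variant, and then pass each chunk from split_text as a separate request. Splitting at sentence boundaries instead of spaces would likely sound more natural; the space-based split is used here only to keep the sketch short.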

Deprecated Models

The following models are deprecated and no longer available for new voice cloning. They are listed here for reference only.

| Version Name | Version Code | Description | Dataset Requirements | Streaming Support | Release Date |
| --- | --- | --- | --- | --- | --- |
| Resemble Enhanced TTS v3 | tts-v3 | Enhanced TTS with excellent latency and fidelity. | 10+ minutes | ✅ Yes | Q4 2023 |
| Resemble Enhanced TTS v2 | tts-v2 | Enhanced model with lower latency and high naturalness. | 30+ minutes | ✅ Yes | Q3 2023 |
| Resemble Enhanced TTS v1 | tts-v1 | First enhanced model delivering state-of-the-art naturalness. | 10+ minutes | 🚫 No | Q2 2023 |
| Resemble Legacy TTS | tts-legacy | First-generation TTS balancing speed and quality. | 1+ minutes | ✅ Yes | Q2 2021 |

Speech-to-Speech Models

| Version Name | Version Code | Description | Dataset Requirements | Streaming Support | Release Date |
| --- | --- | --- | --- | --- | --- |
| Resemble Core STS v2 | sts-v2 | Improved pitch tracking and 48 kHz support. | 10+ minutes | ✅ Yes | Q4 2023 |
| Resemble Core STS v1 | sts-v1 | Enhanced conversion with better speed and accuracy. | 10+ minutes | ✅ Yes | Q2 2023 |
| Resemble Legacy STS | sts-legacy | Initial speech-to-speech model for basic voice conversion. | 10+ minutes | ✅ Yes | Q2 2021 |